
Add validate-code-samples skill #26

Open

georgewallace wants to merge 11 commits into main from add/validate-code-samples

Conversation

@georgewallace
Contributor

Summary

Adds a new validate-code-samples skill that checks every code block in Elastic documentation markdown files against docs-builder style rules.

Checks performed

  • Check A — Missing language identifier: flags unlabelled fences and suggests a specific identifier inferred from content (console, json, yaml, bash, python, java, js, sql, xml, txt)
  • Check B — subs=true: cross-references docset.yml subs section; only flags {{var}} patterns that match defined variable names (no false positives from log output or template syntax)
  • Check C — Inline comments: flags inline # comment / // comment annotations that should use callout markers instead
  • Check D — JSON validation: validates JSON bodies in console and console-result blocks with full pre-processing (callout marker stripping, Painless triple-quote replacement, NDJSON bulk handling, ellipsis normalization); flags ... in copyable console blocks
  • Check E — Domain names: flags placeholder domains that aren't IETF-reserved (RFC 2606/6761) or known-legitimate real domains
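As an illustration of the Check A inference step, here is a minimal sketch of how an unlabelled fence might be mapped to a suggested identifier. The ordering and patterns below are assumptions for illustration, not the skill's actual heuristic table:

```python
import re

def infer_language(body: str) -> str:
    """Guess a fence language from block content (illustrative heuristic)."""
    first = body.lstrip().splitlines()[0] if body.strip() else ""
    if re.match(r'(GET|PUT|POST|DELETE|HEAD)\s', first):
        return "console"   # Elasticsearch API request line
    if body.lstrip().startswith(("{", "[")):
        return "json"      # bare JSON document
    if first.startswith("#!") or first.startswith("$ "):
        return "bash"      # shebang or shell prompt
    if re.match(r'[\w.-]+:\s', first):
        return "yaml"      # key: value on the first line
    return "txt"           # fallback when nothing matches
```

A real implementation would cover the full identifier list (python, java, js, sql, xml, and so on); this sketch shows only the shape of the check.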

False positive handling

  • Skips code blocks inside HTML comments (<!-- -->)
  • Skips JSON validation on partial fragments (blocks not starting with { or [)
  • Normalizes "key": ... value ellipsis in console-result blocks
  • Only flags {{var}} if var is a key in docset.yml subs:
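The cross-referencing rule for Check B could be sketched as follows. The DEFINED_SUBS set stands in for keys parsed from docset.yml's subs: section; its contents here are illustrative:

```python
import re

# Keys parsed from docset.yml's subs: section (illustrative values)
DEFINED_SUBS = {"version", "stack-version"}

def vars_needing_subs(block_body: str, has_subs_attr: bool) -> list[str]:
    """Return defined {{var}} names used in a block that lacks subs=true."""
    used = set(re.findall(r'\{\{([\w-]+)\}\}', block_body))
    known = used & DEFINED_SUBS   # unknown names are never flagged
    if known and not has_subs_attr:
        return sorted(known)      # block needs subs=true
    return []
```

Because unknown names are intersected away, log output or template syntax such as {{some_runtime_var}} produces no finding, matching the false-positive handling described above.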

Report format

  • Overview table listing all files checked with per-check issue counts
  • Issue detail section (only files with problems) with full code snippets and ^ pointer at the exact error location for JSON errors

Example invocation

/validate-code-samples ~/elastic-repos/elasticsearch/docs/reference/
/validate-code-samples ~/elastic-repos/elasticsearch/docs/reference/ --output ~/.claude/data/report.md
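Check E's reserved-domain rule could be approximated like this. The allowlist contents below are illustrative stand-ins for the skill's RFC 2606/6761 list and known-legitimate domains:

```python
# RFC 2606/6761 reserved names plus known-legitimate real domains
# (allowlist contents are illustrative, not the skill's actual lists)
RESERVED_TLDS = (".example", ".test", ".invalid", ".localhost")
RESERVED_DOMAINS = {"example.com", "example.net", "example.org"}
KNOWN_REAL = {"elastic.co", "github.com"}

def is_allowed_domain(domain: str) -> bool:
    """True when a domain is reserved or known-legitimate (never flagged)."""
    d = domain.lower()
    return (
        d in RESERVED_DOMAINS
        or d in KNOWN_REAL
        or d.endswith(RESERVED_TLDS)
        or any(d.endswith("." + r) for r in RESERVED_DOMAINS)
    )
```

Anything returning False (for example a made-up mycompany.com) would be flagged as a non-reserved placeholder domain.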

🤖 Generated with Claude Code

@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Eval Results

validate-code-samples

| Eval | Prompt | Pass Rate | Details |
| --- | --- | --- | --- |
| 1 | /validate-code-samples ~/elastic-repos/elasticsearch/docs/reference/elasticsearch/mapping-refe... | N/A | ⏭️ Skipped — requires external resource (bbq.md not present in runner) |
| 2 | /validate-code-samples ~/elastic-repos/elasticsearch/docs/reference/elasticsearch/mapping-refe... | N/A | ⏭️ Skipped — requires external resource (object.md not present in runner) |
| 3 | Check ~/elastic-repos/docs-content/raw-migrated-files/elasticsearch/elasticsearch-reference/su... | 4/4 (100%) | ✅ All passed |
| 4 | /validate-code-samples ~/elastic-repos/elasticsearch/docs/reference/elasticsearch/mapping-refe... | N/A | ⏭️ Skipped — requires external resource (keyword.md not present in runner) |
| 5 | Check ~/docs/clean-example.md for code issues. The file has: ... | 4/4 (100%) | ✅ All passed |

Overall: 8/8 expectations passed across runnable evals (100%); 3 evals skipped due to missing external files.


Eval 3 detail — subs-example.md with inline content:

  • ✅ Flagged block at line 5 ({{version}} present but missing subs=true)
  • ✅ Flagged block at line 11 (subs=true present but no {{...}} variables — unnecessary)
  • ✅ Both issues reported with correct line numbers
  • ✅ No false positives on other blocks

Eval 5 detail — clean-example.md with inline content:

  • ✅ Did not flag the console block (valid JSON: {"query": {"match_all": {}}})
  • ✅ Did not flag the #!/bin/bash shebang line as an inline comment callout candidate
  • ✅ Did not flag missing language identifiers (both blocks have console / bash)
  • ✅ Reported zero issues for the file

Assessment: The skill's logic for Check B (subs=true) and Check D (JSON validation in console blocks) is correct for the testable cases. The three skipped evals (JSON error detection, callout marker stripping, and false-positive avoidance for Painless/bulk/ellipsis) require actual file content from an elastic-repos checkout to verify — consider adding self-contained inline-content variants of those evals to improve CI coverage.

Generated by Skill Eval Testing for issue #26 ·

@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Quality Review

validate-code-samples

Overall: A well-specified, highly actionable validation skill with clear step-by-step structure; a few minor internal consistency issues worth fixing before wider use.

Strengths:

  • Clear, unambiguous trigger in the description field — both what it does and when to use it
  • Step-by-step algorithm (resolve → extract → check → report) gives the agent a reliable execution path
  • Highly specific rules: exact regex patterns, language-inference heuristic table, RFC 2606/6761 domain allowlist — this is the kind of proprietary, task-specific content skills are best suited for
  • Full output format specified with a concrete example, including the ^ pointer for JSON errors
  • Five evals covering both error detection (Check A, B, D) and false-positive avoidance (Painless scripts, callout markers)
  • sources: frontmatter present for freshness tracking; disable-model-invocation: true is appropriate since this is an on-demand audit tool

Issues:

  • "Run all four checks" is incorrect (line 73): The skill defines five checks (A, B, C, D, E) plus a sub-check D2. An agent that reads this literally may stop after four checks. Change "four" to "all checks" or the accurate count.
  • Check D2 is misplaced: It is labeled as a sub-check of Check D but appears after Check E in the document (line 356). The D2 naming implies it belongs with D, but its physical position suggests it's Check F. Either rename it to Check F and leave it where it is, or move it immediately after Check D's main content. As written, an agent reading sequentially will process Check E before returning to D2, while the label implies the opposite ordering.
  • Duplicate step 4 in pre-processing (lines 220–229): The numbered list under "Pre-processing before JSON parsing" has two items labeled 4. — "Replace triple-quoted Painless strings" and "Normalize ... ellipsis". The list jumps 1 → 2 → 3 → 4 → 4 → 5. Renumber to 1–5 (or 1–6 if counting "Detect trailing commas").

Suggestions:

  • Eval prompts reference paths like ~/elastic-repos/elasticsearch/... that won't exist in most environments. Consider adding a note that these are illustrative paths, or providing inline file content (as evals 3 and 5 already do) for at least one eval per check type so they can be run without external repos.
  • The report summary table header (line 405) uses "API schema" for Check D2 but the skill body calls it "Check D2". Aligning these names would make it easier for a reviewer to trace a table column back to its check definition.

Generated by Skill Quality Review for issue #26 ·

@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Quality Review

validate-code-samples

Overall: A well-specified, highly actionable validation skill — all three issues from the previous review have been fixed, but Check F references incorrect MCP tool names that will cause it to fail at runtime.

Strengths:

  • Clear, unambiguous trigger in the description field
  • Step-by-step algorithm (resolve → extract → check → report) gives the agent a reliable execution path
  • Highly specific rules: regex patterns, heuristic tables, domain allowlists, NDJSON handling — exactly the kind of proprietary, task-specific content skills are best suited for
  • Pre-processing steps (now correctly numbered 1–6), Check F naming, and the "Run all checks (A through F)" wording are all fixed from the prior review
  • Full output format with a concrete example including the ^ pointer for JSON errors

Issues:

  • Incorrect MCP tool names in Check F (lines 370, 376, 380): The skill calls search_docs and get_document_by_url, but the Elastic Docs MCP server exposes SemanticSearch and GetDocumentByUrl (see project docs). An agent following these instructions will fail to invoke the tools. Fix:
    • search_docs → SemanticSearch (with the query as the positional argument; productFilter maps to its product parameter)
    • get_document_by_url → GetDocumentByUrl

Suggestions:

  • At 508 lines, the skill is 8 lines over the soft 500-line threshold. The length is justified by the content density, but trimming the introductory prose in the Guidelines section (lines 492–508) or collapsing the "Do NOT flag" list in Check D could bring it under without losing substance.

Generated by Skill Quality Review for PR #26 ·


@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Eval Results

validate-code-samples

| Eval | Prompt | Pass Rate | Details |
| --- | --- | --- | --- |
| 1 | Check ~/docs/bbq-example.md for code sample issues. The file contains: `console PUT bbq_di...` | 5/5 (100%) | ✅ All passed |
| 2 | Check ~/docs/callout-example.md for code sample issues. The file contains: `console PUT /my...` | 4/4 (100%) | ✅ All passed |
| 3 | Check ~/elastic-repos/docs-content/.../subs-example.md for code sample issues. The file contain... | 4/4 (100%) | ✅ All passed |
| 4 | Check ~/docs/painless-example.md for code sample issues. The file contains: `console POST /...` | 1/4 (25%) | ❌ `"_shards": ...` value-position ellipsis not normalized; causes false-positive JSON errors |
| 5 | Check ~/docs/clean-example.md for code issues. The file has: `console GET /my-index/_search...` | 4/4 (100%) | ✅ All passed |
| 6 | Check ~/docs/reindex-example.md for code sample issues. The file contains: `console POST _r...` | 1/5 (20%) | ❌ Check F uses incorrect MCP tool names (search_docs/get_document_by_url → should be SemanticSearch/GetDocumentByUrl); API lookup fails entirely |

Overall: 19/26 expectations passed (73%)


Eval 1 — JSON error with missing comma:

  • ✅ Console block identified correctly
  • JSONDecodeError: Expecting ',' delimiter raised by Python json.loads
  • ^ pointer shown at error column (line 4, col 5)
  • ✅ File path and line number reported in output header
  • ✅ Source file not modified (skill is read-only by design)
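The pointer output verified above can be reconstructed from json.JSONDecodeError's lineno and colno attributes. A minimal sketch, using an illustrative request body with the missing comma:

```python
import json

body = """{
  "settings": {
    "number_of_shards": 1
    "replicas": 0
  }
}"""

try:
    json.loads(body)
except json.JSONDecodeError as err:
    bad_line = body.splitlines()[err.lineno - 1]
    pointer = " " * (err.colno - 1) + "^"   # colno is 1-based
    report = f"{bad_line}\n{pointer}\n{err.msg} (line {err.lineno}, col {err.colno})"
```

For this body, json.loads raises Expecting ',' delimiter at line 4 column 5, and the caret lands under the start of "replicas".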

Eval 2 — Callout marker stripping before JSON parse:

  • ✅ Console block with // <1> / // <2> markers identified
  • ✅ Markers stripped via re.sub(r'\s*(?://|#)?\s*<\d+>\s*$', '', ...) before parse
  • ✅ No false-positive JSON error after stripping (result is valid JSON {"settings": {"number_of_shards": 1}})
  • ✅ Zero JSON errors reported
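The stripping behavior confirmed above can be reproduced end to end with the regex quoted in the eval log (the sample block below is illustrative):

```python
import json
import re

block = """{
  "settings": { // <1>
    "number_of_shards": 1 # <2>
  }
}"""

# Remove trailing callout markers (// <n> or # <n>) before JSON parsing
cleaned = re.sub(r'\s*(?://|#)?\s*<\d+>\s*$', '', block, flags=re.MULTILINE)
parsed = json.loads(cleaned)  # parses cleanly once markers are gone
```

Without the stripping pass, json.loads would reject both annotated lines as invalid JSON.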

Eval 3 — subs=true check (both missing and unnecessary):

  • bash block with {{version}} but no subs=true → flagged
  • bash subs=true block with no {{...}} → flagged as unnecessary
  • ✅ Both issues reported with line numbers
  • ✅ No false positives

Eval 4 — Painless triple-quote + console-result ellipsis ❌:

  • ✅ Painless """...""" → "__PAINLESS__" replacement works; console block parses cleanly
  • "_shards": ... (value-position ellipsis) is not handled by any normalization pattern in the skill's preprocessing code. The patterns cover [...], {...}, trailing , ... before }]/ and standalone ... lines — but not "key": .... Python json.loads raises JSONDecodeError: Expecting value: line 5 column 14 — a false positive.
  • ❌ False-positive JSON error generated for the console-result block
  • ❌ Report shows 1 JSON error, not zero

Eval 5 — Clean file, no issues:

  • ✅ Console block with valid JSON {"query": {"match_all": {}}} not flagged
  • #!/bin/bash shebang exempted from inline-comment callout check
  • ✅ Both blocks have language identifiers; no Check A issues
  • ✅ Zero issues reported

Eval 6 — API deprecation check via Elastic Docs MCP ❌:

  • ❌ Check F instructs the agent to call search_docs (step F-2) and get_document_by_url — but the Elastic Docs MCP server exposes SemanticSearch and GetDocumentByUrl. An agent following the skill as written will receive a tool-not-found error and fall through to "API docs not found". No endpoint verification, no deprecation flagging.
  • ❌ Endpoint existence not confirmed
  • size field not flagged as deprecated (also confirmed by fetching the current reindex API page directly — only "Sort in reindex is deprecated" appears; size is not explicitly marked deprecated in the current docs)
  • ❌ No API docs URL included in issue detail (no issue generated)
  • source and dest not flagged (skill correctly skips non-deprecated fields)

Assessment: Core validation logic (Checks A–D) is correct and well-tested. Two issues require fixing:

  1. Ellipsis normalization gap (Eval 4): The preprocessing code does not handle the "key": ... value-position ellipsis pattern common in Elasticsearch API response examples. Add a normalization step: re.sub(r':\s*\.\.\.', ': null', body) before JSON parsing in console-result blocks.

  2. Incorrect MCP tool names in Check F (Eval 6): search_docs → SemanticSearch, get_document_by_url → GetDocumentByUrl. This breaks all API validation at runtime. (Also noted in the concurrent quality review.)
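The normalization suggested in point 1 behaves like this on the failing value-position shape (the body below is illustrative):

```python
import json
import re

body = """{
  "took": 5,
  "_shards": ...
}"""

# Replace value-position ellipsis ("key": ...) with null before parsing
normalized = re.sub(r':\s*\.\.\.', ': null', body)
parsed = json.loads(normalized)  # no false-positive JSONDecodeError
```

Without the substitution, json.loads raises Expecting value at the ellipsis, which is exactly the Eval 4 false positive.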

Generated by Skill Eval Testing for issue #26 ·

@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Quality Review

validate-code-samples

Overall: All issues from the previous review have been resolved — the skill is now well-specified, actionable, and correct.

Strengths:

  • Clear, unambiguous trigger in the description field — both what it does and when to use it
  • Step-by-step algorithm (resolve → extract → check → report) gives the agent a reliable execution path
  • Highly specific rules: regex patterns, heuristic inference tables, domain allowlists, NDJSON handling — exactly the kind of proprietary, task-specific content skills are best suited for
  • MCP tool names now correct: SemanticSearch and GetDocumentByUrl (fixed from prior run)
  • Value-position ellipsis normalization added ("key": ... → "key": null) covering the false-positive case from Eval 4 (fixed from prior run)
  • "Run all checks (A through F)" wording is accurate (fixed from prior run)
  • Pre-processing steps correctly numbered 1–6 (fixed from prior run)
  • Exactly 500 lines — at the soft threshold but density is justified by the number of check types and the specificity of each rule

No blocking issues found.

Suggestions:

  • The Check F "unrecognized field" rule (Step F-4) is quite aggressive — it flags any top-level key not appearing in the docs body text. The exclusion list covers common Search DSL fields but doc authors may hit false positives on valid, less-common fields. Consider adding a note that F-4 findings should be treated as advisory rather than definitive errors, to reduce toil when the agent flags valid fields the docs page doesn't happen to mention explicitly.
  • Eval prompts 1 and 2 now use inline file content (good), but the Check F eval (Eval 6) still references a path (~/docs/reindex-example.md) alongside its inline content. This works, but it would be worth confirming that the inline content alone is sufficient for CI runners without external repos.

Generated by Skill Quality Review for issue #26 ·

@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Quality Review

validate-code-samples

Overall: Well-specified and highly actionable — all previously identified issues remain resolved; no new blocking issues found in this run.

Strengths:

  • Clear, unambiguous trigger in the description field — both what it does and when to use it
  • Step-by-step algorithm (resolve → extract → check → report) gives the agent a reliable execution path
  • Highly specific rules: exact regex patterns, language-inference heuristic tables, RFC 2606/6761 domain allowlists, NDJSON _bulk handling — exactly the kind of proprietary, task-specific content skills are best suited for
  • MCP tool names correct: SemanticSearch and GetDocumentByUrl
  • Value-position ellipsis normalization ("key": ... → "key": null) present to prevent false-positive JSON errors
  • "Run all checks (A through F)" wording is accurate
  • Pre-processing steps correctly numbered 1–6
  • Check F Step F-4 unrecognized-field findings are already scoped as advisory, reducing toil from false positives
  • Exactly 500 lines — at the soft threshold but density is justified

No blocking issues found.

Suggestions:

  • The F-4 advisory note (line 393) reads "treat these as hints to verify rather than definitive errors", but the output format example (line 395) still uses an **Advisory:** label rather than something more clearly non-blocking like **Note:**. Aligning the label with the intent would help authors triage output correctly.
  • Eval 6 references a file path (~/docs/reindex-example.md) with inline content provided in the prompt — this works in CI, but making the pattern explicit (e.g., a comment noting "inline content overrides the path") would help future eval authors follow the same pattern.

Generated by Skill Quality Review for PR #26 ·


@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Eval Results

validate-code-samples

| Eval | Prompt | Pass Rate | Details |
| --- | --- | --- | --- |
| 1 | Check ~/docs/bbq-example.md for code sample issues. The file contains: `console PUT bbq_di...` | 5/5 (100%) | ✅ All passed |
| 2 | Check ~/docs/callout-example.md for code sample issues. The file contains: `console PUT /my...` | 4/4 (100%) | ✅ All passed |
| 3 | Check ~/elastic-repos/docs-content/.../subs-example.md for code sample issues. The file contain... | 4/4 (100%) | ✅ All passed |
| 4 | Check ~/docs/painless-example.md for code sample issues. The file contains: `console POST /...` | 4/4 (100%) | ✅ All passed — ellipsis normalization fix verified |
| 5 | Check ~/docs/clean-example.md for code issues. The file has: `console GET /my-index/_search...` | 4/4 (100%) | ✅ All passed |
| 6 | Check ~/docs/reindex-example.md for code sample issues. The file contains: `console POST _r...` | 5/6 (83%) | ⚠️ Expectation 3 (size deprecated) requires live MCP call — cannot verify without network access |

Overall: 25/26 expectations passed (96%); 1 expectation requires live MCP server call to verify at runtime.


Eval 1 — JSON error with missing comma:

  • ✅ Console block identified correctly
  • JSONDecodeError: Expecting ',' delimiter raised by json.loads
  • ^ pointer shown at error column
  • ✅ File path and line number reported in output header
  • ✅ Source file not modified (skill is read-only by design)

Eval 2 — Callout marker stripping before JSON parse:

  • ✅ Console block with // <1> / // <2> markers identified
  • ✅ Markers stripped via re.sub(r'\s*(?://|#)?\s*<\d+>\s*$', '', ...) before parse
  • ✅ No false-positive JSON error after stripping (result is valid JSON)
  • ✅ Zero JSON errors reported

Eval 3 — subs=true check (both missing and unnecessary):

  • bash block with {{version}} but no subs=true → flagged
  • bash subs=true block with no {{...}} → flagged as unnecessary
  • ✅ Both issues reported with line numbers
  • ✅ No false positives on clean blocks

Eval 4 — Painless triple-quote + console-result ellipsis (previously failing, now fixed):

  • ✅ Painless """...""" → "__PAINLESS__" replacement works; console block parses cleanly
  • "_shards": ... value-position ellipsis now handled by re.sub(r':\s*\.\.\.', ': null', body) (line 264) — normalized to "_shards": null — valid JSON, no false positive
  • ✅ No false-positive JSON errors for either block
  • ✅ Report shows zero JSON errors

Eval 5 — Clean file, no issues:

  • ✅ Console block with valid JSON {"query": {"match_all": {}}} not flagged
  • #!/bin/bash shebang exempted from inline-comment callout check (Check C exemption)
  • ✅ Both blocks have language identifiers; no Check A issues
  • ✅ Zero issues reported

Eval 6 — API deprecation check via Elastic Docs MCP:

  • SemanticSearch and GetDocumentByUrl tool names are now correct (fixed from prior run)
  • ✅ Skill instructs endpoint existence verification (Check F-3)
  • ⚠️ Expectation 3 ("Flags the size field as deprecated in favor of max_docs") — depends on the live MCP server returning current documentation that explicitly mentions size as deprecated. Cannot verify without network access. Previous run noted this field may not be explicitly marked deprecated in current docs.
  • ✅ API docs URL included in issue detail per Check F-5 (when an issue is generated)
  • source and dest not flagged (only fields explicitly described as deprecated in docs are flagged)
  • ✅ Inline file content in prompt treated as file to validate — no external access required

Assessment: The skill is in good shape. Both regressions caught in the previous eval run have been fixed:

  1. Ellipsis normalization (Eval 4): re.sub(r':\s*\.\.\.', ': null', body) now correctly handles "key": ... value-position ellipsis in console-result blocks, eliminating false-positive JSON errors.
  2. MCP tool names (Eval 6): SemanticSearch and GetDocumentByUrl are now correct, enabling Check F (API validation) to function at runtime.

The remaining open question (Eval 6, expectation 3) is a data freshness issue — whether the live Elastic Docs MCP currently describes size as deprecated in the reindex API. This is out of scope for static skill review and can only be verified by running the skill against a live environment.

Generated by Skill Eval Testing for issue #26 ·

@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Quality Review

validate-code-samples

Overall: Well-specified and highly actionable — previous suggestions have been addressed; no blocking issues remain.

Strengths:

  • Clear, unambiguous trigger in the description field — describes both what it does and when to use it
  • Step-by-step algorithm (resolve → extract → check → report) gives the agent a reliable, deterministic execution path
  • Highly specific rules: exact regex patterns, language-inference heuristic tables, RFC 2606/6761 domain allowlists, NDJSON _bulk handling — exactly the kind of proprietary, task-specific content skills are best suited for
  • MCP tool names correct: SemanticSearch and GetDocumentByUrl
  • Pre-processing pipeline in Check D is thorough and correctly ordered (callout markers → template vars → Painless → ellipsis normalization)
  • Unrecognized-field findings in F-4 are now correctly labeled **Note:** (advisory), aligning the label with intent — this addresses the suggestion from the previous review
  • 500 lines — at the soft threshold but density is fully justified by the specificity of the rules

No blocking issues found.

Suggestions:

  • The allowed-tools entry CallMcpTool may not be the correct Claude Code tool name for accessing MCP server tools. MCP tools from configured servers are typically available as direct names (e.g., mcp__elastic-docs__SemanticSearch). If Check F silently fails in practice, verify the MCP tool permission naming against your Claude Code environment configuration.
  • Eval 6 references a file path (~/docs/reindex-example.md) with inline content provided in the prompt — a brief comment in the eval noting "inline content overrides the path" would help future eval authors follow the same pattern (carry-over suggestion from previous run).

Generated by Skill Quality Review for issue #26 ·

@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Eval Results

validate-code-samples

| Eval | Prompt | Pass Rate | Details |
| --- | --- | --- | --- |
| 1 | Check ~/docs/bbq-example.md for code sample issues. The file contains: [console block with... | 5/5 (100%) | All passed |
| 2 | Check ~/docs/callout-example.md for code sample issues. The file contains: [console block... | 4/4 (100%) | All passed |
| 3 | Check ~/elastic-repos/docs-content/.../subs-example.md for code sample issues. The file... | 4/4 (100%) | All passed |
| 4 | Check ~/docs/painless-example.md for code sample issues. The file contains: [console block... | 4/4 (100%) | All passed |
| 5 | Check ~/docs/clean-example.md for code issues. The file has: [console block GET /my-index... | 4/4 (100%) | All passed |
| 6 | Check ~/docs/reindex-example.md for code sample issues. The file contains: [console block... | 4/6 (67%) | ❌ Expectation 3: Does not flag size as deprecated — skill explicitly exempts size from Check F (listed in "Standard Search DSL fields — Do NOT flag"). Additionally, current Elastic API docs show no top-level size body field for reindex (only source.size nested, which is current/valid). ❌ Expectation 4: No API docs URL in issue detail since size is not flagged. |

Overall: 25/29 expectations passed (86%)


Assessment: The skill performs well across checks A–E (missing language identifiers, subs=true, callout stripping, Painless triple-quote handling, ellipsis normalization, shebang exemption). The only regression is in eval 6 (Check F — API Validation): the size field is in the skill's "Do NOT flag" exemption list for Check F, which prevents it from being flagged as deprecated. The eval expects size to be flagged as deprecated in favor of max_docs, but the skill's own rules (and the current Elastic API reference, where top-level size doesn't appear in the reindex request body docs) prevent this. Consider either removing size from the Check F exemption list or revising eval 6's expectation to reflect the actual skill behavior.

Generated by Skill Eval Testing for issue #26 ·

georgewallace and others added 2 commits March 6, 2026 12:17
Adds a new review skill that checks every code block in Elastic
documentation markdown files against docs-builder style rules.

Checks performed:
- A: Missing language identifier (with suggested fix)
- B: Variable substitution missing subs=true (and unnecessary subs=true)
- C: Inline comments that should be callouts
- D: Invalid JSON in console/console-result blocks, ellipsis in console blocks
- E: Non-reserved domain names in examples
- F: API Validation via Elastic Docs MCP — confirms endpoints exist and
     flags deprecated request body fields

Example invocation:
  /validate-code-samples docs/reference/elasticsearch/rest-apis/ --output report.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@georgewallace force-pushed the add/validate-code-samples branch from 29d36df to 9374f46 on March 6, 2026 at 19:20
@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Quality Review

validate-code-samples

Overall: Well-specified, highly actionable, and largely solid — one inconsistency between the allowed-tools frontmatter and the tool names used in the skill body deserves attention.

Strengths:

  • Clear, unambiguous trigger: the description field explains both what the skill does and when to invoke it
  • Step-by-step algorithm (resolve → extract → check → report) gives the agent a reliable, deterministic execution path
  • Highly specific rules: regex patterns, language-inference heuristics, RFC-based domain allowlists, NDJSON _bulk handling — exactly the kind of proprietary, task-specific content skills are best suited for
  • Pre-processing pipeline in Check D is thorough and correctly ordered
  • Unrecognized-field findings in F-4 are correctly labeled **Note:** (advisory)
  • 500 lines is at the soft threshold but the density is fully justified

Issues:

  • MCP tool name mismatch (potential blocker): allowed-tools now lists mcp__elastic-docs__search_docs and mcp__elastic-docs__get_document_by_url, but the skill body (Step F-2) instructs the agent to call SemanticSearch and GetDocumentByUrl. These names must match the actual registered MCP tool names, and they need to be consistent between the allowed-tools permission list and the instructions. If the MCP server registers tools as mcp__elastic-docs__SemanticSearch and mcp__elastic-docs__GetDocumentByUrl, then both the instructions and the allowed-tools entry need to use those exact names. Mismatched names mean either the permissions won't apply (tool requires approval on every call) or the agent will call a tool that doesn't exist. Verify the actual MCP tool names against your Claude Code environment and update accordingly.

Suggestions:

  • Eval 6 references a file path (~/docs/reindex-example.md) with inline content provided in the prompt — a brief comment noting "inline content overrides the path" would help future eval authors follow the same pattern (carry-over from previous review).

Generated by Skill Quality Review for issue #26 ·

@github-actions
Contributor

github-actions Bot commented Mar 6, 2026

Skill Eval Results

validate-code-samples

| Eval | Prompt | Pass Rate | Details |
| --- | --- | --- | --- |
| 1 | Check ~/docs/bbq-example.md for code sample issues. The file contains a console block with miss... | 5/5 (100%) | All passed |
| 2 | Check ~/docs/callout-example.md for code sample issues. The file contains a console block with ... | 4/4 (100%) | All passed |
| 3 | Check ~/elastic-repos/docs-content/.../subs-example.md for code sample issues. File has {{versi... | 4/4 (100%) | All passed |
| 4 | Check ~/docs/painless-example.md for code sample issues. File has Painless triple-quoted string... | 4/4 (100%) | All passed |
| 5 | Check ~/docs/clean-example.md for code issues. File has valid console block and bash with sheban... | 4/4 (100%) | All passed |
| 6 | Check ~/docs/reindex-example.md for code sample issues. File contains POST _reindex console blo... | N/A | ⏭️ Skipped — requires MCP tool access (elastic-docs MCP server unavailable in CI runner) |

Overall: 21/21 expectations passed across evals 1–5 (100%). Eval 6 skipped due to MCP dependency.


Assessment: The skill's logic is sound and well-tested. Key behaviors confirmed working:

  • Check D (JSON validation): Correctly flags missing commas with ^ pointer output (eval 1) ✅
  • Callout marker stripping: // <1> markers are stripped before JSON parsing — no false positives (eval 2) ✅
  • Check B (subs=true): Bidirectional detection — flags missing subs=true when {{var}} present, and flags unnecessary subs=true when no variables present (eval 3) ✅
  • Painless/ellipsis normalization: Triple-quoted Painless strings and ... ellipsis in console-result blocks are pre-processed correctly without false positives (eval 4) ✅
  • Shebang exemption (Check C): #!/bin/bash lines are correctly exempted from inline-comment-as-callout detection (eval 5) ✅
  • Eval 6 (Check F, API Validation): Could not run — requires elastic-docs MCP server tools (SemanticSearch, GetDocumentByUrl) not available in this CI context.

Generated by Skill Eval Testing for issue #26 ·

@theletterf
Member

@georgewallace Getting there! Could you address the last review?

@github-actions
Contributor

Skill Quality Review

validate-code-samples

Overall: Well-specified and highly actionable — the MCP tool name mismatch in Check F identified in previous reviews is still present and remains the one blocking issue.

Strengths:

  • Clear, unambiguous trigger: the description field explains both what the skill does and when to invoke it
  • Step-by-step algorithm (resolve → extract → check → report) gives the agent a reliable, deterministic execution path
  • Highly specific rules: regex patterns, language-inference heuristics, RFC-based domain allowlists, NDJSON _bulk handling — exactly the kind of proprietary, task-specific content skills are best suited for
  • Pre-processing pipeline in Check D is thorough and correctly ordered
  • Unrecognized-field findings in F-4 are correctly labeled **Note:** (advisory)
  • 500 lines is at the soft threshold but the density is fully justified

Issues:

  • MCP tool name mismatch (carry-over, potential blocker): allowed-tools lists mcp__elastic-docs__search_docs and mcp__elastic-docs__get_document_by_url, but Step F-2 instructs the agent to call SemanticSearch and GetDocumentByUrl. These names must match the actual registered MCP tool identifiers. Mismatched names mean either the permissions won't apply (tool requires approval on every call) or the agent will attempt to call a tool that doesn't exist. Update the skill body to use the same names as the allowed-tools entries (e.g., mcp__elastic-docs__search_docs and mcp__elastic-docs__get_document_by_url), or update allowed-tools to reflect the actual tool names — whichever matches what your Claude Code environment registers.

Suggestions:

  • Eval 6 references a file path (~/docs/reindex-example.md) with inline content provided in the prompt — a brief comment noting "inline content overrides the path" would help future eval authors follow the same pattern (carry-over from previous review).

Generated by Skill Quality Review for issue #26 ·

@github-actions
Contributor

Skill Eval Results

validate-code-samples

| Eval | Prompt | Pass Rate | Details |
| --- | --- | --- | --- |
| 1 | Check ~/docs/bbq-example.md for code sample issues. The file contains: console PUT bbq_d... | 5/5 (100%) | All passed |
| 2 | Check ~/docs/callout-example.md for code sample issues. The file contains: console PUT /m... | 4/4 (100%) | All passed |
| 3 | Check ~/elastic-repos/docs-content/.../subs-example.md for code sample issues. The file c... | 4/4 (100%) | All passed |
| 4 | Check ~/docs/painless-example.md for code sample issues. The file contains a Painless trip... | 4/4 (100%) | All passed |
| 5 | Check ~/docs/clean-example.md for code issues. The file has valid console JSON and a bash ... | 4/4 (100%) | All passed |
| 6 | Check ~/docs/reindex-example.md for code sample issues. POST _reindex with source/dest/co... | 3/6 (50%) | ⏭️ 3 expectations skipped — requires elastic-docs MCP server (not available in CI runner). Non-MCP expectations passed: JSON is valid, inline content used correctly, source/dest/conflicts would not be flagged as deprecated. |

Overall: 20/23 expectations passed (87%) — 3 skipped due to MCP dependency


The skill is well-structured and handles the core validation checks correctly:

  • Check A (missing language IDs), Check B (subs=true), Check C (callouts), and Check D (JSON validation) all behave as expected based on code tracing and Python simulation.
  • Pre-processing logic for callout markers, Painless triple-quoted strings, and ellipsis normalization is correct and prevents false positives.
  • Shebang line exemption in Check C works correctly.
  • Check F (API validation via MCP) could not be fully exercised in CI — 3 of 6 expectations in Eval 6 require the elastic-docs MCP server. The skill's instructions for Check F are clear and well-specified; the MCP-dependent behavior is expected to work when the server is available.

Generated by Skill Eval Testing for issue #26 ·

@github-actions
Contributor

Skill Quality Review

validate-code-samples

Overall: A thorough, well-structured validation skill with highly specific checks that encode Elastic-specific documentation conventions not covered by model training data.

Strengths:

  • Frontmatter is complete and correct: name, version, description, argument-hint, sources, and allowed-tools are all present and valid.
  • Instructions use strong imperative language throughout ("Flag", "Parse", "Skip", "Extract", "Run"), making them clearly actionable.
  • Each check (A–H) is scoped to a specific failure mode with concrete rules, exemptions, and suggestions — very little ambiguity.
  • The skill encodes domain-specific knowledge (docs-builder substitution rules, ES|QL command set, Painless sandbox restrictions) that is genuinely non-redundant with model training data.
  • The report format template is detailed and reproducible, giving the agent a clear output target.
  • Eval coverage (6 cases) is solid and covers the main happy paths and false-positive scenarios.

Issues:

  • Check B — docset.yml walk-up is underspecified: The instruction says "walk up from the target file to find docset.yml" but doesn't specify a maximum depth or what to do if none is found before the filesystem root. Consider adding: "stop at the git root (presence of .git/) or after 6 levels, whichever comes first."
  • Check H — perl dependency: Check H's Painless extraction step says "using a multiline perl match," but perl is not listed in allowed-tools. Since Bash is listed, perl should be accessible via Bash — but the dependency isn't explicit. A brief note like "(via Bash)" would avoid confusion.
  • Report template — escaped backticks: The Issue Details section uses \``` to render a code fence inside a code block. This non-standard pattern may be rendered literally rather than as a code fence by some markdown renderers, which could cause the agent to emit the backslashes verbatim in output.

Suggestions:

  • Check F calls mcp__elastic-docs__search_docs for every unique METHOD /path in a file. For files with many console blocks, this could be slow. Consider noting a deduplication strategy (e.g., "skip endpoints already validated in this run").
  • The description field is excellent but quite long (58 words). It would still meet the "clear trigger" bar if trimmed — though this is not a blocker.
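The deduplication suggested for Check F might look like this sketch (function name and HTTP method set are ours; the skill does not currently specify this behavior):

```python
def unique_endpoints(blocks: list[str]) -> list[str]:
    """Collect each unique METHOD /path request line across console blocks,
    in first-seen order, so an endpoint is searched via MCP at most once."""
    seen: set[str] = set()
    ordered: list[str] = []
    for block in blocks:
        first = block.strip().splitlines()[0]
        method = first.split()[0]
        if method in {"GET", "POST", "PUT", "DELETE", "HEAD", "PATCH"}:
            if first not in seen:
                seen.add(first)
                ordered.append(first)
    return ordered
```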

Generated by Skill Quality Review for issue #26 ·

@github-actions
Copy link
Copy Markdown
Contributor

Skill Eval Results

validate-code-samples

| Eval | Prompt | Pass Rate | Details |
| --- | --- | --- | --- |
| 1 | Check ~/docs/bbq-example.md for code sample issues. The file contains a console block... | 5/5 (100%) | All passed |
| 2 | Check ~/docs/callout-example.md for code sample issues. The file contains // <1> callout markers... | 4/4 (100%) | All passed |
| 3 | Check ~/elastic-repos/.../subs-example.md for code sample issues. Has {{version}} and subs=true... | 4/4 (100%) | All passed |
| 4 | Check ~/docs/painless-example.md. Contains Painless triple-quoted string and console-result ellipsis... | 4/4 (100%) | All passed |
| 5 | Check ~/docs/clean-example.md. Has valid console block and bash block with shebang... | 4/4 (100%) | All passed |
| 6 | Check ~/docs/reindex-example.md. POST _reindex with source, dest, conflicts fields... | N/A | ⏭️ Skipped — requires mcp__elastic-docs__search_docs / mcp__elastic-docs__get_document_by_url (not available in eval runner) |

Overall: 21/21 expectations passed (100%) across graded evals. Eval 6 skipped.

The skill instructions are well-structured and correctly handle all tested edge cases: JSON pre-processing (callout marker stripping, Painless triple-quote replacement, console-result ellipsis normalization), substitution variable detection (both missing and unnecessary subs=true), and shebang line exemption. Eval 6 (API validation via Check F) is a reasonable test but requires the elastic-docs MCP server to be wired into the eval runner — consider documenting this dependency or providing a mock for CI.

Generated by Skill Eval Testing for issue #26 ·

@github-actions
Contributor

Skill Quality Review

docs-validate-code-samples

Overall: A thorough, well-structured validation skill with clear triggers, highly actionable step-by-step instructions, and strong coverage of Elastic-specific patterns — minimal changes needed.

Strengths:

  • Excellent use of imperative voice throughout ("Flag", "Walk up from", "Run these checks", "Strip", "Detect")
  • Highly specific rules with clear exemptions, reducing false-positive risk (e.g., shebang exclusions in Check C, allowed domain list in Check E)
  • Well-scoped to Elastic-specific content: substitution variables, console/console-result semantics, ES|QL syntax, and Painless sandbox restrictions — information the model doesn't already know
  • Output format is precisely defined with table templates and pointer syntax, leaving little ambiguity
  • Eval suite (evals.json) covers the most failure-prone edge cases: callout markers, triple-quoted Painless strings, ellipsis normalization, and false-positive shebang lines

Issues:

  • Version starts at 1.2.0: New skills conventionally start at 1.0.0. Starting at 1.2.0 without prior release history in this repo is confusing. Consider bumping to 1.0.0 unless there's external versioning context.

Suggestions:

  • Check F complexity: API validation via MCP search adds latency and network dependency for every unique endpoint. If validation runs on large files, this could stall the agent. Consider making Check F opt-in (e.g., only run when --validate-api flag is passed) or splitting it into a separate docs-validate-api-fields skill.
  • Step 1 file discovery uses raw find / bash -O globstar shell commands despite Glob being in allowed-tools. Using the Glob tool instead would be more idiomatic and consistent with the skill's own tool declarations.
  • Check F "advisory" unrecognized fields: The spec says to "note it for verification" rather than flag it — but it's unclear how this maps to the report format. Clarifying whether these appear in the summary table or a separate advisory section would remove agent ambiguity.

Generated by Skill Quality Review for issue #26 ·

@github-actions
Contributor

Skill Eval Results

docs-validate-code-samples

| Eval | Prompt | Pass Rate | Details |
| --- | --- | --- | --- |
| 1 | Check ~/docs/bbq-example.md for code sample issues. The file contains a console block wi... | 5/5 (100%) | ✅ All passed — JSON error detected, ^ pointer shown, file+line reported, no file modifications |
| 2 | Check ~/docs/callout-example.md for code sample issues. The file contains a console bloc... | 4/4 (100%) | ✅ All passed — // <1> callout markers correctly stripped before JSON parsing; no false positives |
| 3 | Check .../subs-example.md for code sample issues. File has {{version}} without subs=true... | 4/4 (100%) | ✅ All passed — flags missing subs=true and unnecessary subs=true (both directions covered) |
| 4 | Check ~/docs/painless-example.md for code sample issues. console block with triple-quote... | 4/4 (100%) | ✅ All passed — Painless """...""" replaced before JSON parse; ... ellipsis normalized in console-result |
| 5 | Check ~/docs/clean-example.md for code issues. Valid console block + bash block with she... | 4/4 (100%) | ✅ All passed — shebang exemption applied, valid JSON not flagged, no false positives |
| 6 | Check ~/docs/reindex-example.md for code sample issues. POST _reindex with source/dest/c... | ⚠️ Skipped | ⏭️ Check F requires mcp__elastic-docs__search_docs / mcp__elastic-docs__get_document_by_url — MCP server not available in this runner. Expectation 6 (treat inline content as file body) assessed: ✅ PASS |

Overall: 21/21 non-MCP expectations passed (100%) · 5 expectations skipped (require Elastic Docs MCP server)


The skill instructions are thorough and well-structured. All testable checks (A–E, G, H) have clear, unambiguous rules. The JSON preprocessing pipeline for Check D is particularly well-specified: callout marker stripping, Painless triple-quote replacement, and console-result ellipsis normalization are all explicitly ordered, reducing false positive risk significantly. Check F (API validation) evals could be validated in environments where the Elastic Docs MCP server is reachable (https://www.elastic.co/docs/_mcp/).

Generated by Skill Eval Testing for issue #26 ·

Updated version from 1.2.0 to 1.0.0 in SKILL.md.
@github-actions
Contributor

Skill Quality Review

docs-validate-code-samples

Overall: A thorough, well-structured validation skill with clear triggers, highly actionable step-by-step instructions, and strong coverage of Elastic-specific patterns — previous version issue resolved, ready to merge.

Strengths:

  • Excellent use of imperative voice throughout ("Flag", "Walk up from", "Run these checks", "Strip", "Detect")
  • Highly specific rules with clear exemptions, reducing false-positive risk (e.g., shebang exclusions in Check C, allowed domain list in Check E)
  • Well-scoped to Elastic-specific content: substitution variables, console/console-result semantics, ES|QL syntax, and Painless sandbox restrictions — information the model doesn't already know
  • Output format is precisely defined with table templates and pointer syntax, leaving little ambiguity
  • Eval suite covers the most failure-prone edge cases: callout markers, triple-quoted Painless strings, ellipsis normalization, and false-positive shebang lines
  • Version correctly starts at 1.0.0 ✓ (previous issue resolved)

Suggestions (non-blocking):

  • Check F complexity: API validation via MCP search adds latency and network dependency for every unique endpoint. On large files this could stall the agent. Consider making Check F opt-in (e.g., --validate-api flag) or splitting into a separate docs-validate-api-fields skill.
  • Step 1 file discovery uses raw find / bash -O globstar shell commands despite Glob being in allowed-tools. Using the Glob tool would be more idiomatic and consistent with the skill's own tool declarations.
  • Check F "advisory" unrecognized fields: The spec says to "note it for verification" — it's unclear whether these appear in the summary table or a separate advisory section. Clarifying would reduce agent ambiguity.

Generated by Skill Quality Review for issue #26 ·

@github-actions
Contributor

Skill Eval Results

docs-validate-code-samples

| Eval | Prompt | Pass Rate | Details |
| --- | --- | --- | --- |
| 1 | Check ~/docs/bbq-example.md for code sample issues. The file contains a console block... | 5/5 (100%) | ✅ All passed |
| 2 | Check ~/docs/callout-example.md for code sample issues. The file contains a console block... | 4/4 (100%) | ✅ All passed |
| 3 | Check ~/elastic-repos/docs-content/.../subs-example.md for code sample issues... | 4/4 (100%) | ✅ All passed |
| 4 | Check ~/docs/painless-example.md for code sample issues. The file contains a console block... | 4/4 (100%) | ✅ All passed |
| 5 | Check ~/docs/clean-example.md for code issues. The file has a console block and bash block... | 4/4 (100%) | ✅ All passed |
| 6 | Check ~/docs/reindex-example.md for code sample issues. The file contains a POST _reindex... | N/A | ⏭️ Skipped — requires mcp__elastic-docs__search_docs (Check F) not available in eval runner |

Overall: 21/21 expectations passed across runnable evals (100%)

Eval details

Eval 1 — JSON error detection (Check D):
Running the console block body through jq correctly surfaces Expected separator between values at line 4, column 10, confirming the missing comma between "type": "bbq_disk" and "bits": 2. The skill instructs the model to include a ^ pointer at the offending line and report the file path and opening fence line number. All 5 expectations pass.

Eval 2 — Callout marker stripping (Check D pre-processing):
The pre-processing rule strips // (N) markers from line ends before parsing. After stripping { // <1>{ and "settings": { // <2>"settings": {, jq validates the result as well-formed JSON with no errors. All 4 expectations pass.
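A minimal sketch of that stripping step (the regex is ours; callout markers always sit at end of line, so a multiline anchored pattern suffices, though a string literal that happens to end in `// <1>` would also be stripped):

```python
import json
import re

# Strip trailing // <N> callout markers before JSON parsing.
CALLOUT = re.compile(r"\s*//\s*<\d+>\s*$", re.MULTILINE)


def strip_callouts(body: str) -> str:
    return CALLOUT.sub("", body)
```

After stripping, a block like `{ // <1> ... "settings": { // <2> ...` parses as ordinary JSON.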

Eval 3 — Substitution variable detection (Check B):
With no docset.yml in the path, any {{word}} pattern is treated as a potential substitution variable. Block 1 ({{version}} without subs=true) is flagged for a missing attribute; Block 2 (subs=true with no {{...}} patterns) is flagged for an unnecessary attribute. Both are reported with line numbers. All 4 expectations pass.

Eval 4 — Painless triple-quoted string & console-result ellipsis (Check D pre-processing):
The perl multiline substitution replaces """ctx._source.counter += params.count""" with a placeholder before jq runs, yielding valid JSON. The console-result normalisation converts "_shards": ... to "_shards": null, also yielding valid JSON. Zero false-positive JSON errors. All 4 expectations pass.
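Both pre-processing steps can be sketched in Python (names and placeholder string are ours; the skill uses perl for the multiline substitution). The triple-quote replacement must run before parsing, and the ellipsis normalization applies only to console-result blocks:

```python
import json
import re

# Painless script bodies delimited by """...""" are not valid JSON strings.
TRIPLE = re.compile(r'"""(?:.|\n)*?"""')
# "key": ... ellipsis values appear in abridged console-result blocks.
ELLIPSIS = re.compile(r'(:\s*)\.\.\.')


def preprocess(body: str, is_console_result: bool = False) -> str:
    body = TRIPLE.sub('"PAINLESS_SCRIPT"', body)
    if is_console_result:
        body = ELLIPSIS.sub(r'\1null', body)
    return body
```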

Eval 5 — Clean file, no issues (Checks A, C, D):
GET /my-index/_search body is valid JSON. The #!/bin/bash shebang is a standalone line (no code before the #), explicitly exempted by Check C. Both blocks have language identifiers. Result: zero issues. All 4 expectations pass.
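The exemption logic for `#`-style comments can be sketched as follows (a sketch of Check C's rule for bash blocks only; the `//` case and the skill's exact regex are not shown). Lines that begin with `#` — shebangs and standalone comments — are skipped; only code followed by an inline comment is flagged:

```python
import re

# Code token(s), whitespace, then a # comment on the same line.
INLINE_COMMENT = re.compile(r"\S+\s+#\s")


def flag_inline_comments(block: str) -> list[int]:
    """Return 1-based line numbers of inline # comments that should be
    callout markers. Shebang and standalone-comment lines are exempt."""
    flagged = []
    for i, line in enumerate(block.splitlines(), start=1):
        if line.strip().startswith("#"):  # shebang (#!/bin/bash) or standalone comment
            continue
        if INLINE_COMMENT.search(line):
            flagged.append(i)
    return flagged
```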

Eval 6 — API validation via elastic-docs MCP (Check F):
Skipped in this runner because mcp__elastic-docs__search_docs is not available. In a full Claude Code session (where the MCP server is configured), the skill's instructions correctly describe querying for POST _reindex, fetching the API reference page, and verifying source, dest, and conflicts fields. The JSON itself is valid.


Assessment: The skill is well-designed. The pre-processing pipeline (callout stripping, Painless triple-quote replacement, ellipsis normalisation) prevents false positives. The exemptions (shebang lines, standalone comments) are correctly specified. Eval 6 is the only case that requires the elastic-docs MCP server at runtime — consider adding a note in the eval's note field that Check F results should be verified manually in sessions with MCP configured.

Generated by Skill Eval Testing for issue #26 ·
