From 3f1034031ae3b7395d3dcdaa777e6e1a6a1e0f46 Mon Sep 17 00:00:00 2001 From: Amir Yogev Date: Tue, 5 May 2026 14:47:02 +0300 Subject: [PATCH 1/6] Validation scripts for deterministic checks Offloads verifiable work to scripts instead of trusting the LLM. Several of the review dimensions have mechanically checkable aspects: - Orphaned references: grep SKILL.md for file paths, diff against actual files on disk - Step numbering gaps: parse skills/*.md for ## Step N headings, check for gaps/duplicates - Schema field consistency: extract field names from producer files, check they appear in consumer files - README vs implementation drift: extract feature claims from README, grep for them in skills/ A scripts/ directory with small shell/python checkers could run before the LLM review, feeding concrete findings into the evaluation rather than relying on the LLM to notice them. The LLM then focuses on judgment calls (clarity, cognitive load, edge cases) where it adds real value. --- AGENTS.md | 3 + CONTRIBUTING.md | 10 + skill-reviewer/README.md | 9 +- skill-reviewer/SKILL.md | 11 +- skill-reviewer/guidelines.md | 3 + skill-reviewer/scripts/pre-review-checks.py | 291 ++++++++++++++++++++ skill-reviewer/skills/review.md | 42 ++- 7 files changed, 357 insertions(+), 12 deletions(-) create mode 100755 skill-reviewer/scripts/pre-review-checks.py diff --git a/AGENTS.md b/AGENTS.md index a2a0c36..8057d0c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -36,6 +36,7 @@ workflow-name/ phase-name.md # Implementation for each phase commands/ phase-name.md # Thin wrappers that invoke controller or SKILL.md + scripts/ # Optional — deterministic operations invoked by skills ``` **Key architectural principles:** @@ -202,6 +203,7 @@ For detailed workflow development guidelines (structure, file conventions, testi - Single-phase workflow (no controller) - Read-only review (fixing findings is separate from review phase) - Must read all files in target skill before forming opinions +- Runs `scripts/pre-review-checks.py` for automated structural, frontmatter, reference, and sequencing checks before LLM evaluation ### docs-writer @@ -343,6 +345,7 @@ ai-workflows/ ├── kcs/ ├── prd/ ├── skill-reviewer/ +│ └── scripts/ ├── triage/ ├── install.sh # Installer with auto-discovery ├── uninstall.sh # Removal script diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d042f38..8111304 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -95,3 +95,13 @@ This ensures symlinks resolve paths correctly regardless of where the workflow i - Keep `SKILL.md` under 30 lines. Use progressive disclosure (`guidelines.md`, `README.md`) for details. - Use consistent terminology within a workflow. Pick one term and stick with it. - Don't duplicate content between `SKILL.md`, `guidelines.md`, and `controller.md` (when present). Each file has a distinct role. + +## Scripts + +Some workflows include a `scripts/` directory for scripts that offload deterministic work from the LLM — validation, data transformation, file discovery, or any operation better handled by code than by prompt. The `scripts/` directory is optional and follows these conventions: + +- Scripts are invoked by the workflow's skill files, not by users directly +- Scripts must work when the workflow is installed via symlink (`~/.ai-workflows/workflow-name/scripts/`) +- Use Python 3 or bash — whichever fits the task + +Currently, only `skill-reviewer/scripts/` uses this pattern. 
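For illustration, the PASS/WARN/FAIL line protocol these conventions imply can be consumed mechanically. A minimal, hypothetical sketch of a downstream caller — the script path and target skill are assumptions, and the patch's real consumer is the LLM review step:

```python
# Hypothetical consumer of the PASS/WARN/FAIL protocol; not part of the patch.
import subprocess
import sys
from collections import Counter

proc = subprocess.run(
    [sys.executable, "skill-reviewer/scripts/pre-review-checks.py", "bugfix/"],
    capture_output=True, text=True,
)
# Tally one count per status prefix, skipping section headers
# such as "--- Structure ---".
counts = Counter(
    line.split(":", 1)[0]
    for line in proc.stdout.splitlines()
    if line.startswith(("PASS:", "WARN:", "FAIL:"))
)
print(f"{counts['FAIL']} pre-validated findings, {counts['WARN']} judgment calls")
```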
diff --git a/skill-reviewer/README.md b/skill-reviewer/README.md index 88ff002..4513575 100644 --- a/skill-reviewer/README.md +++ b/skill-reviewer/README.md @@ -17,6 +17,8 @@ This workflow provides a skeptical, structured review of any AI skill directory: skill-reviewer/ ├── commands/ │ └── review.md +├── scripts/ +│ └── pre-review-checks.py ├── skills/ │ └── review.md ├── guidelines.md @@ -26,7 +28,7 @@ skill-reviewer/ ### How Commands and Skills Work Together -The **command** (`commands/review.md`) is a thin wrapper that routes directly to the **review skill** (`skills/review.md`), which contains the full review process. No controller is needed — this is a single-phase workflow. +The **command** (`commands/review.md`) is a thin wrapper that routes directly to the **review skill** (`skills/review.md`), which contains the full review process. The **scripts** directory contains `pre-review-checks.py`, which runs automated structural checks before the LLM evaluation. No controller is needed — this is a single-phase workflow. ## Workflow Phase @@ -35,7 +37,8 @@ The **command** (`commands/review.md`) is a thin wrapper that routes directly to **Purpose**: Perform a deep, skeptical review of an AI skill directory. 1. Read all files in the target skill directory (`SKILL.md`, `skills/*.md`, `commands/*.md`, `guidelines.md`, `README.md`) -2. Evaluate against 8 review dimensions: +2. Run automated pre-review checks (`scripts/pre-review-checks.py`) — structural validation, frontmatter, orphaned/dangling references, step sequencing +3. Evaluate against 8 review dimensions (using automated check results as pre-validated evidence): - **Orchestration & Routing** — correct routing, no orphaned or dangling references - **Step Sequencing & Numbering** — sequential numbering, correct cross-references - **Schema Consistency** — matching field names/types across files @@ -44,7 +47,7 @@ The **command** (`commands/review.md`) is a thin wrapper that routes directly to - **Documentation & Project Alignment** — README matches implementation, consistent with sibling skills and project conventions - **Command Naming** — consistent, self-explanatory names - **Error Handling & Edge Cases** — failure modes documented, escalation clear -3. Classify findings by severity and produce a structured report +4. Classify findings by severity and produce a structured report **Output**: `.artifacts/skill-reviewer/{skill-name}/review.md` + findings presented inline. diff --git a/skill-reviewer/SKILL.md b/skill-reviewer/SKILL.md index 2a6e4d2..00516c4 100644 --- a/skill-reviewer/SKILL.md +++ b/skill-reviewer/SKILL.md @@ -13,7 +13,8 @@ description: >- Run `/review` to execute the full workflow. The user must specify which skill directory to review (e.g. `bugfix/`, `docs-writer/`). Executable without opening other files: 1. Read every file in the target skill directory: `SKILL.md`, `skills/*.md`, `commands/*.md`, `guidelines.md`, `README.md`. If the directory doesn't exist or has no skill files, report the error and stop. Note any missing files — gaps are themselves a finding. -2. Evaluate against 8 dimensions: +2. Run automated pre-review checks: `python3 scripts/pre-review-checks.py {target-dir}` — captures structural, frontmatter, reference, and sequencing issues deterministically. Treat `FAIL` results as pre-validated findings; apply judgment to `WARN` results. If the script is not present, skip and check manually. +3. 
Evaluate against 8 dimensions (use automated check results as pre-validated evidence where available): - **Orchestration & Routing** — correct routing, no orphaned/dangling references, executable Quick Start - **Step Sequencing** — sequential numbering, correct cross-references, logical order - **Schema Consistency** — matching field names/types across files, schema visible before first use @@ -22,9 +23,9 @@ Run `/review` to execute the full workflow. The user must specify which skill di - **Documentation & Project Alignment** — README matches implementation, consistent with sibling skills and project conventions - **Command Naming** — consistent pattern (verbs vs nouns), self-explanatory - **Error Handling** — failure modes documented, escalation paths clear -3. Classify each finding by severity — **CRITICAL** / **HIGH** (blockers) or **MEDIUM** / **LOW** (suggestions). -4. Validate findings: verify each finding cites a specific file, includes a concrete suggestion, and that blocker/suggestion counts are accurate. Drop any finding you cannot substantiate from the files you read. -5. Produce a structured report and write it to `.artifacts/skill-reviewer/{skill-name}/review.md`: +4. Classify each finding by severity — **CRITICAL** / **HIGH** (blockers) or **MEDIUM** / **LOW** (suggestions). +5. Validate findings: verify each finding cites a specific file, includes a concrete suggestion, and that blocker/suggestion counts are accurate. Drop any finding you cannot substantiate from the files you read. +6. Produce a structured report and write it to `.artifacts/skill-reviewer/{skill-name}/review.md`: ``` ## Skill Review: {skill-name} @@ -58,4 +59,6 @@ skill-reviewer/ review.md # /review command — loads guidelines + skill skills/ review.md # The review skill (detailed steps and output format) + scripts/ + pre-review-checks.py # Automated structural/reference/sequencing checks ``` diff --git a/skill-reviewer/guidelines.md b/skill-reviewer/guidelines.md index c68d7b7..ad9a1e9 100644 --- a/skill-reviewer/guidelines.md +++ b/skill-reviewer/guidelines.md @@ -14,6 +14,7 @@ Artifacts go in `.artifacts/skill-reviewer/{skill-name}/`. - Be constructive — every finding should include a concrete suggestion. - If something is broken, say so — don't minimize or hedge. - Distinguish between blockers (must fix) and suggestions (nice to have). +- Offload mechanical checks to `scripts/pre-review-checks.py` when available — focus reviewer attention on judgment calls that require reading comprehension and context. ## Hard Limits @@ -34,6 +35,8 @@ Artifacts go in `.artifacts/skill-reviewer/{skill-name}/`. - Findings must be classified by severity (CRITICAL / HIGH / MEDIUM / LOW) - The report must follow the structured output format defined in the review skill - Blocker and suggestion counts must be accurate +- Script `FAIL` results are pre-validated (verified against the filesystem) and can be cited directly as findings without re-verification +- Script `WARN` results require reviewer judgment — not all warnings are findings ## Escalation diff --git a/skill-reviewer/scripts/pre-review-checks.py b/skill-reviewer/scripts/pre-review-checks.py new file mode 100755 index 0000000..3b2c704 --- /dev/null +++ b/skill-reviewer/scripts/pre-review-checks.py @@ -0,0 +1,291 @@ +#!/usr/bin/env python3 +"""Automated pre-review checks for AI skill directories. + +Runs deterministic structural, frontmatter, reference, and step-sequencing +checks before the LLM review. Outputs structured findings with PASS/FAIL/WARN +status prefixes. 
+ +Usage: pre-review-checks.py +Exit code: always 0 (findings are informational, not CI blockers) +""" + +import re +import sys +from pathlib import Path + +KNOWN_ENTRIES = {"SKILL.md", "guidelines.md", "README.md", "GUIDE.md", + "skills", "commands", "templates", "scripts"} +EXEMPT_FILES = {"SKILL.md", "guidelines.md", "README.md", "GUIDE.md"} +STEP_THRESHOLD = 10 +REF_PATTERN = re.compile(r'(?:\.\./)*(?:skills|commands|templates)/[a-zA-Z0-9_-]+\.md') +STEP_HEADING = re.compile(r'^#{2,3} Step (\d+)([a-zA-Z])?') + + +class Checker: + def __init__(self, skill_dir: Path): + self.skill_dir = skill_dir + self.skill_name = skill_dir.name + self.counts = {"PASS": 0, "WARN": 0, "FAIL": 0} + self.md_files = list(skill_dir.rglob("*.md")) + + def _emit(self, level: str, msg: str): + self.counts[level] += 1 + print(f"{level}: {msg}") + + def passed(self, msg: str): self._emit("PASS", msg) + def warn(self, msg: str): self._emit("WARN", msg) + def fail(self, msg: str): self._emit("FAIL", msg) + + def check_structure(self): + print("--- Structure ---") + d = self.skill_dir + + if (d / "SKILL.md").is_file(): + self.passed("SKILL.md exists") + else: + self.fail("SKILL.md is missing (required)") + + skills_dir = d / "skills" + if skills_dir.is_dir(): + count = len(list(skills_dir.glob("*.md"))) + if count > 0: + self.passed(f"skills/ directory with {count} skill file(s)") + else: + self.fail("skills/ directory exists but contains no .md files") + else: + self.fail("skills/ directory is missing (required)") + + for name in ("guidelines.md", "README.md"): + if (d / name).is_file(): + self.passed(f"{name} present") + else: + self.warn(f"No {name} found (optional but recommended)") + + cmds = d / "commands" + if cmds.is_dir(): + count = len(list(cmds.glob("*.md"))) + self.passed(f"commands/ directory with {count} command file(s)") + else: + self.warn("No commands/ directory found") + + tmpls = d / "templates" + if tmpls.is_dir(): + count = len(list(tmpls.glob("*.md"))) + self.passed(f"templates/ directory with {count} template file(s)") + + for entry in sorted(d.iterdir()): + if entry.name.startswith("."): + continue + if entry.name not in KNOWN_ENTRIES: + self.warn(f"Unexpected entry: {entry.name}") + + print() + + def _parse_frontmatter(self, path: Path) -> dict | None: + """Extract YAML frontmatter fields as a dict, or None if missing.""" + try: + text = path.read_text() + except OSError: + return None + lines = text.split("\n") + if not lines or lines[0].strip() != "---": + return None + fields = {} + for line in lines[1:]: + if line.strip() == "---": + break + if ":" in line: + key = line.split(":", 1)[0].strip() + val = line.split(":", 1)[1].strip() + fields[key] = val + return fields + + def check_frontmatter(self): + print("--- Frontmatter ---") + d = self.skill_dir + + skill_md = d / "SKILL.md" + if skill_md.is_file(): + fm = self._parse_frontmatter(skill_md) + if fm is None: + self.fail("SKILL.md missing YAML frontmatter (no opening ---)") + else: + ok = True + if "name" not in fm: + self.fail("SKILL.md frontmatter missing 'name' field") + ok = False + if "description" not in fm: + self.fail("SKILL.md frontmatter missing 'description' field") + ok = False + if ok: + self.passed("SKILL.md has valid frontmatter (name, description)") + + cmds = d / "commands" + if cmds.is_dir(): + for cmd in sorted(cmds.glob("*.md")): + rel = f"commands/{cmd.name}" + fm = self._parse_frontmatter(cmd) + if fm is None: + self.fail(f"{rel} missing YAML frontmatter") + continue + if "name" not in fm: + self.fail(f"{rel} 
frontmatter missing 'name' field") + continue + name_val = fm["name"].strip('"') + if not name_val.startswith(f"{self.skill_name}:"): + self.fail(f"{rel} name '{name_val}' does not use colon notation '{self.skill_name}:'") + else: + self.passed(f"{rel} has valid frontmatter") + + print() + + def _strip_code_blocks(self, text: str) -> str: + """Remove content inside fenced code blocks.""" + result = [] + in_fence = False + for line in text.split("\n"): + if line.startswith("```"): + in_fence = not in_fence + continue + if not in_fence: + result.append(line) + return "\n".join(result) + + def check_references(self): + print("--- References ---") + + file_contents = {} + for f in self.md_files: + try: + file_contents[f] = f.read_text() + except OSError: + continue + + # Orphaned files + orphan_found = False + for f in self.md_files: + if f.name in EXEMPT_FILES: + continue + referenced = any( + f.name in content + for other, content in file_contents.items() + if other != f + ) + if not referenced: + rel = f.relative_to(self.skill_dir) + self.fail(f"Orphaned file: {rel} (not referenced by any other file)") + orphan_found = True + + if not orphan_found: + self.passed("No orphaned files found") + + # Dangling references + dangle_found = False + seen_dangles = set() + + for f, content in file_contents.items(): + stripped = self._strip_code_blocks(content) + refs = set(REF_PATTERN.findall(stripped)) + for ref in sorted(refs): + ref_path = Path(ref) + # Try relative to the file's directory, then to the skill root + candidates = [f.parent / ref_path, self.skill_dir / ref_path] + if ref.startswith("../"): + candidates = [(f.parent / ref_path).resolve()] + + if not any(c.is_file() for c in candidates): + f_rel = f.relative_to(self.skill_dir) + key = (str(f_rel), ref) + if key not in seen_dangles: + seen_dangles.add(key) + self.fail(f"Dangling reference in {f_rel}: {ref} (file does not exist)") + dangle_found = True + + if not dangle_found: + self.passed("No dangling references found") + + print() + + def check_steps(self): + print("--- Steps ---") + + skills_dir = self.skill_dir / "skills" + if not skills_dir.is_dir(): + self.warn("No skills/ directory — skipping step checks") + print() + return + + for skill_file in sorted(skills_dir.glob("*.md")): + rel = f"skills/{skill_file.name}" + try: + text = skill_file.read_text() + except OSError: + continue + + steps = [] + substeps = [] + for line in text.split("\n"): + m = STEP_HEADING.match(line) + if not m: + continue + num = int(m.group(1)) + if m.group(2): + label = line.lstrip("#").strip().split(":")[0].strip() + substeps.append(label) + else: + steps.append(num) + + if substeps: + self.warn(f"{rel} uses sub-step numbering: {', '.join(substeps)}") + + if not steps: + continue + + # Duplicates + unique = sorted(set(steps)) + if len(steps) != len(unique): + dupes = sorted({n for n in steps if steps.count(n) > 1}) + self.fail(f"{rel} has duplicate step number(s): {', '.join(map(str, dupes))}") + + # Gaps + first, last = unique[0], unique[-1] + expected = set(range(first, last + 1)) + missing = sorted(expected - set(unique)) + if missing: + self.fail(f"{rel} has gap(s) in step numbering: missing Step {', '.join(map(str, missing))}") + else: + self.passed(f"{rel} steps sequential ({first}-{last})") + + # Threshold + if len(steps) > STEP_THRESHOLD: + self.warn(f"{rel} has {len(steps)} steps (threshold: {STEP_THRESHOLD})") + + print() + + def run(self): + print(f"=== Pre-Review Automated Checks: {self.skill_name} ===") + print() + self.check_structure() + 
self.check_frontmatter()
+        self.check_references()
+        self.check_steps()
+        total = sum(self.counts.values())
+        print("--- Summary ---")
+        print(f"Checks: {total} | PASS: {self.counts['PASS']} | WARN: {self.counts['WARN']} | FAIL: {self.counts['FAIL']}")
+
+
+def main():
+    if len(sys.argv) < 2:
+        print(f"Usage: {sys.argv[0]} <skill-directory>")
+        return
+
+    skill_dir = Path(sys.argv[1]).resolve()
+    if not skill_dir.is_dir():
+        print(f"FAIL: Directory does not exist: {sys.argv[1]}")
+        return
+
+    Checker(skill_dir).run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/skill-reviewer/skills/review.md b/skill-reviewer/skills/review.md
index 94fd640..c7ff075 100644
--- a/skill-reviewer/skills/review.md
+++ b/skill-reviewer/skills/review.md
@@ -49,7 +49,39 @@ the order above). After each batch, note preliminary observations but defer
 judgment until all files are read. This prevents context overload while
 preserving the "no opinions before all files" principle.
 
-### Step 3: Evaluate Review Dimensions
+### Step 3: Run Automated Checks
+
+Run the pre-review validation script against the target skill directory:
+
+```bash
+python3 {skill-reviewer-dir}/scripts/pre-review-checks.py {target-skill-directory}
+```
+
+Where `{skill-reviewer-dir}` is the directory this workflow was loaded from (e.g.,
+`~/.ai-workflows/skill-reviewer`), and `{target-skill-directory}` is the path you
+used to read the skill files in Step 2.
+
+The script performs mechanical checks that can be verified deterministically:
+
+- **Structure**: required/optional/unexpected files against canonical layout
+- **Frontmatter**: YAML validity, required fields, colon notation in commands
+- **References**: orphaned files (exist but never referenced) and dangling references (referenced but don't exist)
+- **Step sequencing**: sequential numbering, gaps, duplicates, sub-step notation, step count > 10
+
+Review the script output:
+
+- **FAIL** results are pre-validated against the filesystem. Incorporate them
+  directly as findings in Step 5 — you do not need to re-verify these.
+- **WARN** results need your judgment. Some warnings are genuine findings, others
+  are acceptable for the skill being reviewed.
+- **PASS** results confirm that specific mechanical checks passed. Reference them
+  as evidence when evaluating the corresponding dimension (e.g., "Automated checks
+  confirmed no orphaned or dangling references").
+
+If the script is not available (e.g., the skill-reviewer was installed without
+the `scripts/` directory), skip this step and perform all checks manually in Step 4.
+
+### Step 4: Evaluate Review Dimensions
 
 Work through each dimension systematically. For each, note any findings.
 
@@ -108,7 +140,7 @@ Work through each dimension systematically. For each, note any findings.
 - Are failure modes documented (e.g., "If zero results, stop and report")?
 - Are escalation paths clear?
 
-### Step 4: Classify Findings
+### Step 5: Classify Findings
 
 Assign a severity to each finding:
 
@@ -117,7 +149,7 @@ Assign a severity to each finding:
 - **MEDIUM**: Quality issue. Documentation drift, naming inconsistency, missing edge case.
 - **LOW**: Polish. Readability, minor wording, suggestions for improvement.
 
-### Step 5: Form a Verdict
+### Step 6: Form a Verdict
 
 Count blockers (CRITICAL + HIGH) and suggestions (MEDIUM + LOW). 
Determine the overall verdict: @@ -126,7 +158,7 @@ overall verdict: - **Suggestions only** → Skill works but could be improved - **Clean** → Skill is well-structured and ready for use -### Step 6: Report to the User +### Step 7: Report to the User Persist the review report to `.artifacts/skill-reviewer/{skill-name}/review.md`, then present the same findings inline to the user. @@ -186,7 +218,7 @@ when there's an actual problem. If a skill is broken, say so. ## When This Phase Is Done -Your verdict and recommendations (from Step 6) serve as the phase summary. Tell +Your verdict and recommendations (from Step 7) serve as the phase summary. Tell the user where the review was written (`.artifacts/skill-reviewer/{skill-name}/review.md`). The review itself is complete. If the user subsequently asks to fix findings, From f9da33bc062dbc52e0112c817b37afd25df0198b Mon Sep 17 00:00:00 2001 From: Amir Yogev Date: Tue, 5 May 2026 20:57:01 +0300 Subject: [PATCH 2/6] Sub-agent delegation for large skills via prompts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Separates analyze-skill.md prompt into its own file and delegates deep-reading to an Explore agent. This skill currently reads everything in-context. For large skills (15+ files — you already flag this case), delegating the initial structured read to a sub-agent with a dedicated prompt template would: - Protect context window - Produce a structured intermediate artifact (like eval-analyze's YAML analysis block) - Let the reviewer LLM focus on evaluation rather than comprehension --- AGENTS.md | 4 +- CONTRIBUTING.md | 11 ++ skill-reviewer/README.md | 5 +- skill-reviewer/SKILL.md | 2 + skill-reviewer/prompts/analyze-skill.md | 137 ++++++++++++++++++++ skill-reviewer/scripts/pre-review-checks.py | 7 +- skill-reviewer/skills/review.md | 40 +++++- 7 files changed, 197 insertions(+), 9 deletions(-) create mode 100644 skill-reviewer/prompts/analyze-skill.md diff --git a/AGENTS.md b/AGENTS.md index 8057d0c..02947e8 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -37,6 +37,7 @@ workflow-name/ commands/ phase-name.md # Thin wrappers that invoke controller or SKILL.md scripts/ # Optional — deterministic operations invoked by skills + prompts/ # Optional — prompt templates for sub-agent delegation ``` **Key architectural principles:** @@ -92,7 +93,7 @@ Workflows write outputs to `.artifacts/{workflow-name}/{context}/`: - **bugfix**: `.artifacts/bugfix/{issue-number}/` (root-cause.md, reproduction.md, etc.) 
- **code-review**: `.artifacts/code-review/{branch}/` (00-reviewer-profile.md, 01-change-summary.md, code-review-{NNN}.md, review-response-{NNN}.md, review-metadata.json, decisions-{NNN}.json) - **triage**: `.artifacts/triage/{project}/` (issues.json, analyzed.json, report.html) -- **skill-reviewer**: `.artifacts/skill-reviewer/{skill-name}/` (review.md) +- **skill-reviewer**: `.artifacts/skill-reviewer/{skill-name}/` (skill-map.md, review.md) - **cve-fix**: `.artifacts/cve-fix/{context}/` (context.md, patch-log.md, validation-results.md, pr-description.md, backport-log.md, close-report.md) - **prd**: `.artifacts/prd/{issue-number}/` (01-requirements.md, 02-clarifications.md, 03-prd.md, 04-pr-description.md, 05-review-responses.md) - **design**: `.artifacts/design/{issue-number}/` (01-context.md, 02-research.md, 03-design.md, 04-epics.md, 05-stories/epic-{N}-{slug}.md, 05-stories/epic-{N}/story-{NN}-{slug}.md, 06-coverage.md, 07-pr-description.md, 08-review-responses.md, publish-metadata.json, sync-manifest.json) @@ -204,6 +205,7 @@ For detailed workflow development guidelines (structure, file conventions, testi - Read-only review (fixing findings is separate from review phase) - Must read all files in target skill before forming opinions - Runs `scripts/pre-review-checks.py` for automated structural, frontmatter, reference, and sequencing checks before LLM evaluation +- For large skills (15+ files), delegates initial reading to an Explore sub-agent using `prompts/analyze-skill.md`, which produces a skill map at `.artifacts/skill-reviewer/{skill-name}/skill-map.md` ### docs-writer diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8111304..2160653 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -105,3 +105,14 @@ Some workflows include a `scripts/` directory for scripts that offload determini - Use Python 3 or bash — whichever fits the task Currently, only `skill-reviewer/scripts/` uses this pattern. + +## Prompts + +Some workflows include a `prompts/` directory for prompt templates given to sub-agents that perform delegated work — structured reading, analysis, or exploration that benefits from a fresh context window. The `prompts/` directory is optional and follows these conventions: + +- Prompt templates are self-contained — the sub-agent receives only the prompt, not the caller's context +- Templates use `{placeholder}` syntax for values the caller fills in before spawning the sub-agent +- Prompts must work when the workflow is installed via symlink (`~/.ai-workflows/workflow-name/prompts/`) +- Prompts instruct the sub-agent to write output to `.artifacts/`, not to return it in conversation + +Currently, only `skill-reviewer/prompts/` uses this pattern. diff --git a/skill-reviewer/README.md b/skill-reviewer/README.md index 4513575..3c7fada 100644 --- a/skill-reviewer/README.md +++ b/skill-reviewer/README.md @@ -17,6 +17,8 @@ This workflow provides a skeptical, structured review of any AI skill directory: skill-reviewer/ ├── commands/ │ └── review.md +├── prompts/ +│ └── analyze-skill.md ├── scripts/ │ └── pre-review-checks.py ├── skills/ @@ -28,7 +30,7 @@ skill-reviewer/ ### How Commands and Skills Work Together -The **command** (`commands/review.md`) is a thin wrapper that routes directly to the **review skill** (`skills/review.md`), which contains the full review process. The **scripts** directory contains `pre-review-checks.py`, which runs automated structural checks before the LLM evaluation. No controller is needed — this is a single-phase workflow. 
+The **command** (`commands/review.md`) is a thin wrapper that routes directly to the **review skill** (`skills/review.md`), which contains the full review process. The **prompts** directory contains `analyze-skill.md`, a prompt template given to an Explore sub-agent when reviewing large skills (15+ files) to produce a structured skill map before evaluation. The **scripts** directory contains `pre-review-checks.py`, which runs automated structural checks before the LLM evaluation. No controller is needed — this is a single-phase workflow. ## Workflow Phase @@ -98,6 +100,7 @@ Agent works through findings from highest severity to lowest. ```text .artifacts/skill-reviewer/{skill-name}/ +├── skill-map.md # Skill map (large skills only — produced by Explore sub-agent) └── review.md # Full review report with findings table ``` diff --git a/skill-reviewer/SKILL.md b/skill-reviewer/SKILL.md index 00516c4..41f75eb 100644 --- a/skill-reviewer/SKILL.md +++ b/skill-reviewer/SKILL.md @@ -57,6 +57,8 @@ skill-reviewer/ README.md # User-facing documentation commands/ review.md # /review command — loads guidelines + skill + prompts/ + analyze-skill.md # Prompt template for Explore sub-agent (large skill reading) skills/ review.md # The review skill (detailed steps and output format) scripts/ diff --git a/skill-reviewer/prompts/analyze-skill.md b/skill-reviewer/prompts/analyze-skill.md new file mode 100644 index 0000000..1bab3f7 --- /dev/null +++ b/skill-reviewer/prompts/analyze-skill.md @@ -0,0 +1,137 @@ +# Skill Analysis Prompt + +You are an Explore agent. Your job is to read every file in a skill directory +and produce a structured **skill map** that faithfully summarizes what you read. +You do **not** evaluate, judge, or recommend. You comprehend and report. + +## Target + +Skill directory: `{target-skill-directory}` +Skill name: `{skill-name}` + +## Reading Order + +Read each file **in full**. Do not skip any file. Read in this order: + +1. `SKILL.md` (orchestrator — overall structure and routing) +2. `guidelines.md` (principles and constraints) +3. `commands/*.md` (command routers — how users enter the workflow) +4. `skills/*.md` (phase/step skills, including `controller.md` — the detailed logic) +5. `README.md` (user-facing documentation) +6. Any other directories and files (`templates/`, `scripts/`, etc.) + +If a file is missing, note its absence in the File Inventory. + +## Output + +Write the skill map to `.artifacts/skill-reviewer/{skill-name}/skill-map.md` +using the structure below. After writing, return to the caller — do not take +any further action. + +--- + +## Skill Map Structure + +Use these exact section headings. If a section has no content (e.g., no schemas +found), write "None found." under the heading — do not omit the section. + +### File Inventory + +| Path | Type | Lines | Purpose | +|------|------|-------|---------| +| SKILL.md | orchestrator | 62 | Entry point with phase overview and routing | +| guidelines.md | guideline | 53 | Principles, hard limits, safety rules | +| commands/fix.md | command | 12 | Thin wrapper routing to controller for /fix | +| skills/controller.md | skill | 145 | Phase dispatcher and transition logic | +| ... | ... | ... | ... | + +**Type** values: `orchestrator`, `guideline`, `command`, `skill`, `template`, +`script`, `doc`, `other`. 
+ +Mark missing expected files with type `missing`: + +| Path | Type | Lines | Purpose | +|------|------|-------|---------| +| guidelines.md | missing | — | Not found in directory | + +### Routing Graph + +List every file-to-file reference as an edge. Include the direction of reference +(which file mentions which). Use relative paths from the skill root. + +``` +SKILL.md → guidelines.md +SKILL.md → skills/controller.md +commands/fix.md → skills/controller.md +skills/controller.md → skills/fix.md +skills/controller.md → skills/test.md +skills/fix.md → (none — no outgoing references) +``` + +### Phase Pipeline + +Ordered list of phases as they appear in the workflow. For each phase: + +- **Phase name** and command (e.g., `/fix`) +- **Purpose**: one sentence +- **Skill file**: which file implements it +- **Key inputs**: what the phase reads or requires +- **Key outputs**: what the phase produces (files, artifacts, state changes) + +If the workflow has no explicit phases (single-phase workflow), describe the +single phase. + +### Schema Fields + +For each data schema, structured data format, or artifact template found: + +- **Where introduced**: file and step where the schema is first defined +- **Where consumed**: file(s) and step(s) where the schema fields are read +- **Fields**: field names and types/descriptions + +If no explicit schemas are found, write "None found." + +### Step Sequences + +For each file in `skills/` that uses step headings: + +- **File**: relative path +- **Steps**: numbered list of step headings as they appear (e.g., "Step 1: Identify the Target") +- **Cross-references**: any "see Step N" or "from Step N" references, noting which step/file they point to +- **Total step count**: number of steps in the file + +### Command Names + +| Command | Frontmatter name | Routes to | +|---------|-----------------|-----------| +| /fix | bugfix:fix | skills/controller.md | +| /test | bugfix:test | skills/controller.md | +| ... | ... | ... | + +### Key Constraints + +From `guidelines.md` (and any constraints stated in `SKILL.md`), list: + +- **Principles** (bulleted) +- **Hard limits** (bulleted) +- **Safety rules** (bulleted) +- **Quality standards** (bulleted) +- **Escalation criteria** (bulleted) + +If `guidelines.md` is missing, note this and extract any constraints visible +in `SKILL.md` or `README.md`. + +--- + +## Constraints for You + +- Do **not** evaluate the quality of the skill. Do not say "this is good" or + "this could be improved." +- Do **not** form opinions. Report what each file says, not what you think + about it. +- Do **not** omit files. If a file exists in the directory, it must appear in + the File Inventory. +- Do **not** invent information. If something is unclear or ambiguous, report + it as-is and add "(ambiguous)" or "(unclear)". +- Do **not** summarize file contents beyond what each section asks for. The + skill map is a structured index, not a paraphrase. 
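A minimal sketch of the caller side of the `{placeholder}` convention from CONTRIBUTING.md — the paths are illustrative and `spawn_explore_agent` is assumed, not a real API:

```python
# Hedged sketch: fill the template's placeholders before spawning the sub-agent.
from pathlib import Path

def fill_prompt(template: Path, target_dir: str, skill_name: str) -> str:
    """Substitute the two placeholders the analyze-skill template defines."""
    text = template.read_text()
    # str.replace substitutes every occurrence, including the one inside
    # the output path `.artifacts/skill-reviewer/{skill-name}/skill-map.md`.
    return (text.replace("{target-skill-directory}", target_dir)
                .replace("{skill-name}", skill_name))

prompt = fill_prompt(Path("skill-reviewer/prompts/analyze-skill.md"),
                     "/abs/path/to/bugfix", "bugfix")
# spawn_explore_agent(prompt)  # runtime-specific; see skills/review.md, Step 2
```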
diff --git a/skill-reviewer/scripts/pre-review-checks.py b/skill-reviewer/scripts/pre-review-checks.py index 3b2c704..0537166 100755 --- a/skill-reviewer/scripts/pre-review-checks.py +++ b/skill-reviewer/scripts/pre-review-checks.py @@ -14,8 +14,9 @@ from pathlib import Path KNOWN_ENTRIES = {"SKILL.md", "guidelines.md", "README.md", "GUIDE.md", - "skills", "commands", "templates", "scripts"} + "skills", "commands", "templates", "scripts", "prompts"} EXEMPT_FILES = {"SKILL.md", "guidelines.md", "README.md", "GUIDE.md"} +EXEMPT_DIRS = {"prompts"} # contain example content, not real references STEP_THRESHOLD = 10 REF_PATTERN = re.compile(r'(?:\.\./)*(?:skills|commands|templates)/[a-zA-Z0-9_-]+\.md') STEP_HEADING = re.compile(r'^#{2,3} Step (\d+)([a-zA-Z])?') @@ -179,11 +180,13 @@ def check_references(self): if not orphan_found: self.passed("No orphaned files found") - # Dangling references + # Dangling references (skip files in exempt dirs — they contain example content) dangle_found = False seen_dangles = set() for f, content in file_contents.items(): + if any(part in EXEMPT_DIRS for part in f.relative_to(self.skill_dir).parts[:-1]): + continue stripped = self._strip_code_blocks(content) refs = set(REF_PATTERN.findall(stripped)) for ref in sorted(refs): diff --git a/skill-reviewer/skills/review.md b/skill-reviewer/skills/review.md index c7ff075..f62986e 100644 --- a/skill-reviewer/skills/review.md +++ b/skill-reviewer/skills/review.md @@ -32,6 +32,11 @@ non-standard directories, read their contents — they may be relevant findings. ### Step 2: Read All Files +Count the `.md` files in the target skill directory (including subdirectories) +to determine which reading path to follow. + +#### Path A: Direct reading (fewer than 15 files) + Read **every file** in the skill directory before forming any opinion. Read in this order to build understanding progressively: @@ -44,10 +49,29 @@ this order to build understanding progressively: Read each file in full. If any expected file is missing, note it — gaps in the structure are themselves a finding. -**Large skill directories (15+ files):** Read in batches by file type (following -the order above). After each batch, note preliminary observations but defer -judgment until all files are read. This prevents context overload while -preserving the "no opinions before all files" principle. +#### Path B: Sub-agent delegation (15 or more files) + +Delegate reading to an Explore sub-agent to protect your context window for +evaluation. + +1. Read `../prompts/analyze-skill.md` — this is the prompt template for the + sub-agent. +2. Replace `{target-skill-directory}` with the full path to the target skill + directory, and `{skill-name}` with the skill name. +3. **If the AI runtime supports subagents:** Spawn an Explore sub-agent with + the filled-in prompt. The sub-agent will read all files and write a skill + map to `.artifacts/skill-reviewer/{skill-name}/skill-map.md`. +4. **If subagents are not available:** Follow the prompt yourself — read all + files in the canonical order and produce the skill map. This still protects + evaluation quality by creating a structured intermediate summary before + you proceed to Step 4. +5. After the skill map is written, read it back from + `.artifacts/skill-reviewer/{skill-name}/skill-map.md`. This is your + primary input for Step 4. + +**In either path**, you retain the ability to read individual files during +Step 4 for deeper inspection. 
The skill map is a comprehension aid, not a +replacement for targeted file reads when a finding requires verification. ### Step 3: Run Automated Checks @@ -83,6 +107,10 @@ the `scripts/` directory), skip this step and perform all checks manually in Ste ### Step 4: Evaluate Review Dimensions +If you produced or received a skill map in Step 2 (Path B), use it as your +primary reference for each dimension. Drill into specific files only when a +dimension requires verification that the skill map's summary cannot provide. + Work through each dimension systematically. For each, note any findings. #### Dimension 1: Orchestration & Routing @@ -161,7 +189,9 @@ overall verdict: ### Step 7: Report to the User Persist the review report to `.artifacts/skill-reviewer/{skill-name}/review.md`, -then present the same findings inline to the user. +then present the same findings inline to the user. If a skill map was produced +in Step 2, it remains at `.artifacts/skill-reviewer/{skill-name}/skill-map.md` +alongside the review. Use this structure: From 64ed6d0398cdf30b6c080dec17069c10f93ed93b Mon Sep 17 00:00:00 2001 From: Amir Yogev Date: Tue, 5 May 2026 21:06:25 +0300 Subject: [PATCH 3/6] Extended pre-review-checks.py with a check_changes() method - Computes SHA-256 hashes of all .md files in the target skill - Compares against stored hashes in .artifacts/skill-reviewer/{skill-name}/file-hashes.json - Reports changed, new, removed, and unchanged files - Writes current hashes for next comparison --- AGENTS.md | 2 +- skill-reviewer/README.md | 1 + skill-reviewer/scripts/pre-review-checks.py | 60 +++++++++++++++++++++ skill-reviewer/skills/review.md | 7 +++ 4 files changed, 69 insertions(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index 02947e8..6ab9c6c 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -93,7 +93,7 @@ Workflows write outputs to `.artifacts/{workflow-name}/{context}/`: - **bugfix**: `.artifacts/bugfix/{issue-number}/` (root-cause.md, reproduction.md, etc.) - **code-review**: `.artifacts/code-review/{branch}/` (00-reviewer-profile.md, 01-change-summary.md, code-review-{NNN}.md, review-response-{NNN}.md, review-metadata.json, decisions-{NNN}.json) - **triage**: `.artifacts/triage/{project}/` (issues.json, analyzed.json, report.html) -- **skill-reviewer**: `.artifacts/skill-reviewer/{skill-name}/` (skill-map.md, review.md) +- **skill-reviewer**: `.artifacts/skill-reviewer/{skill-name}/` (file-hashes.json, skill-map.md, review.md) - **cve-fix**: `.artifacts/cve-fix/{context}/` (context.md, patch-log.md, validation-results.md, pr-description.md, backport-log.md, close-report.md) - **prd**: `.artifacts/prd/{issue-number}/` (01-requirements.md, 02-clarifications.md, 03-prd.md, 04-pr-description.md, 05-review-responses.md) - **design**: `.artifacts/design/{issue-number}/` (01-context.md, 02-research.md, 03-design.md, 04-epics.md, 05-stories/epic-{N}-{slug}.md, 05-stories/epic-{N}/story-{NN}-{slug}.md, 06-coverage.md, 07-pr-description.md, 08-review-responses.md, publish-metadata.json, sync-manifest.json) diff --git a/skill-reviewer/README.md b/skill-reviewer/README.md index 3c7fada..b30baab 100644 --- a/skill-reviewer/README.md +++ b/skill-reviewer/README.md @@ -100,6 +100,7 @@ Agent works through findings from highest severity to lowest. 
```text .artifacts/skill-reviewer/{skill-name}/ +├── file-hashes.json # SHA-256 hashes for change detection between reviews ├── skill-map.md # Skill map (large skills only — produced by Explore sub-agent) └── review.md # Full review report with findings table ``` diff --git a/skill-reviewer/scripts/pre-review-checks.py b/skill-reviewer/scripts/pre-review-checks.py index 0537166..f775aed 100755 --- a/skill-reviewer/scripts/pre-review-checks.py +++ b/skill-reviewer/scripts/pre-review-checks.py @@ -9,8 +9,11 @@ Exit code: always 0 (findings are informational, not CI blockers) """ +import hashlib +import json import re import sys +from datetime import datetime, timezone from pathlib import Path KNOWN_ENTRIES = {"SKILL.md", "guidelines.md", "README.md", "GUIDE.md", @@ -37,6 +40,62 @@ def passed(self, msg: str): self._emit("PASS", msg) def warn(self, msg: str): self._emit("WARN", msg) def fail(self, msg: str): self._emit("FAIL", msg) + def _compute_hashes(self) -> dict[str, str]: + """Compute SHA-256 hashes for all .md files in the skill directory.""" + hashes = {} + for f in sorted(self.md_files): + try: + digest = hashlib.sha256(f.read_bytes()).hexdigest()[:12] + rel = str(f.relative_to(self.skill_dir)) + hashes[rel] = digest + except OSError: + continue + return hashes + + def check_changes(self): + print("--- Changes Since Last Review ---") + + hashes_file = Path(f".artifacts/skill-reviewer/{self.skill_name}/file-hashes.json") + current_hashes = self._compute_hashes() + + if hashes_file.is_file(): + try: + prev = json.loads(hashes_file.read_text()) + prev_hashes = prev.get("files", {}) + prev_time = prev.get("timestamp", "unknown") + except (json.JSONDecodeError, OSError): + prev_hashes = {} + prev_time = "unknown" + + print(f"INFO: Previous review: {prev_time}") + + changed = [f for f in current_hashes if f in prev_hashes and current_hashes[f] != prev_hashes[f]] + added = [f for f in current_hashes if f not in prev_hashes] + removed = [f for f in prev_hashes if f not in current_hashes] + unchanged = len(current_hashes) - len(changed) - len(added) + + for f in sorted(changed): + self.warn(f"Changed: {f}") + for f in sorted(added): + print(f"INFO: New file: {f}") + for f in sorted(removed): + print(f"INFO: Removed: {f}") + if unchanged > 0: + print(f"INFO: Unchanged: {unchanged} file(s)") + if not changed and not added and not removed: + print("INFO: No changes detected since last review") + else: + print("INFO: No previous review found — full review") + + # Write current hashes for next comparison + hashes_file.parent.mkdir(parents=True, exist_ok=True) + hashes_file.write_text(json.dumps({ + "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"), + "files": current_hashes, + }, indent=2) + "\n") + + print() + def check_structure(self): print("--- Structure ---") d = self.skill_dir @@ -268,6 +327,7 @@ def check_steps(self): def run(self): print(f"=== Pre-Review Automated Checks: {self.skill_name} ===") print() + self.check_changes() self.check_structure() self.check_frontmatter() self.check_references() diff --git a/skill-reviewer/skills/review.md b/skill-reviewer/skills/review.md index f62986e..75c84a2 100644 --- a/skill-reviewer/skills/review.md +++ b/skill-reviewer/skills/review.md @@ -91,6 +91,9 @@ The script performs mechanical checks that can be verified deterministically: - **Frontmatter**: YAML validity, required fields, colon notation in commands - **References**: orphaned files (exist but never referenced) and dangling references (referenced but don't exist) - **Step 
sequencing**: sequential numbering, gaps, duplicates, sub-step notation, step count > 10
+- **Change detection**: which files changed, were added, or removed since the last review (based on SHA-256 hashes stored in `.artifacts/skill-reviewer/{skill-name}/file-hashes.json`)
+
+The script writes current file hashes to `.artifacts/` for future comparisons.
 
 Review the script output:
 
@@ -111,6 +114,10 @@ If you produced or received a skill map in Step 2 (Path B), use it as your
 primary reference for each dimension. Drill into specific files only when a
 dimension requires verification that the skill map's summary cannot provide.
 
+If the automated checks in Step 3 reported changed files since a previous
+review, prioritize those files and their cross-references. For unchanged files,
+verify that previous findings still apply rather than re-evaluating from scratch.
+
 Work through each dimension systematically. For each, note any findings.
 
 #### Dimension 1: Orchestration & Routing

From f305d8aac5963b76eb6519032ed1aa2eb816bd25 Mon Sep 17 00:00:00 2001
From: Amir Yogev
Date: Tue, 5 May 2026 21:12:56 +0300
Subject: [PATCH 4/6] Structured intermediate artifact before judgment
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In eval-analyze, a structured YAML analysis (purpose, inputs, outputs,
pipeline, quality criteria) is produced before generating the final config.

The current reviewer jumps straight from reading to evaluating. Adding an
intermediate "skill map" step — a structured summary of what was read
(routing graph, schema fields, step sequence, file inventory) — would:

- Make the review auditable ("here's what I understood before judging")
- Catch comprehension errors before they become wrong findings
- Serve as the input to both automated checks and LLM evaluation
---
 skill-reviewer/README.md        |  2 +-
 skill-reviewer/skills/review.md | 72 +++++++++++++++++----------------
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/skill-reviewer/README.md b/skill-reviewer/README.md
index b30baab..5f80573 100644
--- a/skill-reviewer/README.md
+++ b/skill-reviewer/README.md
@@ -101,7 +101,7 @@ Agent works through findings from highest severity to lowest.
 ```text
 .artifacts/skill-reviewer/{skill-name}/
 ├── file-hashes.json   # SHA-256 hashes for change detection between reviews
-├── skill-map.md       # Skill map (large skills only — produced by Explore sub-agent)
+├── skill-map.md       # Structured skill map (comprehension artifact before evaluation)
 └── review.md          # Full review report with findings table
 ```

diff --git a/skill-reviewer/skills/review.md b/skill-reviewer/skills/review.md
index 75c84a2..6d61b33 100644
--- a/skill-reviewer/skills/review.md
+++ b/skill-reviewer/skills/review.md
@@ -30,15 +30,16 @@ Optional directories (`commands/`, `templates/`, etc.) and files (`guidelines.md`
 `README.md`) are reviewed when present but not required to proceed. If you find
 non-standard directories, read their contents — they may be relevant findings.
 
-### Step 2: Read All Files
+### Step 2: Read All Files and Produce Skill Map
 
-Count the `.md` files in the target skill directory (including subdirectories)
-to determine which reading path to follow.
+Read every file in the skill directory and produce a structured **skill map** —
+an intermediate artifact that captures your understanding before evaluation
+begins. The skill map makes the review auditable and catches comprehension
+errors before they become wrong findings. 
-#### Path A: Direct reading (fewer than 15 files) +#### How to read -Read **every file** in the skill directory before forming any opinion. Read in -this order to build understanding progressively: +Read files in this order to build understanding progressively: 1. `SKILL.md` (orchestrator — tells you the overall structure and routing) 2. `guidelines.md` (principles and constraints) @@ -49,29 +50,31 @@ this order to build understanding progressively: Read each file in full. If any expected file is missing, note it — gaps in the structure are themselves a finding. -#### Path B: Sub-agent delegation (15 or more files) - -Delegate reading to an Explore sub-agent to protect your context window for -evaluation. - -1. Read `../prompts/analyze-skill.md` — this is the prompt template for the - sub-agent. -2. Replace `{target-skill-directory}` with the full path to the target skill - directory, and `{skill-name}` with the skill name. -3. **If the AI runtime supports subagents:** Spawn an Explore sub-agent with - the filled-in prompt. The sub-agent will read all files and write a skill - map to `.artifacts/skill-reviewer/{skill-name}/skill-map.md`. -4. **If subagents are not available:** Follow the prompt yourself — read all - files in the canonical order and produce the skill map. This still protects - evaluation quality by creating a structured intermediate summary before - you proceed to Step 4. -5. After the skill map is written, read it back from - `.artifacts/skill-reviewer/{skill-name}/skill-map.md`. This is your - primary input for Step 4. - -**In either path**, you retain the ability to read individual files during -Step 4 for deeper inspection. The skill map is a comprehension aid, not a -replacement for targeted file reads when a finding requires verification. +**Large skill directories (15+ files):** Delegate reading to a sub-agent to +protect your context window. Read `../prompts/analyze-skill.md`, fill in +`{target-skill-directory}` and `{skill-name}`, then: +- **If the AI runtime supports subagents:** Spawn an Explore sub-agent with + the filled-in prompt. +- **If subagents are not available:** Follow the prompt yourself. + +#### How to produce the skill map + +Whether you read directly or via sub-agent, write a skill map to +`.artifacts/skill-reviewer/{skill-name}/skill-map.md` following the structure +defined in `../prompts/analyze-skill.md`. The skill map contains: + +- **File Inventory** — all files with type, line count, and purpose +- **Routing Graph** — which file references which +- **Phase Pipeline** — ordered phases with inputs and outputs +- **Schema Fields** — where introduced, where consumed +- **Step Sequences** — per-file step numbers, cross-references, counts +- **Command Names** — each command with frontmatter name and routing target +- **Key Constraints** — principles, hard limits, safety, quality, escalation + +After writing the skill map, read it back. This is your primary input for +Step 4. You retain the ability to read individual files during evaluation for +deeper inspection — the skill map is a comprehension aid, not a replacement +for targeted file reads when a finding requires verification. ### Step 3: Run Automated Checks @@ -110,9 +113,9 @@ the `scripts/` directory), skip this step and perform all checks manually in Ste ### Step 4: Evaluate Review Dimensions -If you produced or received a skill map in Step 2 (Path B), use it as your -primary reference for each dimension. 
Drill into specific files only when a -dimension requires verification that the skill map's summary cannot provide. +Use the skill map from Step 2 as your primary reference for each dimension. +Drill into specific files only when a dimension requires verification that +the skill map's summary cannot provide. If the automated checks in Step 3 reported changed files since a previous review, prioritize those files and their cross-references. For unchanged files, @@ -196,9 +199,8 @@ overall verdict: ### Step 7: Report to the User Persist the review report to `.artifacts/skill-reviewer/{skill-name}/review.md`, -then present the same findings inline to the user. If a skill map was produced -in Step 2, it remains at `.artifacts/skill-reviewer/{skill-name}/skill-map.md` -alongside the review. +then present the same findings inline to the user. The skill map remains at +`.artifacts/skill-reviewer/{skill-name}/skill-map.md` alongside the review. Use this structure: From b801a2a0d16858387e7d7565093b20797fe82725 Mon Sep 17 00:00:00 2001 From: Amir Yogev Date: Tue, 5 May 2026 21:31:42 +0300 Subject: [PATCH 5/6] Fix review comments --- AGENTS.md | 1 + CONTRIBUTING.md | 4 ++-- skill-reviewer/SKILL.md | 2 +- skill-reviewer/scripts/pre-review-checks.py | 25 +++++++++++++++------ 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 6ab9c6c..f9e1671 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -347,6 +347,7 @@ ai-workflows/ ├── kcs/ ├── prd/ ├── skill-reviewer/ +│ ├── prompts/ │ └── scripts/ ├── triage/ ├── install.sh # Installer with auto-discovery diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2160653..7440120 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -101,7 +101,7 @@ This ensures symlinks resolve paths correctly regardless of where the workflow i Some workflows include a `scripts/` directory for scripts that offload deterministic work from the LLM — validation, data transformation, file discovery, or any operation better handled by code than by prompt. The `scripts/` directory is optional and follows these conventions: - Scripts are invoked by the workflow's skill files, not by users directly -- Scripts must work when the workflow is installed via symlink (`~/.ai-workflows/workflow-name/scripts/`) +- Scripts must work when the workflow is installed via symlink (`scripts/` under the workflow root) - Use Python 3 or bash — whichever fits the task Currently, only `skill-reviewer/scripts/` uses this pattern. @@ -112,7 +112,7 @@ Some workflows include a `prompts/` directory for prompt templates given to sub- - Prompt templates are self-contained — the sub-agent receives only the prompt, not the caller's context - Templates use `{placeholder}` syntax for values the caller fills in before spawning the sub-agent -- Prompts must work when the workflow is installed via symlink (`~/.ai-workflows/workflow-name/prompts/`) +- Prompts must work when the workflow is installed via symlink (`prompts/` under the workflow root) - Prompts instruct the sub-agent to write output to `.artifacts/`, not to return it in conversation Currently, only `skill-reviewer/prompts/` uses this pattern. diff --git a/skill-reviewer/SKILL.md b/skill-reviewer/SKILL.md index 41f75eb..df4b6de 100644 --- a/skill-reviewer/SKILL.md +++ b/skill-reviewer/SKILL.md @@ -13,7 +13,7 @@ description: >- Run `/review` to execute the full workflow. The user must specify which skill directory to review (e.g. `bugfix/`, `docs-writer/`). Executable without opening other files: 1. 
Read every file in the target skill directory: `SKILL.md`, `skills/*.md`, `commands/*.md`, `guidelines.md`, `README.md`. If the directory doesn't exist or has no skill files, report the error and stop. Note any missing files — gaps are themselves a finding. -2. Run automated pre-review checks: `python3 scripts/pre-review-checks.py {target-dir}` — captures structural, frontmatter, reference, and sequencing issues deterministically. Treat `FAIL` results as pre-validated findings; apply judgment to `WARN` results. If the script is not present, skip and check manually. +2. Run automated pre-review checks: `python3 {skill-reviewer-dir}/scripts/pre-review-checks.py {target-dir}` — captures structural, frontmatter, reference, and sequencing issues deterministically. Treat `FAIL` results as pre-validated findings; apply judgment to `WARN` results. If the script is not present, skip and check manually. 3. Evaluate against 8 dimensions (use automated check results as pre-validated evidence where available): - **Orchestration & Routing** — correct routing, no orphaned/dangling references, executable Quick Start - **Step Sequencing** — sequential numbering, correct cross-references, logical order diff --git a/skill-reviewer/scripts/pre-review-checks.py b/skill-reviewer/scripts/pre-review-checks.py index f775aed..12920f7 100755 --- a/skill-reviewer/scripts/pre-review-checks.py +++ b/skill-reviewer/scripts/pre-review-checks.py @@ -21,7 +21,7 @@ EXEMPT_FILES = {"SKILL.md", "guidelines.md", "README.md", "GUIDE.md"} EXEMPT_DIRS = {"prompts"} # contain example content, not real references STEP_THRESHOLD = 10 -REF_PATTERN = re.compile(r'(?:\.\./)*(?:skills|commands|templates)/[a-zA-Z0-9_-]+\.md') +REF_PATTERN = re.compile(r'(?:\.\./)*(?:skills|commands|templates|prompts|scripts)/[a-zA-Z0-9_-]+\.md') STEP_HEADING = re.compile(r'^#{2,3} Step (\d+)([a-zA-Z])?') @@ -226,11 +226,17 @@ def check_references(self): for f in self.md_files: if f.name in EXEMPT_FILES: continue - referenced = any( - f.name in content - for other, content in file_contents.items() - if other != f - ) + target_rel = str(f.relative_to(self.skill_dir)).replace("\\", "/") + target_name = f.name + referenced = False + for other, content in file_contents.items(): + if other == f: + continue + stripped = self._strip_code_blocks(content) + refs = set(REF_PATTERN.findall(stripped)) + if target_rel in refs or any(r.endswith(f"/{target_name}") or r == target_name for r in refs): + referenced = True + break if not referenced: rel = f.relative_to(self.skill_dir) self.fail(f"Orphaned file: {rel} (not referenced by any other file)") @@ -253,7 +259,12 @@ def check_references(self): # Try relative to the file's directory, then to the skill root candidates = [f.parent / ref_path, self.skill_dir / ref_path] if ref.startswith("../"): - candidates = [(f.parent / ref_path).resolve()] + resolved = (f.parent / ref_path).resolve() + try: + resolved.relative_to(self.skill_dir.resolve()) + candidates = [resolved] + except ValueError: + candidates = [] if not any(c.is_file() for c in candidates): f_rel = f.relative_to(self.skill_dir) From e03624f3d44152de641b5bb929ed905e4b73bfd0 Mon Sep 17 00:00:00 2001 From: Amir Yogev Date: Wed, 6 May 2026 11:06:26 +0300 Subject: [PATCH 6/6] Fix review comments 2 --- CONTRIBUTING.md | 1 + skill-reviewer/scripts/pre-review-checks.py | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 7440120..8b20f1a 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -102,6 +102,7 @@ Some 
workflows include a `scripts/` directory for scripts that offload determini - Scripts are invoked by the workflow's skill files, not by users directly - Scripts must work when the workflow is installed via symlink (`scripts/` under the workflow root) +- Exit codes: `exit 0` = informational (findings reported but workflow continues), `exit 1` = halt (workflow should stop and surface the failure). Scripts that only report findings (like pre-review checks) should always exit 0. - Use Python 3 or bash — whichever fits the task Currently, only `skill-reviewer/scripts/` uses this pattern. diff --git a/skill-reviewer/scripts/pre-review-checks.py b/skill-reviewer/scripts/pre-review-checks.py index 12920f7..26fca12 100755 --- a/skill-reviewer/scripts/pre-review-checks.py +++ b/skill-reviewer/scripts/pre-review-checks.py @@ -6,7 +6,7 @@ status prefixes. Usage: pre-review-checks.py -Exit code: always 0 (findings are informational, not CI blockers) +Exit code: always 0 — informational only (see CONTRIBUTING.md § Scripts for exit-code convention) """ import hashlib
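
To close, a sketch of how a skill file might honor the exit-code convention this last patch documents — the wrapper is hypothetical, and since `pre-review-checks.py` always exits 0, the halt branch never fires for it:

```python
# Hedged sketch of the exit-code convention: 0 = informational, 1 = halt.
import subprocess
import sys

def run_workflow_script(argv: list[str]) -> str:
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == 1:
        # Halt: stop the workflow and surface the failure to the user.
        sys.exit(f"Script requested halt: {' '.join(argv)}\n{proc.stderr}")
    # Informational: report findings and let the workflow continue.
    return proc.stdout

findings = run_workflow_script(
    [sys.executable, "skill-reviewer/scripts/pre-review-checks.py", "bugfix/"]
)
print(findings)
```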