diff --git a/AGENTS.md b/AGENTS.md
index a2a0c36..f9e1671 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -36,6 +36,8 @@ workflow-name/
     phase-name.md      # Implementation for each phase
   commands/
     phase-name.md      # Thin wrappers that invoke controller or SKILL.md
+  scripts/             # Optional — deterministic operations invoked by skills
+  prompts/             # Optional — prompt templates for sub-agent delegation
 ```
 
 **Key architectural principles:**
@@ -91,7 +93,7 @@ Workflows write outputs to `.artifacts/{workflow-name}/{context}/`:
 - **bugfix**: `.artifacts/bugfix/{issue-number}/` (root-cause.md, reproduction.md, etc.)
 - **code-review**: `.artifacts/code-review/{branch}/` (00-reviewer-profile.md, 01-change-summary.md, code-review-{NNN}.md, review-response-{NNN}.md, review-metadata.json, decisions-{NNN}.json)
 - **triage**: `.artifacts/triage/{project}/` (issues.json, analyzed.json, report.html)
-- **skill-reviewer**: `.artifacts/skill-reviewer/{skill-name}/` (review.md)
+- **skill-reviewer**: `.artifacts/skill-reviewer/{skill-name}/` (file-hashes.json, skill-map.md, review.md)
 - **cve-fix**: `.artifacts/cve-fix/{context}/` (context.md, patch-log.md, validation-results.md, pr-description.md, backport-log.md, close-report.md)
 - **prd**: `.artifacts/prd/{issue-number}/` (01-requirements.md, 02-clarifications.md, 03-prd.md, 04-pr-description.md, 05-review-responses.md)
 - **design**: `.artifacts/design/{issue-number}/` (01-context.md, 02-research.md, 03-design.md, 04-epics.md, 05-stories/epic-{N}-{slug}.md, 05-stories/epic-{N}/story-{NN}-{slug}.md, 06-coverage.md, 07-pr-description.md, 08-review-responses.md, publish-metadata.json, sync-manifest.json)
@@ -202,6 +204,8 @@ For detailed workflow development guidelines (structure, file conventions, testi
 - Single-phase workflow (no controller)
 - Read-only review (fixing findings is separate from review phase)
 - Must read all files in target skill before forming opinions
+- Runs `scripts/pre-review-checks.py` for automated structural, frontmatter, reference, and sequencing checks before LLM evaluation
+- For large skills (15+ files), delegates initial reading to an Explore sub-agent using `prompts/analyze-skill.md`, which produces a skill map at `.artifacts/skill-reviewer/{skill-name}/skill-map.md`
 
 ### docs-writer
@@ -343,6 +347,8 @@ ai-workflows/
 ├── kcs/
 ├── prd/
 ├── skill-reviewer/
+│   ├── prompts/
+│   └── scripts/
 ├── triage/
 ├── install.sh           # Installer with auto-discovery
 ├── uninstall.sh         # Removal script
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d042f38..8b20f1a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -95,3 +95,25 @@ This ensures symlinks resolve paths correctly regardless of where the workflow i
 - Keep `SKILL.md` under 30 lines. Use progressive disclosure (`guidelines.md`, `README.md`) for details.
 - Use consistent terminology within a workflow. Pick one term and stick with it.
 - Don't duplicate content between `SKILL.md`, `guidelines.md`, and `controller.md` (when present). Each file has a distinct role.
+
+## Scripts
+
+Some workflows include a `scripts/` directory that offloads deterministic work from the LLM — validation, data transformation, file discovery, or any operation better handled by code than by prompt. The `scripts/` directory is optional and follows these conventions:
+
+- Scripts are invoked by the workflow's skill files, not by users directly
+- Scripts must work when the workflow is installed via symlink (`scripts/` under the workflow root)
+- Exit codes: `exit 0` = informational (findings reported but workflow continues), `exit 1` = halt (workflow should stop and surface the failure). Scripts that only report findings (like pre-review checks) should always exit 0.
+- Use Python 3 or bash — whichever fits the task
+
+Currently, only `skill-reviewer/scripts/` uses this pattern.
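The exit-code convention above can be sketched as a minimal workflow script. This is a hypothetical illustration, not part of any existing workflow — `check_artifacts` and its expected-file list are invented for the example:

```python
from pathlib import Path


def check_artifacts(artifact_dir: Path) -> list[str]:
    """Return findings for missing expected artifacts (purely informational)."""
    expected = ["review.md"]  # hypothetical expected-artifact list
    return [f"Missing artifact: {name}"
            for name in expected
            if not (artifact_dir / name).is_file()]


def run(artifact_dir: Path) -> int:
    findings = check_artifacts(artifact_dir)
    for finding in findings:
        print(f"WARN: {finding}")
    # Findings-only script: always exit 0 so the workflow continues.
    # A script guarding a hard invariant would return 1 to halt instead.
    return 0

# A skill file would invoke this as, e.g.:
#   python3 {workflow-dir}/scripts/check-artifacts.py .artifacts/{workflow}/{context}/
```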
+
+## Prompts
+
+Some workflows include a `prompts/` directory for prompt templates given to sub-agents that perform delegated work — structured reading, analysis, or exploration that benefits from a fresh context window. The `prompts/` directory is optional and follows these conventions:
+
+- Prompt templates are self-contained — the sub-agent receives only the prompt, not the caller's context
+- Templates use `{placeholder}` syntax for values the caller fills in before spawning the sub-agent
+- Prompts must work when the workflow is installed via symlink (`prompts/` under the workflow root)
+- Prompts instruct the sub-agent to write output to `.artifacts/`, not to return it in conversation
+
+Currently, only `skill-reviewer/prompts/` uses this pattern.
diff --git a/skill-reviewer/README.md b/skill-reviewer/README.md
index 88ff002..5f80573 100644
--- a/skill-reviewer/README.md
+++ b/skill-reviewer/README.md
@@ -17,6 +17,10 @@ This workflow provides a skeptical, structured review of any AI skill directory:
 skill-reviewer/
 ├── commands/
 │   └── review.md
+├── prompts/
+│   └── analyze-skill.md
+├── scripts/
+│   └── pre-review-checks.py
 ├── skills/
 │   └── review.md
 ├── guidelines.md
@@ -26,7 +30,7 @@ skill-reviewer/
 
 ### How Commands and Skills Work Together
 
-The **command** (`commands/review.md`) is a thin wrapper that routes directly to the **review skill** (`skills/review.md`), which contains the full review process. No controller is needed — this is a single-phase workflow.
+The **command** (`commands/review.md`) is a thin wrapper that routes directly to the **review skill** (`skills/review.md`), which contains the full review process. The **prompts** directory contains `analyze-skill.md`, a prompt template given to an Explore sub-agent when reviewing large skills (15+ files) to produce a structured skill map before evaluation. The **scripts** directory contains `pre-review-checks.py`, which runs automated structural checks before the LLM evaluation. No controller is needed — this is a single-phase workflow.
 
 ## Workflow Phase
 
@@ -35,7 +39,8 @@ The **command** (`commands/review.md`) is a thin wrapper that routes directly to
 **Purpose**: Perform a deep, skeptical review of an AI skill directory.
 
 1. Read all files in the target skill directory (`SKILL.md`, `skills/*.md`, `commands/*.md`, `guidelines.md`, `README.md`)
-2. Evaluate against 8 review dimensions:
+2. Run automated pre-review checks (`scripts/pre-review-checks.py`) — structural validation, frontmatter, orphaned/dangling references, step sequencing
+3. Evaluate against 8 review dimensions (using automated check results as pre-validated evidence):
    - **Orchestration & Routing** — correct routing, no orphaned or dangling references
    - **Step Sequencing & Numbering** — sequential numbering, correct cross-references
    - **Schema Consistency** — matching field names/types across files
@@ -44,7 +49,7 @@ The **command** (`commands/review.md`) is a thin wrapper that routes directly to
    - **Documentation & Project Alignment** — README matches implementation, consistent with sibling skills and project conventions
    - **Command Naming** — consistent, self-explanatory names
    - **Error Handling & Edge Cases** — failure modes documented, escalation clear
-3. Classify findings by severity and produce a structured report
+4. Classify findings by severity and produce a structured report
 
 **Output**: `.artifacts/skill-reviewer/{skill-name}/review.md` + findings presented inline.
@@ -95,6 +100,8 @@ Agent works through findings from highest severity to lowest.
```text
.artifacts/skill-reviewer/{skill-name}/
+├── file-hashes.json     # SHA-256 hashes for change detection between reviews
+├── skill-map.md         # Structured skill map (comprehension artifact before evaluation)
 └── review.md            # Full review report with findings table
 ```
diff --git a/skill-reviewer/SKILL.md b/skill-reviewer/SKILL.md
index 2a6e4d2..df4b6de 100644
--- a/skill-reviewer/SKILL.md
+++ b/skill-reviewer/SKILL.md
@@ -13,7 +13,8 @@ description: >-
 Run `/review` to execute the full workflow. The user must specify which skill directory to review (e.g. `bugfix/`, `docs-writer/`). Executable without opening other files:
 
 1. Read every file in the target skill directory: `SKILL.md`, `skills/*.md`, `commands/*.md`, `guidelines.md`, `README.md`. If the directory doesn't exist or has no skill files, report the error and stop. Note any missing files — gaps are themselves a finding.
-2. Evaluate against 8 dimensions:
+2. Run automated pre-review checks: `python3 {skill-reviewer-dir}/scripts/pre-review-checks.py {target-dir}` — captures structural, frontmatter, reference, and sequencing issues deterministically. Treat `FAIL` results as pre-validated findings; apply judgment to `WARN` results. If the script is not present, skip and check manually.
+3. Evaluate against 8 dimensions (use automated check results as pre-validated evidence where available):
    - **Orchestration & Routing** — correct routing, no orphaned/dangling references, executable Quick Start
    - **Step Sequencing** — sequential numbering, correct cross-references, logical order
    - **Schema Consistency** — matching field names/types across files, schema visible before first use
@@ -22,9 +23,9 @@ Run `/review` to execute the full workflow. The user must specify which skill di
    - **Documentation & Project Alignment** — README matches implementation, consistent with sibling skills and project conventions
    - **Command Naming** — consistent pattern (verbs vs nouns), self-explanatory
    - **Error Handling** — failure modes documented, escalation paths clear
-3. Classify each finding by severity — **CRITICAL** / **HIGH** (blockers) or **MEDIUM** / **LOW** (suggestions).
-4. Validate findings: verify each finding cites a specific file, includes a concrete suggestion, and that blocker/suggestion counts are accurate. Drop any finding you cannot substantiate from the files you read.
-5. Produce a structured report and write it to `.artifacts/skill-reviewer/{skill-name}/review.md`:
+4. Classify each finding by severity — **CRITICAL** / **HIGH** (blockers) or **MEDIUM** / **LOW** (suggestions).
+5. Validate findings: verify each finding cites a specific file, includes a concrete suggestion, and that blocker/suggestion counts are accurate. Drop any finding you cannot substantiate from the files you read.
+6. Produce a structured report and write it to `.artifacts/skill-reviewer/{skill-name}/review.md`:
 
 ```
 ## Skill Review: {skill-name}
@@ -56,6 +57,10 @@ skill-reviewer/
   README.md             # User-facing documentation
   commands/
     review.md           # /review command — loads guidelines + skill
+  prompts/
+    analyze-skill.md    # Prompt template for Explore sub-agent (large skill reading)
   skills/
     review.md           # The review skill (detailed steps and output format)
+  scripts/
+    pre-review-checks.py  # Automated structural/reference/sequencing checks
 ```
diff --git a/skill-reviewer/guidelines.md b/skill-reviewer/guidelines.md
index c68d7b7..ad9a1e9 100644
--- a/skill-reviewer/guidelines.md
+++ b/skill-reviewer/guidelines.md
@@ -14,6 +14,7 @@ Artifacts go in `.artifacts/skill-reviewer/{skill-name}/`.
 - Be constructive — every finding should include a concrete suggestion.
 - If something is broken, say so — don't minimize or hedge.
 - Distinguish between blockers (must fix) and suggestions (nice to have).
+- Offload mechanical checks to `scripts/pre-review-checks.py` when available — focus reviewer attention on judgment calls that require reading comprehension and context.
 
 ## Hard Limits
 
@@ -34,6 +35,8 @@ Artifacts go in `.artifacts/skill-reviewer/{skill-name}/`.
 - Findings must be classified by severity (CRITICAL / HIGH / MEDIUM / LOW)
 - The report must follow the structured output format defined in the review skill
 - Blocker and suggestion counts must be accurate
+- Script `FAIL` results are pre-validated (verified against the filesystem) and can be cited directly as findings without re-verification
+- Script `WARN` results require reviewer judgment — not all warnings are findings
 
 ## Escalation
 
diff --git a/skill-reviewer/prompts/analyze-skill.md b/skill-reviewer/prompts/analyze-skill.md
new file mode 100644
index 0000000..1bab3f7
--- /dev/null
+++ b/skill-reviewer/prompts/analyze-skill.md
@@ -0,0 +1,137 @@
+# Skill Analysis Prompt
+
+You are an Explore agent. Your job is to read every file in a skill directory
+and produce a structured **skill map** that faithfully summarizes what you read.
+You do **not** evaluate, judge, or recommend. You comprehend and report.
+
+## Target
+
+Skill directory: `{target-skill-directory}`
+Skill name: `{skill-name}`
+
+## Reading Order
+
+Read each file **in full**. Do not skip any file. Read in this order:
+
+1. `SKILL.md` (orchestrator — overall structure and routing)
+2. `guidelines.md` (principles and constraints)
+3. `commands/*.md` (command routers — how users enter the workflow)
+4. `skills/*.md` (phase/step skills, including `controller.md` — the detailed logic)
+5. `README.md` (user-facing documentation)
+6. Any other directories and files (`templates/`, `scripts/`, etc.)
+
+If a file is missing, note its absence in the File Inventory.
+
+## Output
+
+Write the skill map to `.artifacts/skill-reviewer/{skill-name}/skill-map.md`
+using the structure below. After writing, return to the caller — do not take
+any further action.
+
+---
+
+## Skill Map Structure
+
+Use these exact section headings. If a section has no content (e.g., no schemas
+found), write "None found." under the heading — do not omit the section.
+
+### File Inventory
+
+| Path | Type | Lines | Purpose |
+|------|------|-------|---------|
+| SKILL.md | orchestrator | 62 | Entry point with phase overview and routing |
+| guidelines.md | guideline | 53 | Principles, hard limits, safety rules |
+| commands/fix.md | command | 12 | Thin wrapper routing to controller for /fix |
+| skills/controller.md | skill | 145 | Phase dispatcher and transition logic |
+| ... | ... | ... | ... |
+
+**Type** values: `orchestrator`, `guideline`, `command`, `skill`, `template`,
+`script`, `doc`, `other`.
+
+Mark missing expected files with type `missing`:
+
+| Path | Type | Lines | Purpose |
+|------|------|-------|---------|
+| guidelines.md | missing | — | Not found in directory |
+
+### Routing Graph
+
+List every file-to-file reference as an edge. Include the direction of reference
+(which file mentions which). Use relative paths from the skill root.
+
+```
+SKILL.md → guidelines.md
+SKILL.md → skills/controller.md
+commands/fix.md → skills/controller.md
+skills/controller.md → skills/fix.md
+skills/controller.md → skills/test.md
+skills/fix.md → (none — no outgoing references)
+```
+
+### Phase Pipeline
+
+Ordered list of phases as they appear in the workflow. For each phase:
+
+- **Phase name** and command (e.g., `/fix`)
+- **Purpose**: one sentence
+- **Skill file**: which file implements it
+- **Key inputs**: what the phase reads or requires
+- **Key outputs**: what the phase produces (files, artifacts, state changes)
+
+If the workflow has no explicit phases (single-phase workflow), describe the
+single phase.
+
+### Schema Fields
+
+For each data schema, structured data format, or artifact template found:
+
+- **Where introduced**: file and step where the schema is first defined
+- **Where consumed**: file(s) and step(s) where the schema fields are read
+- **Fields**: field names and types/descriptions
+
+If no explicit schemas are found, write "None found."
+
+### Step Sequences
+
+For each file in `skills/` that uses step headings:
+
+- **File**: relative path
+- **Steps**: numbered list of step headings as they appear (e.g., "Step 1: Identify the Target")
+- **Cross-references**: any "see Step N" or "from Step N" references, noting which step/file they point to
+- **Total step count**: number of steps in the file
+
+### Command Names
+
+| Command | Frontmatter name | Routes to |
+|---------|-----------------|-----------|
+| /fix | bugfix:fix | skills/controller.md |
+| /test | bugfix:test | skills/controller.md |
+| ... | ... | ... |
+
+### Key Constraints
+
+From `guidelines.md` (and any constraints stated in `SKILL.md`), list:
+
+- **Principles** (bulleted)
+- **Hard limits** (bulleted)
+- **Safety rules** (bulleted)
+- **Quality standards** (bulleted)
+- **Escalation criteria** (bulleted)
+
+If `guidelines.md` is missing, note this and extract any constraints visible
+in `SKILL.md` or `README.md`.
+
+---
+
+## Constraints for You
+
+- Do **not** evaluate the quality of the skill. Do not say "this is good" or
+  "this could be improved."
+- Do **not** form opinions. Report what each file says, not what you think
+  about it.
+- Do **not** omit files. If a file exists in the directory, it must appear in
+  the File Inventory.
+- Do **not** invent information. If something is unclear or ambiguous, report
+  it as-is and add "(ambiguous)" or "(unclear)".
+- Do **not** summarize file contents beyond what each section asks for. The
+  skill map is a structured index, not a paraphrase.
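The `{placeholder}` convention this template relies on (`{target-skill-directory}`, `{skill-name}`) can be sketched as a small fill step the caller performs before spawning the sub-agent. `fill_prompt` is a hypothetical helper for illustration only — callers in the workflow do this substitution inline:

```python
from pathlib import Path


def fill_prompt(template_path: Path, values: dict[str, str]) -> str:
    """Fill {placeholder} slots in a prompt template.

    Hypothetical helper illustrating the convention: plain string
    substitution, no templating engine required.
    """
    text = template_path.read_text()
    for key, value in values.items():
        text = text.replace("{" + key + "}", value)
    return text

# The filled prompt is handed to the sub-agent verbatim; the sub-agent
# never sees the caller's context, only this text.
```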
diff --git a/skill-reviewer/scripts/pre-review-checks.py b/skill-reviewer/scripts/pre-review-checks.py
new file mode 100755
index 0000000..26fca12
--- /dev/null
+++ b/skill-reviewer/scripts/pre-review-checks.py
@@ -0,0 +1,365 @@
+#!/usr/bin/env python3
+"""Automated pre-review checks for AI skill directories.
+
+Runs deterministic structural, frontmatter, reference, and step-sequencing
+checks before the LLM review. Outputs structured findings with PASS/FAIL/WARN
+status prefixes.
+
+Usage: pre-review-checks.py <skill-directory>
+Exit code: always 0 — informational only (see CONTRIBUTING.md § Scripts for exit-code convention)
+"""
+
+import hashlib
+import json
+import re
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+KNOWN_ENTRIES = {"SKILL.md", "guidelines.md", "README.md", "GUIDE.md",
+                 "skills", "commands", "templates", "scripts", "prompts"}
+EXEMPT_FILES = {"SKILL.md", "guidelines.md", "README.md", "GUIDE.md"}
+EXEMPT_DIRS = {"prompts"}  # contain example content, not real references
+STEP_THRESHOLD = 10
+REF_PATTERN = re.compile(r'(?:\.\./)*(?:skills|commands|templates|prompts|scripts)/[a-zA-Z0-9_-]+\.md')
+STEP_HEADING = re.compile(r'^#{2,3} Step (\d+)([a-zA-Z])?')
+
+
+class Checker:
+    def __init__(self, skill_dir: Path):
+        self.skill_dir = skill_dir
+        self.skill_name = skill_dir.name
+        self.counts = {"PASS": 0, "WARN": 0, "FAIL": 0}
+        self.md_files = list(skill_dir.rglob("*.md"))
+
+    def _emit(self, level: str, msg: str):
+        self.counts[level] += 1
+        print(f"{level}: {msg}")
+
+    def passed(self, msg: str): self._emit("PASS", msg)
+    def warn(self, msg: str): self._emit("WARN", msg)
+    def fail(self, msg: str): self._emit("FAIL", msg)
+
+    def _compute_hashes(self) -> dict[str, str]:
+        """Compute SHA-256 hashes for all .md files in the skill directory."""
+        hashes = {}
+        for f in sorted(self.md_files):
+            try:
+                digest = hashlib.sha256(f.read_bytes()).hexdigest()[:12]
+                rel = str(f.relative_to(self.skill_dir))
+                hashes[rel] = digest
+            except OSError:
+                continue
+        return hashes
+
+    def check_changes(self):
+        print("--- Changes Since Last Review ---")
+
+        hashes_file = Path(f".artifacts/skill-reviewer/{self.skill_name}/file-hashes.json")
+        current_hashes = self._compute_hashes()
+
+        if hashes_file.is_file():
+            try:
+                prev = json.loads(hashes_file.read_text())
+                prev_hashes = prev.get("files", {})
+                prev_time = prev.get("timestamp", "unknown")
+            except (json.JSONDecodeError, OSError):
+                prev_hashes = {}
+                prev_time = "unknown"
+
+            print(f"INFO: Previous review: {prev_time}")
+
+            changed = [f for f in current_hashes if f in prev_hashes and current_hashes[f] != prev_hashes[f]]
+            added = [f for f in current_hashes if f not in prev_hashes]
+            removed = [f for f in prev_hashes if f not in current_hashes]
+            unchanged = len(current_hashes) - len(changed) - len(added)
+
+            for f in sorted(changed):
+                self.warn(f"Changed: {f}")
+            for f in sorted(added):
+                print(f"INFO: New file: {f}")
+            for f in sorted(removed):
+                print(f"INFO: Removed: {f}")
+            if unchanged > 0:
+                print(f"INFO: Unchanged: {unchanged} file(s)")
+            if not changed and not added and not removed:
+                print("INFO: No changes detected since last review")
+        else:
+            print("INFO: No previous review found — full review")
+
+        # Write current hashes for next comparison
+        hashes_file.parent.mkdir(parents=True, exist_ok=True)
+        hashes_file.write_text(json.dumps({
+            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
+            "files": current_hashes,
+        }, indent=2) + "\n")
+
+        print()
+
+    def check_structure(self):
+        print("--- Structure ---")
+        d = self.skill_dir
+
+        if (d / "SKILL.md").is_file():
+            self.passed("SKILL.md exists")
+        else:
+            self.fail("SKILL.md is missing (required)")
+
+        skills_dir = d / "skills"
+        if skills_dir.is_dir():
+            count = len(list(skills_dir.glob("*.md")))
+            if count > 0:
+                self.passed(f"skills/ directory with {count} skill file(s)")
+            else:
+                self.fail("skills/ directory exists but contains no .md files")
+        else:
+            self.fail("skills/ directory is missing (required)")
+
+        for name in ("guidelines.md", "README.md"):
+            if (d / name).is_file():
+                self.passed(f"{name} present")
+            else:
+                self.warn(f"No {name} found (optional but recommended)")
+
+        cmds = d / "commands"
+        if cmds.is_dir():
+            count = len(list(cmds.glob("*.md")))
+            self.passed(f"commands/ directory with {count} command file(s)")
+        else:
+            self.warn("No commands/ directory found")
+
+        tmpls = d / "templates"
+        if tmpls.is_dir():
+            count = len(list(tmpls.glob("*.md")))
+            self.passed(f"templates/ directory with {count} template file(s)")
+
+        for entry in sorted(d.iterdir()):
+            if entry.name.startswith("."):
+                continue
+            if entry.name not in KNOWN_ENTRIES:
+                self.warn(f"Unexpected entry: {entry.name}")
+
+        print()
+
+    def _parse_frontmatter(self, path: Path) -> dict | None:
+        """Extract YAML frontmatter fields as a dict, or None if missing."""
+        try:
+            text = path.read_text()
+        except OSError:
+            return None
+        lines = text.split("\n")
+        if not lines or lines[0].strip() != "---":
+            return None
+        fields = {}
+        for line in lines[1:]:
+            if line.strip() == "---":
+                break
+            if ":" in line:
+                key = line.split(":", 1)[0].strip()
+                val = line.split(":", 1)[1].strip()
+                fields[key] = val
+        return fields
+
+    def check_frontmatter(self):
+        print("--- Frontmatter ---")
+        d = self.skill_dir
+
+        skill_md = d / "SKILL.md"
+        if skill_md.is_file():
+            fm = self._parse_frontmatter(skill_md)
+            if fm is None:
+                self.fail("SKILL.md missing YAML frontmatter (no opening ---)")
+            else:
+                ok = True
+                if "name" not in fm:
+                    self.fail("SKILL.md frontmatter missing 'name' field")
+                    ok = False
+                if "description" not in fm:
+                    self.fail("SKILL.md frontmatter missing 'description' field")
+                    ok = False
+                if ok:
+                    self.passed("SKILL.md has valid frontmatter (name, description)")
+
+        cmds = d / "commands"
+        if cmds.is_dir():
+            for cmd in sorted(cmds.glob("*.md")):
+                rel = f"commands/{cmd.name}"
+                fm = self._parse_frontmatter(cmd)
+                if fm is None:
+                    self.fail(f"{rel} missing YAML frontmatter")
+                    continue
+                if "name" not in fm:
+                    self.fail(f"{rel} frontmatter missing 'name' field")
+                    continue
+                name_val = fm["name"].strip('"')
+                if not name_val.startswith(f"{self.skill_name}:"):
+                    self.fail(f"{rel} name '{name_val}' does not use colon notation '{self.skill_name}:'")
+                else:
+                    self.passed(f"{rel} has valid frontmatter")
+
+        print()
+
+    def _strip_code_blocks(self, text: str) -> str:
+        """Remove content inside fenced code blocks."""
+        result = []
+        in_fence = False
+        for line in text.split("\n"):
+            if line.startswith("```"):
+                in_fence = not in_fence
+                continue
+            if not in_fence:
+                result.append(line)
+        return "\n".join(result)
+
+    def check_references(self):
+        print("--- References ---")
+
+        file_contents = {}
+        for f in self.md_files:
+            try:
+                file_contents[f] = f.read_text()
+            except OSError:
+                continue
+
+        # Orphaned files
+        orphan_found = False
+        for f in self.md_files:
+            if f.name in EXEMPT_FILES:
+                continue
+            target_rel = str(f.relative_to(self.skill_dir)).replace("\\", "/")
+            target_name = f.name
+            referenced = False
+            for other, content in file_contents.items():
+                if other == f:
+                    continue
+                stripped = self._strip_code_blocks(content)
+                refs = set(REF_PATTERN.findall(stripped))
+                if target_rel in refs or any(r.endswith(f"/{target_name}") or r == target_name for r in refs):
+                    referenced = True
+                    break
+            if not referenced:
+                rel = f.relative_to(self.skill_dir)
+                self.fail(f"Orphaned file: {rel} (not referenced by any other file)")
+                orphan_found = True
+
+        if not orphan_found:
+            self.passed("No orphaned files found")
+
+        # Dangling references (skip files in exempt dirs — they contain example content)
+        dangle_found = False
+        seen_dangles = set()
+
+        for f, content in file_contents.items():
+            if any(part in EXEMPT_DIRS for part in f.relative_to(self.skill_dir).parts[:-1]):
+                continue
+            stripped = self._strip_code_blocks(content)
+            refs = set(REF_PATTERN.findall(stripped))
+            for ref in sorted(refs):
+                ref_path = Path(ref)
+                # Try relative to the file's directory, then to the skill root
+                candidates = [f.parent / ref_path, self.skill_dir / ref_path]
+                if ref.startswith("../"):
+                    resolved = (f.parent / ref_path).resolve()
+                    try:
+                        resolved.relative_to(self.skill_dir.resolve())
+                        candidates = [resolved]
+                    except ValueError:
+                        candidates = []
+
+                if not any(c.is_file() for c in candidates):
+                    f_rel = f.relative_to(self.skill_dir)
+                    key = (str(f_rel), ref)
+                    if key not in seen_dangles:
+                        seen_dangles.add(key)
+                        self.fail(f"Dangling reference in {f_rel}: {ref} (file does not exist)")
+                        dangle_found = True
+
+        if not dangle_found:
+            self.passed("No dangling references found")
+
+        print()
+
+    def check_steps(self):
+        print("--- Steps ---")
+
+        skills_dir = self.skill_dir / "skills"
+        if not skills_dir.is_dir():
+            self.warn("No skills/ directory — skipping step checks")
+            print()
+            return
+
+        for skill_file in sorted(skills_dir.glob("*.md")):
+            rel = f"skills/{skill_file.name}"
+            try:
+                text = skill_file.read_text()
+            except OSError:
+                continue
+
+            steps = []
+            substeps = []
+            for line in text.split("\n"):
+                m = STEP_HEADING.match(line)
+                if not m:
+                    continue
+                num = int(m.group(1))
+                if m.group(2):
+                    label = line.lstrip("#").strip().split(":")[0].strip()
+                    substeps.append(label)
+                else:
+                    steps.append(num)
+
+            if substeps:
+                self.warn(f"{rel} uses sub-step numbering: {', '.join(substeps)}")
+
+            if not steps:
+                continue
+
+            # Duplicates
+            unique = sorted(set(steps))
+            if len(steps) != len(unique):
+                dupes = sorted({n for n in steps if steps.count(n) > 1})
+                self.fail(f"{rel} has duplicate step number(s): {', '.join(map(str, dupes))}")
+
+            # Gaps
+            first, last = unique[0], unique[-1]
+            expected = set(range(first, last + 1))
+            missing = sorted(expected - set(unique))
+            if missing:
+                self.fail(f"{rel} has gap(s) in step numbering: missing Step {', '.join(map(str, missing))}")
+            else:
+                self.passed(f"{rel} steps sequential ({first}-{last})")
+
+            # Threshold
+            if len(steps) > STEP_THRESHOLD:
+                self.warn(f"{rel} has {len(steps)} steps (threshold: {STEP_THRESHOLD})")
+
+        print()
+
+    def run(self):
+        print(f"=== Pre-Review Automated Checks: {self.skill_name} ===")
+        print()
+        self.check_changes()
+        self.check_structure()
+        self.check_frontmatter()
+        self.check_references()
+        self.check_steps()
+        total = sum(self.counts.values())
+        print("--- Summary ---")
+        print(f"Checks: {total} | PASS: {self.counts['PASS']} | WARN: {self.counts['WARN']} | FAIL: {self.counts['FAIL']}")
+
+
+def main():
+    if len(sys.argv) < 2:
+        print(f"Usage: {sys.argv[0]} <skill-directory>")
+        return
+
+    skill_dir = Path(sys.argv[1]).resolve()
+    if not skill_dir.is_dir():
+        print(f"FAIL: Directory does not exist: {sys.argv[1]}")
+        return
+
+    Checker(skill_dir).run()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/skill-reviewer/skills/review.md b/skill-reviewer/skills/review.md
index 94fd640..6d61b33 100644
--- a/skill-reviewer/skills/review.md
+++ b/skill-reviewer/skills/review.md
@@ -30,10 +30,16 @@
 Optional directories (`commands/`, `templates/`, etc.) and files (`guidelines.md`,
 `README.md`) are reviewed when present but not required to proceed. If you find
 non-standard directories, read their contents — they may be relevant findings.
 
-### Step 2: Read All Files
+### Step 2: Read All Files and Produce Skill Map
 
-Read **every file** in the skill directory before forming any opinion. Read in
-this order to build understanding progressively:
+Read every file in the skill directory and produce a structured **skill map** —
+an intermediate artifact that captures your understanding before evaluation
+begins. The skill map makes the review auditable and catches comprehension
+errors before they become wrong findings.
+
+#### How to read
+
+Read files in this order to build understanding progressively:
 
 1. `SKILL.md` (orchestrator — tells you the overall structure and routing)
 2. `guidelines.md` (principles and constraints)
@@ -44,12 +50,76 @@ this order to build understanding progressively:
 
 Read each file in full. If any expected file is missing, note it — gaps in the
 structure are themselves a finding.
 
-**Large skill directories (15+ files):** Read in batches by file type (following
-the order above). After each batch, note preliminary observations but defer
-judgment until all files are read. This prevents context overload while
-preserving the "no opinions before all files" principle.
+**Large skill directories (15+ files):** Delegate reading to a sub-agent to
+protect your context window. Read `../prompts/analyze-skill.md`, fill in
+`{target-skill-directory}` and `{skill-name}`, then:
+
+- **If the AI runtime supports subagents:** Spawn an Explore sub-agent with
+  the filled-in prompt.
+- **If subagents are not available:** Follow the prompt yourself.
+
+#### How to produce the skill map
+
+Whether you read directly or via sub-agent, write a skill map to
+`.artifacts/skill-reviewer/{skill-name}/skill-map.md` following the structure
+defined in `../prompts/analyze-skill.md`. The skill map contains:
+
+- **File Inventory** — all files with type, line count, and purpose
+- **Routing Graph** — which file references which
+- **Phase Pipeline** — ordered phases with inputs and outputs
+- **Schema Fields** — where introduced, where consumed
+- **Step Sequences** — per-file step numbers, cross-references, counts
+- **Command Names** — each command with frontmatter name and routing target
+- **Key Constraints** — principles, hard limits, safety, quality, escalation
+
+After writing the skill map, read it back. This is your primary input for
+Step 4. You retain the ability to read individual files during evaluation for
+deeper inspection — the skill map is a comprehension aid, not a replacement
+for targeted file reads when a finding requires verification.
+
+### Step 3: Run Automated Checks
+
+Run the pre-review validation script against the target skill directory:
+
+```bash
+python3 {skill-reviewer-dir}/scripts/pre-review-checks.py {target-skill-directory}
+```
+
+Where `{skill-reviewer-dir}` is the directory this workflow was loaded from (e.g.,
+`~/.ai-workflows/skill-reviewer`), and `{target-skill-directory}` is the path you
+used to read the skill files in Step 2.
+
+The script performs mechanical checks that can be verified deterministically:
+
+- **Structure**: required/optional/unexpected files against canonical layout
+- **Frontmatter**: YAML validity, required fields, colon notation in commands
+- **References**: orphaned files (exist but never referenced) and dangling references (referenced but don't exist)
+- **Step sequencing**: sequential numbering, gaps, duplicates, sub-step notation, step count > 10
+- **Change detection**: which files changed, were added, or removed since the last review (based on SHA-256 hashes stored in `.artifacts/skill-reviewer/{skill-name}/file-hashes.json`)
+
+The script writes current file hashes to `.artifacts/` for future comparisons.
+
+Review the script output:
+
+- **FAIL** results are pre-validated against the filesystem. Incorporate them
+  directly as findings in Step 5 — you do not need to re-verify these.
+- **WARN** results need your judgment. Some warnings are genuine findings, others
+  are acceptable for the skill being reviewed.
+- **PASS** results confirm that specific mechanical checks passed. Reference them
+  as evidence when evaluating the corresponding dimension (e.g., "Automated checks
+  confirmed no orphaned or dangling references").
+
+If the script is not available (e.g., the skill-reviewer was installed without
+the `scripts/` directory), skip this step and perform all checks manually in Step 4.
+
+### Step 4: Evaluate Review Dimensions
+
+Use the skill map from Step 2 as your primary reference for each dimension.
+Drill into specific files only when a dimension requires verification that
+the skill map's summary cannot provide.
 
-### Step 3: Evaluate Review Dimensions
+If the automated checks in Step 3 reported changed files since a previous
+review, prioritize those files and their cross-references. For unchanged files,
+verify that previous findings still apply rather than re-evaluating from scratch.
 
 Work through each dimension systematically. For each, note any findings.
@@ -108,7 +178,7 @@ Work through each dimension systematically. For each, note any findings.
 - Are failure modes documented (e.g., "If zero results, stop and report")?
 - Are escalation paths clear?
 
-### Step 4: Classify Findings
+### Step 5: Classify Findings
 
 Assign a severity to each finding:
 
@@ -117,7 +187,7 @@ Assign a severity to each finding:
 - **MEDIUM**: Quality issue. Documentation drift, naming inconsistency, missing edge case.
 - **LOW**: Polish. Readability, minor wording, suggestions for improvement.
 
-### Step 5: Form a Verdict
+### Step 6: Form a Verdict
 
 Count blockers (CRITICAL + HIGH) and suggestions (MEDIUM + LOW). Determine the
 overall verdict:
 
@@ -126,10 +196,11 @@ overall verdict:
 - **Suggestions only** → Skill works but could be improved
 - **Clean** → Skill is well-structured and ready for use
 
-### Step 6: Report to the User
+### Step 7: Report to the User
 
 Persist the review report to `.artifacts/skill-reviewer/{skill-name}/review.md`,
-then present the same findings inline to the user.
+then present the same findings inline to the user. The skill map remains at
+`.artifacts/skill-reviewer/{skill-name}/skill-map.md` alongside the review.
 
 Use this structure:
@@ -186,7 +257,7 @@ when there's an actual problem. If a skill is broken, say so.
 
 ## When This Phase Is Done
 
-Your verdict and recommendations (from Step 6) serve as the phase summary. Tell
+Your verdict and recommendations (from Step 7) serve as the phase summary. Tell
 the user where the review was written (`.artifacts/skill-reviewer/{skill-name}/review.md`).
 The review itself is complete. If the user subsequently asks to fix findings,