Skill reviewer enhancements #38
Offloads verifiable work to scripts instead of trusting the LLM. Several of the review dimensions have mechanically checkable aspects:

- Orphaned references: grep SKILL.md for file paths, diff against actual files on disk
- Step numbering gaps: parse skills/*.md for `## Step N` headings, check for gaps/duplicates
- Schema field consistency: extract field names from producer files, check they appear in consumer files
- README vs implementation drift: extract feature claims from README, grep for them in skills/

A scripts/ directory with small shell/python checkers could run before the LLM review, feeding concrete findings into the evaluation rather than relying on the LLM to notice them. The LLM then focuses on judgment calls (clarity, cognitive load, edge cases) where it adds real value.
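One of those checkers — the step-numbering gap detector — could be sketched as follows. This is a minimal illustration; the function name and report format are assumptions, not the PR's actual code:

```python
import re
from pathlib import Path

def check_step_numbering(skill_file: Path) -> list[str]:
    """Report duplicate or non-contiguous '## Step N' headings in one file."""
    text = skill_file.read_text(encoding="utf-8")
    steps = [int(n) for n in re.findall(r"^## Step (\d+)", text, re.MULTILINE)]
    findings = []
    if len(set(steps)) != len(steps):
        findings.append(f"{skill_file.name}: duplicate step numbers in {steps}")
    if sorted(set(steps)) != list(range(1, len(set(steps)) + 1)):
        findings.append(f"{skill_file.name}: gaps in step numbering {sorted(set(steps))}")
    return findings
```

Running checks like this over `skills/*.md` before the LLM pass turns "did the model notice?" into a deterministic yes/no.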
Separates the analyze-skill.md prompt into its own file and delegates deep reading to an Explore agent. This skill currently reads everything in-context. For large skills (15+ files — you already flag this case), delegating the initial structured read to a sub-agent with a dedicated prompt template would:

- Protect the context window
- Produce a structured intermediate artifact (like eval-analyze's YAML analysis block)
- Let the reviewer LLM focus on evaluation rather than comprehension
- Computes SHA-256 hashes of all .md files in the target skill
- Compares against stored hashes in .artifacts/skill-reviewer/{skill-name}/file-hashes.json
- Reports changed, new, removed, and unchanged files
- Writes current hashes for next comparison
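The four bullets above can be sketched in a few lines of Python. The artifact path and JSON layout are assumptions based on the description, not the script's actual implementation:

```python
import hashlib
import json
from pathlib import Path

def compare_hashes(skill_dir: Path, store: Path) -> dict[str, list[str]]:
    """Hash every .md file, diff against the stored snapshot, then rewrite it."""
    current = {
        str(p.relative_to(skill_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(skill_dir.rglob("*.md"))
    }
    previous = json.loads(store.read_text()) if store.exists() else {}
    report = {
        "changed": [f for f in current if f in previous and current[f] != previous[f]],
        "new": [f for f in current if f not in previous],
        "removed": [f for f in previous if f not in current],
        "unchanged": [f for f in current if previous.get(f) == current[f]],
    }
    store.parent.mkdir(parents=True, exist_ok=True)
    store.write_text(json.dumps(current, indent=2))
    return report
```

The first run reports everything as new; subsequent runs let the reviewer scope attention to changed files only.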
Produces a structured YAML analysis (purpose, inputs, outputs, pipeline, quality criteria) before generating the final config.
Current reviewer jumps from reading directly to evaluating. Adding an intermediate "skill map" step — a structured summary of what was read (routing graph, schema fields, step sequence, file inventory) — would:
- Make the review auditable ("here's what I understood before judging")
- Catch comprehension errors before they become wrong findings
- Serve as the input to both automated checks and LLM evaluation
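One possible shape for that intermediate "skill map" artifact, sketched as a data structure — the field names here are illustrative, not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class SkillMap:
    """Structured summary of a skill, produced before any evaluation."""
    skill_name: str
    file_inventory: list[str] = field(default_factory=list)            # every file read
    routing_graph: dict[str, list[str]] = field(default_factory=dict)  # file -> files it references
    schema_fields: dict[str, list[str]] = field(default_factory=dict)  # producer file -> field names
    step_sequence: list[str] = field(default_factory=list)             # ordered "## Step N" titles

# Hypothetical example: both the automated checkers and the reviewer LLM
# would consume this one object instead of re-reading the raw files.
skill_map = SkillMap(
    skill_name="skill-reviewer",
    file_inventory=["SKILL.md", "guidelines.md", "skills/review.md"],
    routing_graph={"SKILL.md": ["guidelines.md", "skills/review.md"]},
    step_sequence=["Step 1", "Step 2", "Step 3"],
)
```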
Walkthrough

Adds an automated pre-review checks script and an Explore-agent prompt for large-skill analysis, updates workflow steps to generate and consume persisted skill-map and file-hashes artifacts, and documents the new directories/conventions across AGENTS.md, CONTRIBUTING.md, and multiple skill-reviewer docs.

Changes

Skill-Reviewer: pre-review automation & skill-map
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Invoker
    participant Workflow
    participant ExploreAgent
    participant PreReviewScript
    participant ArtifactStore
    participant ReviewerLLM
    Invoker->>Workflow: start skill-review for <skill-dir>
    Workflow->>Workflow: detect size (>15 files)?
    alt large
        Workflow->>ExploreAgent: run prompt `analyze-skill.md` on <skill-dir>
        ExploreAgent->>ArtifactStore: write `.artifacts/.../skill-map.md`
    else small
        Workflow->>Workflow: inline read files -> generate in-memory skill-map
        Workflow->>ArtifactStore: optionally write `.artifacts/.../skill-map.md`
    end
    Workflow->>PreReviewScript: run `pre-review-checks.py <skill-dir>`
    PreReviewScript->>ArtifactStore: write `.artifacts/.../file-hashes.json`
    PreReviewScript->>Workflow: return PASS/WARN/FAIL (info only)
    Workflow->>ReviewerLLM: provide skill-map + pre-review outputs + files
    ReviewerLLM->>ArtifactStore: write `.artifacts/.../review.md`
    Workflow->>Invoker: complete, artifacts written (skill-map, file-hashes, review.md)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
skill-reviewer/SKILL.md (1)
11-66: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
`SKILL.md` should stay thin and under 30 lines; detailed flow belongs in `skills/`. This file currently embeds detailed procedural content that exceeds the SKILL.md size rule.
As per coding guidelines, “Keep SKILL.md under 30 lines to follow progressive disclosure pattern, with detailed guidance living in guidelines.md and skills/ directory.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skill-reviewer/SKILL.md` around lines 11 - 66, SKILL.md is too long (exceeds the 30-line rule) and embeds detailed procedural content that belongs in the skills and guidelines files; trim SKILL.md to a short header/one-paragraph Quick Start plus a pointer to the detailed workflow, then move the step-by-step procedure (steps 1–6), the 8-dimension checklist, and the File Layout block into the detailed skill document (e.g., skills/review.md) and policy guidance (guidelines.md); ensure commands/review.md or prompts/analyze-skill.md reference the detailed locations, and leave only a concise executable hint in SKILL.md so the file is under 30 lines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@AGENTS.md`:
- Around line 349-351: Update the repository tree in AGENTS.md so the
skill-reviewer section includes the missing prompts/ directory alongside
scripts/ (i.e., change the snippet under the skill-reviewer node from only
"scripts/" to include both "prompts/" and "scripts/") to match the workflow
layout described elsewhere; ensure the tree shows "prompts/" as a child of
"skill-reviewer/" for consistency with the rest of the document.
In `@CONTRIBUTING.md`:
- Around line 104-115: Update the CONTRIBUTING.md examples that show absolute
paths: replace the two occurrences of "~/.ai-workflows/workflow-name/scripts/"
and "~/.ai-workflows/workflow-name/prompts/" with symlink-safe relative examples
such as "./scripts/" and "./prompts/" (or "workflow-name/scripts/" and
"workflow-name/prompts/") so the "Scripts" and "Prompts" sections use relative
paths; ensure the text around those examples (the bullets under "Scripts" and
the last bullet under "Prompts") remains consistent with the repo guideline to
use relative paths for symlink compatibility.
In `@skill-reviewer/scripts/pre-review-checks.py`:
- Around line 255-259: When handling "../" refs the code currently accepts
(f.parent / ref_path).resolve() as a candidate even if it lies outside the skill
root; change the logic that builds candidates for ref.startswith("../") so that
after resolving the candidate you verify it is contained within self.skill_dir
(use self.skill_dir.resolve() and Path.is_relative_to or try
candidate.relative_to(self.skill_dir.resolve())) and only include it in
candidates if that containment check passes; keep the subsequent
any(c.is_file()) and f.relative_to(self.skill_dir) logic unchanged so external
files are not mistakenly treated as valid.
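The containment check that comment describes can be sketched like this. The names mirror the comment's wording; `Path.is_relative_to` requires Python 3.9+:

```python
from pathlib import Path

def contained_candidates(ref: str, referencing_file: Path, skill_dir: Path) -> list[Path]:
    """Resolve a '../' reference, keeping only candidates inside the skill root."""
    root = skill_dir.resolve()
    candidate = (referencing_file.parent / ref).resolve()
    # Discard anything that escapes the skill directory after resolution.
    return [candidate] if candidate.is_relative_to(root) else []
```

A `../prompts/...` reference from `skills/review.md` stays inside the root and survives; a `../../...` reference resolves outside it and is dropped.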
- Around line 229-236: The orphan detection uses a substring check (f.name in
content) which causes false matches; update the referenced computation to
perform reference-aware matching by checking exact filename tokens and common
reference forms: test for the file's full relative path
(str(f.relative_to(self.skill_dir))), the filename stem (f.stem) and filename
(f.name) with word-boundary/regex matches, and common import/asset patterns like
"./{name}", "/{rel_path}", "{rel_path}", "from .{stem}", or "{stem}.py" using
re.search to avoid substring collisions; replace the current any(...) loop that
uses file_contents and f.name with this regex/token-aware check and keep calling
self.fail(rel) when no reference is found.
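A word-boundary-aware version of that check might look like the following — a sketch of the technique, not the script's actual code:

```python
import re
from pathlib import Path

def is_referenced(f: Path, skill_dir: Path, file_contents: dict[str, str]) -> bool:
    """True if any file references f by path, name, or stem (not bare substring)."""
    rel = str(f.relative_to(skill_dir))
    patterns = [
        rf"\b{re.escape(rel)}\b",          # full relative path as a whole token
        rf"\./{re.escape(f.name)}",        # ./name.md style reference
        rf"\b{re.escape(f.name)}\b",       # bare filename as a whole token
        rf"from \.{re.escape(f.stem)}\b",  # Python relative import
    ]
    combined = re.compile("|".join(patterns))
    return any(combined.search(content) for content in file_contents.values())
```

The word boundaries are what prevent the substring collision: a mention of `review.md` no longer counts as a reference to a hypothetical `view.md`.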
In `@skill-reviewer/SKILL.md`:
- Around line 16-17: Replace the brittle example command string `python3
scripts/pre-review-checks.py {target-dir}` in SKILL.md with a
repository-root-safe invocation and guidance: recommend running the script using
a path that forces a relative lookup (e.g., prefix the script with ./) or
instruct users to run it from the repository root or via a module invocation
(python -m ...) so the pre-check works regardless of the current working
directory; update the wording around the example to show the safer invocation
and mention the alternative of changing into the repo root before running.
---
Outside diff comments:
In `@skill-reviewer/SKILL.md`:
- Around line 11-66: SKILL.md is too long (exceeds the 30-line rule) and embeds
detailed procedural content that belongs in the skills and guidelines files;
trim SKILL.md to a short header/one-paragraph Quick Start plus a pointer to the
detailed workflow, then move the step-by-step procedure (steps 1–6), the
8-dimension checklist, and the File Layout block into the detailed skill
document (e.g., skills/review.md) and policy guidance (guidelines.md); ensure
commands/review.md or prompts/analyze-skill.md reference the detailed locations,
and leave only a concise executable hint in SKILL.md so the file is under 30
lines.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2e8489e8-f4af-4666-bacb-8d92d09cb95e
📒 Files selected for processing (8)

- AGENTS.md
- CONTRIBUTING.md
- skill-reviewer/README.md
- skill-reviewer/SKILL.md
- skill-reviewer/guidelines.md
- skill-reviewer/prompts/analyze-skill.md
- skill-reviewer/scripts/pre-review-checks.py
- skill-reviewer/skills/review.md
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
skill-reviewer/SKILL.md (2)
9-17: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

`SKILL.md` does not reference `guidelines.md`. The file contains no link to guidelines.md, but the coding guideline requires it. Adding a single line (e.g., For principles and hard limits, see [guidelines.md](guidelines.md).) satisfies the rule and also helps the LLM load behavioral rules without re-reading the full document.

As per coding guidelines: "**/SKILL.md: SKILL.md must reference guidelines.md and optionally skills/controller.md using relative paths from the same directory".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skill-reviewer/SKILL.md` around lines 9 - 17, SKILL.md currently lacks the required reference to guidelines.md; add a single relative-link line in SKILL.md (e.g., near the top or under "Quick Start") that reads something like: For principles and hard limits, see [guidelines.md](guidelines.md). Ensure the link uses the relative path "guidelines.md" from SKILL.md and optionally mention skills/controller.md if relevant so the file meets the guideline that SKILL.md must reference guidelines.md (and may reference skills/controller.md).
1-66: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

`SKILL.md` exceeds the mandatory 30-line limit at 66 lines. The report output template (lines 30–49) and the file-layout diagram (lines 51–66) add ~36 lines of detail that belong in `skills/review.md` or `guidelines.md`. SKILL.md should only contain a thin overview per the progressive-disclosure principle.

🛠️ Suggested slim-down

Move the full report template and file-layout diagram to `skills/review.md` (or `guidelines.md`), then reduce SKILL.md to just the numbered steps with a pointer:

````diff
-4. Classify each finding by severity — **CRITICAL** / **HIGH** (blockers) or **MEDIUM** / **LOW** (suggestions).
-5. Validate findings: verify each finding cites a specific file, includes a concrete suggestion, and that blocker/suggestion counts are accurate. Drop any finding you cannot substantiate from the files you read.
-6. Produce a structured report and write it to `.artifacts/skill-reviewer/{skill-name}/review.md`:
-
-```
-## Skill Review: {skill-name}
-...
-```
+4. Classify, validate, and produce a report — see `skills/review.md` for the full output format and validation rules.
````

And trim or replace the File Layout block with a one-line pointer to README.md.

As per coding guidelines: "**/SKILL.md: SKILL.md must be under 30 lines to maintain progressive disclosure (details live in guidelines.md and skills/)".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skill-reviewer/SKILL.md` around lines 1 - 66, SKILL.md currently exceeds the 30-line limit by including the full report template and File Layout block; trim SKILL.md to a concise overview by removing the detailed "## Skill Review: {skill-name}" report template and the File Layout diagram, move those details into skills/review.md (or guidelines.md) and update the numbered steps (specifically step 4) to point to skills/review.md for the full output format and validation rules, and replace the File Layout block with a single-line pointer to README.md so SKILL.md remains under 30 lines.
🧹 Nitpick comments (1)
skill-reviewer/scripts/pre-review-checks.py (1)
224-246: ⚡ Quick win

EXEMPT_DIRS asymmetry is a defensive improvement, not a current issue. The orphan check does not mirror EXEMPT_DIRS (unlike the dangling-reference check), but `prompts/analyze-skill.md` is currently NOT flagged as orphaned because `skills/review.md` references it as `../prompts/analyze-skill.md` in non-fenced text, which matches the reference pattern.

While the structural asymmetry could theoretically cause false positives if a new prompt file is added without references, it is not currently a problem. Applying the suggested fix would be a good defensive refactor for consistency and future-proofing.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skill-reviewer/scripts/pre-review-checks.py` around lines 224 - 246, The orphaned-file loop currently only skips EXEMPT_FILES but not directories in EXEMPT_DIRS; update the check in the loop that iterates self.md_files (the block using target_rel, target_name, REF_PATTERN and self._strip_code_blocks) to also skip any file whose path is inside an exempt directory (e.g., compute rel = f.relative_to(self.skill_dir) and if rel.parts[0] in EXEMPT_DIRS continue), mirroring the dangling-reference check behavior so files under EXEMPT_DIRS are not treated as orphaned.
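The suggested symmetry fix is small; a sketch (the EXEMPT_DIRS values here are stand-ins for the script's actual constant):

```python
from pathlib import Path

EXEMPT_DIRS = {"prompts", "scripts"}  # assumed values; mirror the script's constant

def orphan_candidates(md_files: list[Path], skill_dir: Path) -> list[Path]:
    """Files to check for orphan status, skipping exempt directories."""
    kept = []
    for f in md_files:
        rel = f.relative_to(skill_dir)
        if rel.parts[0] in EXEMPT_DIRS:  # mirror the dangling-reference check
            continue
        kept.append(f)
    return kept
```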
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@skill-reviewer/SKILL.md`:
- Around line 9-17: SKILL.md currently lacks the required reference to
guidelines.md; add a single relative-link line in SKILL.md (e.g., near the top
or under "Quick Start") that reads something like: For principles and hard
limits, see [guidelines.md](guidelines.md). Ensure the link uses the relative
path "guidelines.md" from SKILL.md and optionally mention skills/controller.md
if relevant so the file meets the guideline that SKILL.md must reference
guidelines.md (and may reference skills/controller.md).
- Around line 1-66: SKILL.md currently exceeds the 30-line limit by including
the full report template and File Layout block; trim SKILL.md to a concise
overview by removing the detailed "## Skill Review: {skill-name}" report
template and the File Layout diagram, move those details into skills/review.md
(or guidelines.md) and update the numbered steps (specifically step 4) to point
to skills/review.md for the full output format and validation rules, and replace
the File Layout block with a single-line pointer to README.md so SKILL.md
remains under 30 lines.
---
Nitpick comments:
In `@skill-reviewer/scripts/pre-review-checks.py`:
- Around line 224-246: The orphaned-file loop currently only skips EXEMPT_FILES
but not directories in EXEMPT_DIRS; update the check in the loop that iterates
self.md_files (the block using target_rel, target_name, REF_PATTERN and
self._strip_code_blocks) to also skip any file whose path is inside an exempt
directory (e.g., compute rel = f.relative_to(self.skill_dir) and if rel.parts[0]
in EXEMPT_DIRS continue), mirroring the dangling-reference check behavior so
files under EXEMPT_DIRS are not treated as orphaned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7f47a704-a275-428c-a51e-28bf00719f84
📒 Files selected for processing (4)

- AGENTS.md
- CONTRIBUTING.md
- skill-reviewer/SKILL.md
- skill-reviewer/scripts/pre-review-checks.py
This pull request enhances the documentation and structure of the skill-reviewer workflow to clarify the use of deterministic scripts and prompt templates, and to better separate automated and judgment-based review steps. It introduces new directories for scripts and prompts, updates workflow documentation to reflect these additions, and details how automated checks and sub-agent delegation work in the review process.

Key updates to workflow structure and documentation:

Workflow directory structure and conventions:

- Added `scripts/` and `prompts/` directories to the recommended workflow layout in AGENTS.md, explaining their roles in deterministic operations and sub-agent delegation.
- Updated CONTRIBUTING.md to document conventions for `scripts/` (deterministic, code-based checks) and `prompts/` (templates for delegated sub-agent tasks), including usage patterns and language requirements.

skill-reviewer workflow enhancements:

- Added `scripts/pre-review-checks.py` for automated structural, frontmatter, reference, and sequencing checks before LLM evaluation, and documented its invocation and output (file-hashes.json).
- Added `prompts/analyze-skill.md` as a prompt template for an Explore sub-agent to generate a structured skill map for large skills, with detailed instructions and output format.
- Updated workflow docs (README.md, SKILL.md, guidelines.md) to describe the revised review process, clarifying the order of automated checks, sub-agent delegation, and manual evaluation, as well as how findings are classified and validated.

Artifacts and outputs:

- The review now writes file-hashes.json and skill-map.md alongside the review report, and the documentation was updated to reflect these new outputs.

These changes clarify the boundaries between deterministic (scripted) checks and LLM-driven evaluation, improve the scalability of the review process for large skills, and provide clear guidance and templates for future workflow development.