feat(writing-plans): add Evaluator section + self-review check #1627
Open
suresh2824 wants to merge 1 commit into
Adds a top-level `## Evaluator` section to the writing-plans skill so every plan declares the deterministic check that gates moving the plan from in-progress to complete. Without one, "done" defaults to vibes: a maintainer reads the implementation, decides it looks right, and moves the plan to completed. Weeks later a regression surfaces with no record of what "passing" was supposed to mean.

This is the Planner → Generator → Evaluator topology that the rest of the superpowers skillset already assumes: the writing-plans Planner produces the spec, executing-plans / subagent-driven-development Generators execute the tasks, and the Evaluator returns pass/fail before lifecycle transition.

The new section enumerates acceptable Evaluator forms (test command, HTTP assertion, database query, skill invocation, or rubric pass-list) so plans in any tech stack (pytest, playwright, cargo, go test, curl, DB queries, project-local verify-* skills) can declare a fit-for-purpose check.

Placement: between the existing "Remember" and "Self-Review" sections. Self-Review gets a new item #4 to verify the Evaluator section is present before the plan is considered complete.

Inspired by the Planner → Generator → Evaluator pattern from Anthropic's agentic-AI design guidance. Source content originally lived per-project (mw-vastra/CLAUDE.md + docs/superpowers/plans/README.md); this PR promotes it upstream so every Claude Code user benefits without a per-project overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
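To make the "deterministic check" concrete, here is a minimal sketch of the test-command form of an Evaluator: run the declared command and map its exit code to pass/fail. The helper name `run_evaluator`, the timeout default, and the stand-in commands are assumptions for illustration only; the PR itself adds skill prose, not code.

```python
import subprocess
import sys


def run_evaluator(command: list[str], timeout_s: float = 60.0) -> bool:
    """Return True (pass) if the command exits 0, False (fail) otherwise.

    Hypothetical helper: the name and timeout are illustrative assumptions.
    """
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hung or unlaunchable command is treated as a failed gate.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in commands; a real plan would name something like a pytest
    # or cargo test invocation in its Evaluator section.
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print("pass" if run_evaluator(passing) else "fail")
    print("pass" if run_evaluator(failing) else "fail")
```

Treating timeouts and launch failures as "fail" is one possible design choice: it keeps the gate closed whenever the check cannot actually be run.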
Review/triage note: This is a behavior-shaping change to Specific gaps I would expect reviewers to ask about:
The idea may be worth discussing, but I would treat it as a design/eval proposal first, not a ready-to-merge skill prose change.
suresh2824 added a commit to suresh2824/plan-evaluator-gate that referenced this pull request on May 24, 2026
Ships the plan-evaluator-gate companion plugin for the superpowers:writing-plans skill. Three components:

1. skills/plan-evaluator-gate/SKILL.md: invoked when Claude is about to mark a plan complete. Greps the plan for a top-level `## Evaluator` section with at least one fenced code block; returns BLOCK if absent, PASS otherwise. Mirrors the acceptable-forms list from upstream PR obra/superpowers#1627.
2. hooks/pretooluse-block-partial-move.sh: opt-in PreToolUse hook for teams that want enforcement at the `git mv .../partial/ .../completed/` boundary (independent of Claude's discipline). Exit 2 + advisory stderr on block. Bypass via PLAN_EVALUATOR_GATE_BYPASS=1.
3. tests/test_pretooluse_block_partial_move.sh: 5 smoke subtests covering PASS / no-Evaluator BLOCK / empty-Evaluator BLOCK / non-matching-command pass-through / bypass-env honored. All green.

Relationship to obra/superpowers: companion, not replacement. Upstream PR #1627 proposes folding the Evaluator section directly into the writing-plans SKILL.md. If/when that lands, this plugin becomes redundant: users uninstall and rely on the upstream pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
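The PASS/BLOCK rule described in component 1 can be sketched roughly as follows. This is not the plugin's implementation (the commit describes a grep-based skill and a shell hook); it is an illustrative Python version under stated assumptions: `has_evaluator` and `gate` are hypothetical names, the sample plan text and `tests/test_orders.py` are made up, and fence detection is simplified to "any line starting with three backticks".

```python
import re

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown


def has_evaluator(plan_text: str) -> bool:
    """True if the plan has a top-level '## Evaluator' section that
    contains at least one complete fenced code block."""
    in_section = False
    in_fence = False
    for line in plan_text.splitlines():
        if line.startswith(FENCE):
            if in_section and in_fence:
                return True  # closing fence of a block inside the section
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # headings inside code fences are not real headings
        if re.match(r"#{1,2} ", line):
            # Any other H1/H2 heading ends the Evaluator section.
            in_section = line.strip() == "## Evaluator"
    return False


def gate(plan_text: str) -> str:
    """Map the check to the BLOCK/PASS verdicts named in the commit."""
    return "PASS" if has_evaluator(plan_text) else "BLOCK"


if __name__ == "__main__":
    good = "\n".join([
        "# Plan", "## Remember", "Keep tasks small.",
        "## Evaluator", FENCE + "bash", "pytest tests/test_orders.py", FENCE,
        "## Self-Review",
    ])
    empty = "# Plan\n## Evaluator\nLooks fine to me.\n## Self-Review"
    print(gate(good))   # PASS
    print(gate(empty))  # BLOCK
```

The second sample corresponds to the "empty-Evaluator BLOCK" case in the smoke tests: the heading is present, but no fenced check is declared under it.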
Summary
Adds a top-level `## Evaluator` section to the `writing-plans` skill so every plan declares the deterministic check that gates moving the plan from in-progress to complete. Without one, "done" defaults to vibes: a maintainer reads the implementation, decides it looks right, and moves the plan to `completed/`. Weeks later a regression surfaces with no record of what "passing" was supposed to mean.

This formalizes the Planner → Generator → Evaluator topology that the rest of the superpowers skillset already assumes: the `writing-plans` Planner produces the spec, `executing-plans` / `subagent-driven-development` Generators execute tasks, and the Evaluator returns pass/fail before lifecycle transition.

What's added

The new `## Evaluator` section sits between `## Remember` and `## Self-Review`, enumerating acceptable Evaluator forms so plans in any tech stack can declare a fit-for-purpose check:

- Test command (`pytest path::test_function`, `npx playwright test`, `cargo test <name>`, `go test ./pkg/...`) that exits 0
- HTTP assertion (`curl -fsS <url>` with expected status + body shape)
- `verify-*` skill invocation (project-local verification)

Also includes a quadruple-backtick example to demonstrate the placement rule (`## Evaluator` heading above the terminal `## Rollback` / sign-off, grep-discoverable across the plan corpus), and adds Self-Review item #4, "Evaluator declared", so authors verify the section is present before considering the plan finished.

Why submit this

The pattern has been running in a production codebase (mw-vastra retail-ops platform, ~7 weeks) where every non-trivial plan declares an Evaluator. It catches the silent-completion failure mode at the plan-folder lifecycle boundary (`partial/` → `completed/`), which is exactly the place where "looks done" diverges from "is done." Promoting it upstream means every Claude Code user gets the gate without a per-project overlay.

Test plan

- `head skills/writing-plans/SKILL.md`: frontmatter intact
- `## Evaluator` between `## Remember` and `## Self-Review`

Source
Pattern originally formalized in mw-vastra PR #488 (project-level rule), promoted upstream here so it's not Vastra-specific. Inspired by Anthropic's agentic-AI design guidance on full-stack Planner → Generator → Evaluator topologies.
🤖 Generated with Claude Code