
feat(writing-plans): add Evaluator section + self-review check #1627

Open
suresh2824 wants to merge 1 commit into obra:main from suresh2824:feat-writing-plans-evaluator-section


@suresh2824

Summary

Adds a top-level ## Evaluator section to the writing-plans skill so every plan declares the deterministic check that gates moving the plan from in-progress to complete. Without one, "done" defaults to vibes: a maintainer reads the implementation, decides it looks right, and moves the plan to completed/. Weeks later a regression surfaces with no record of what "passing" was supposed to mean.

This formalizes the Planner → Generator → Evaluator topology that the rest of the superpowers skillset already assumes: the writing-plans Planner produces the spec, executing-plans / subagent-driven-development Generators execute tasks, and the Evaluator returns pass/fail before lifecycle transition.
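As a rough illustration of where the gate sits in that topology, here is a minimal Python sketch. `Plan`, `execute`, and the status strings are hypothetical names chosen for this example; they are not an API of the superpowers skillset. The point it shows is that the lifecycle transition depends only on the Evaluator's verdict, not on the Generator having finished its tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    # Hypothetical model of a plan: the Planner supplies tasks plus an Evaluator.
    tasks: List[str]
    evaluator: Callable[[], bool]  # deterministic pass/fail check declared by the Planner
    status: str = "in-progress"
    done: List[str] = field(default_factory=list)


def execute(plan: Plan, generator: Callable[[str], None]) -> str:
    """Run every task through the Generator, then let the Evaluator gate the transition."""
    for task in plan.tasks:
        generator(task)
        plan.done.append(task)
    # "Looks done" is not enough: only a passing Evaluator moves the plan to complete.
    plan.status = "complete" if plan.evaluator() else "in-progress"
    return plan.status


log: List[str] = []
plan = Plan(tasks=["add endpoint", "add test"], evaluator=lambda: len(log) == 2)
print(execute(plan, log.append))  # complete
```

If the Evaluator returns False, the plan stays in-progress even though every task ran, which is the "looks done" versus "is done" distinction the PR is after.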

What's added

The new ## Evaluator section sits between ## Remember and ## Self-Review, enumerating acceptable Evaluator forms so plans in any tech stack can declare a fit-for-purpose check:

  • Test command (pytest path::test_function, npx playwright test, cargo test <name>, go test ./pkg/...) that exits 0
  • HTTP assertion (curl -fsS <url> with expected status + body shape)
  • Database query asserting expected state
  • verify-* skill invocation (project-local verification)
  • Rubric pass-list for AI-system or design plans where a deterministic check isn't possible
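Most of the forms above reduce to one contract: run a command, treat exit status 0 as pass. A minimal Python sketch of that contract follows, assuming the Evaluator has already been split into an argv list; `run_evaluator` and the pytest path in its docstring are hypothetical. It does not cover the rubric pass-list form, which needs a human or model judgment rather than an exit code.

```python
import subprocess
import sys
from typing import List


def run_evaluator(argv: List[str], timeout: float = 600.0) -> bool:
    """Return True iff the Evaluator command exits 0 within the timeout.

    argv is the Evaluator command split into arguments, for example
    ["pytest", "tests/test_refunds.py::test_partial_refund"] (a hypothetical path).
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung check is a failure, never a silent pass.
        return False
    return result.returncode == 0


# curl -f turns HTTP error statuses (>= 400) into a non-zero exit, so the HTTP
# form fits the same contract; asserting on the body shape still needs more.
print("PASS" if run_evaluator([sys.executable, "-c", "raise SystemExit(0)"]) else "FAIL")  # prints PASS
```

Treating a timeout or a missing binary as FAIL is a deliberate choice: an Evaluator that cannot run should block the transition rather than wave it through.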

Also includes a quadruple-backtick example demonstrating the placement rule (## Evaluator heading above the terminal ## Rollback/sign-off, grep-discoverable across the plan corpus), and adds Self-Review item #4, "Evaluator declared", so authors verify the section is present before considering the plan finished.
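Because the heading is a fixed top-level line, something like `grep -rL '^## Evaluator *$' <plans-dir>` lists the plans that lack one. A rough Python equivalent is sketched below, assuming plans are Markdown files somewhere under a single directory; `plans_missing_evaluator` and the directory layout are hypothetical, and like grep it does not distinguish a real heading from one quoted inside a fenced example.

```python
import re
from pathlib import Path
from typing import List

# Matches a bare "## Evaluator" heading line (trailing spaces/tabs allowed).
EVALUATOR_HEADING = re.compile(r"^## Evaluator[ \t]*$", re.MULTILINE)


def plans_missing_evaluator(plans_dir: str) -> List[str]:
    """Return plan paths (relative, POSIX-style) with no '## Evaluator' heading line."""
    root = Path(plans_dir)  # hypothetical, e.g. a docs/.../plans directory
    missing = [
        path.relative_to(root).as_posix()
        for path in root.rglob("*.md")
        if not EVALUATOR_HEADING.search(path.read_text(encoding="utf-8"))
    ]
    return sorted(missing)
```

Running this in CI and failing on a non-empty result is one way to keep the corpus consistent without relying on each author's self-review.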

Why submit this

The pattern has been running in a production codebase (mw-vastra retail-ops platform, ~7 weeks) where every non-trivial plan declares an Evaluator. It catches the silent-completion failure mode at the plan-folder lifecycle boundary (partial/ → completed/), which is exactly where "looks done" diverges from "is done." Promoting it upstream means every Claude Code user gets the gate without a per-project overlay.

Test plan

  • head skills/writing-plans/SKILL.md — frontmatter intact
  • Section order verified — new ## Evaluator between ## Remember and ## Self-Review
  • Markdown fences correct — quadruple-backtick outer + triple-backtick inner for the example (no nested-fence rendering bugs)
  • No breaking changes — additive only; existing plan-writing workflow unchanged
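The first three checks can be automated. Below is a hedged Python sketch, assuming SKILL.md opens with YAML frontmatter that carries `name:` and `description:` keys and uses `## Remember`, `## Evaluator`, and `## Self-Review` as exact heading text; the function names are hypothetical. Headings inside fenced code are skipped because the quadruple-backtick example may itself contain plan headings, and a fence only closes on a bare run of the same character that is at least as long as the opener.

```python
import re
from typing import List

FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def h2_titles(markdown: str) -> List[str]:
    """Titles of '## ' headings outside fenced code blocks."""
    titles: List[str] = []
    open_fence = None  # fence run that opened the current code block, if any
    for line in markdown.splitlines():
        m = FENCE.match(line)
        if open_fence is None:
            if m:
                open_fence = m.group(1)
            elif line.startswith("## "):
                titles.append(line[3:].strip())
        elif (m and m.group(1)[0] == open_fence[0]
              and len(m.group(1)) >= len(open_fence) and not m.group(2).strip()):
            # An inner ``` cannot close an outer ```` block.
            open_fence = None
    return titles


def frontmatter_intact(markdown: str) -> bool:
    lines = markdown.splitlines()
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        return False
    block = lines[1:lines.index("---", 1)]
    # Assumption: skill frontmatter must carry `name:` and `description:` keys.
    return (any(line.startswith("name:") for line in block)
            and any(line.startswith("description:") for line in block))


def skill_structure_ok(markdown: str) -> bool:
    titles = h2_titles(markdown)
    wanted = ["Remember", "Evaluator", "Self-Review"]
    if not frontmatter_intact(markdown) or any(t not in titles for t in wanted):
        return False
    a, b, c = (titles.index(t) for t in wanted)
    return a < b < c
```

Usage would be along the lines of `skill_structure_ok(Path("skills/writing-plans/SKILL.md").read_text(encoding="utf-8"))` from a test or pre-commit step.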

Source

Pattern originally formalized in mw-vastra PR #488 (project-level rule), promoted upstream here so it's not Vastra-specific. Inspired by Anthropic's agentic-AI design guidance on full-stack Planner → Generator → Evaluator topologies.

🤖 Generated with Claude Code

Adds a top-level `## Evaluator` section to the writing-plans skill so every
plan declares the deterministic check that gates moving the plan from
in-progress to complete. Without one, "done" defaults to vibes — a maintainer
reads the implementation, decides it looks right, and moves the plan to
completed. Weeks later a regression surfaces with no record of what
"passing" was supposed to mean.

This is the Planner → Generator → Evaluator topology that the rest of the
superpowers skillset already assumes: the writing-plans Planner produces the
spec, executing-plans / subagent-driven-development Generators execute the
tasks, and the Evaluator returns pass/fail before lifecycle transition.

The new section enumerates acceptable Evaluator forms — test command, HTTP
assertion, database query, skill invocation, or rubric pass-list — so plans
in any tech stack (pytest, playwright, cargo, go test, curl, DB queries,
project-local verify-* skills) can declare a fit-for-purpose check.

Placement: between the existing "Remember" and "Self-Review" sections.
Self-Review gets a new item #4 to verify the Evaluator section is present
before the plan is considered complete.

Inspired by the Planner → Generator → Evaluator pattern from Anthropic's
agentic-AI design guidance. Source content originally lived per-project
(mw-vastra/CLAUDE.md + docs/superpowers/plans/README.md); this PR promotes
it upstream so every Claude Code user benefits without per-project overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@YOMXXX

YOMXXX commented May 24, 2026

Review/triage note:

This is a behavior-shaping change to skills/writing-plans/SKILL.md, not just documentation. Based on recent maintainer feedback on similar skill-content PRs, I think this needs more evidence before line-by-line review is useful.

Specific gaps I would expect reviewers to ask about:

  • The PR targets main; current contribution flow appears to expect dev.
  • The PR body does not use the current PR template sections.
  • The problem statement is general and project-derived. It does not include a concrete Superpowers session where a generated plan lacked a completion evaluator and that caused a real bad outcome.
  • There is no RED/GREEN behavioral eval showing current writing-plans omits this gate on representative prompts, and that the new wording reliably adds useful evaluator sections without bloating plans or conflicting with the existing per-task verification steps.
  • The new section adds a mandatory lifecycle rule (may NOT move a plan to completed) but this repo's active workflow docs do not currently define that partial/ -> completed/ lifecycle as a core Superpowers behavior, so this may be importing a project-local convention into core.

The idea may be worth discussing, but I would treat it as a design/eval proposal first, not a ready-to-merge skill prose change.

suresh2824 added a commit to suresh2824/plan-evaluator-gate that referenced this pull request May 24, 2026
Ships the plan-evaluator-gate companion plugin for the
superpowers:writing-plans skill. Three components:

1. skills/plan-evaluator-gate/SKILL.md — invoked when Claude is about to
   mark a plan complete. Greps the plan for a top-level `## Evaluator`
   section with at least one fenced code block; returns BLOCK if absent,
   PASS otherwise. Mirrors the acceptable-forms list from upstream PR
   obra/superpowers#1627.

2. hooks/pretooluse-block-partial-move.sh — opt-in PreToolUse hook for
   teams that want enforcement at the `git mv .../partial/ .../completed/`
   boundary (independent of Claude's discipline). Exit 2 + advisory
   stderr on block. Bypass via PLAN_EVALUATOR_GATE_BYPASS=1.

3. tests/test_pretooluse_block_partial_move.sh — 5 smoke subtests
   covering PASS / no-Evaluator BLOCK / empty-Evaluator BLOCK /
   non-matching-command pass-through / bypass-env honored. All green.
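The three components can be approximated in Python as a sketch. This is not the plugin's shell script; it assumes Claude Code delivers the PreToolUse event as JSON on stdin with the Bash command under `tool_input.command`, and that exit code 2 blocks the call with stderr shown as advice, matching the "exit 2 + advisory stderr" behavior described above. `evaluator_gate`, `decide`, and `main` are hypothetical names, and only a bare `git mv <partial path> <completed path>` is inspected.

```python
import json
import os
import re
import shlex
import sys
from pathlib import Path, PurePosixPath
from typing import Callable, Dict, List, Optional, Tuple

FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def evaluator_gate(plan: str) -> str:
    """PASS if a '## Evaluator' section holds a non-empty fenced block, else BLOCK."""
    lines = plan.splitlines()
    starts = [i for i, line in enumerate(lines) if line.rstrip() == "## Evaluator"]
    if not starts:
        return "BLOCK"
    fence: Optional[str] = None
    body: List[str] = []
    for line in lines[starts[0] + 1:]:
        m = FENCE.match(line)
        if fence is None:
            if line.startswith("## "):
                break  # next top-level section: the Evaluator section is over
            if m:
                fence, body = m.group(1), []
        elif (m and m.group(1)[0] == fence[0] and len(m.group(1)) >= len(fence)
              and not line[m.end():].strip()):
            if any(b.strip() for b in body):
                return "PASS"
            fence = None  # empty block does not count; keep looking
        else:
            body.append(line)
    return "BLOCK"


def decide(command: str, env: Dict[str, str],
           read_file: Callable[[str], str]) -> Tuple[int, str]:
    """Return (exit code, stderr message) for one Bash command."""
    if env.get("PLAN_EVALUATOR_GATE_BYPASS") == "1":
        return 0, ""
    try:
        argv = shlex.split(command)
    except ValueError:
        return 0, ""  # unparseable command: let it through unjudged
    if argv[:2] != ["git", "mv"]:
        return 0, ""
    paths = [a for a in argv[2:] if not a.startswith("-")]
    if len(paths) != 2:
        return 0, ""
    if ("partial" not in PurePosixPath(paths[0]).parts
            or "completed" not in PurePosixPath(paths[1]).parts):
        return 0, ""
    try:
        verdict = evaluator_gate(read_file(paths[0]))
    except OSError:
        return 0, ""  # unreadable source: let git report the error itself
    if verdict == "PASS":
        return 0, ""
    return 2, (f"plan-evaluator-gate: BLOCK {paths[0]} has no '## Evaluator' section "
               "with a fenced check; add one before moving it to completed/.")


def main() -> int:
    """Hook entry point (assumed contract: JSON event on stdin)."""
    event = json.loads(sys.stdin.read() or "{}")
    command = event.get("tool_input", {}).get("command", "")
    code, message = decide(command, dict(os.environ),
                           lambda p: Path(p).read_text(encoding="utf-8"))
    if message:
        print(message, file=sys.stderr)
    return code
```

A hook script would end with `sys.exit(main())`. Passing `env` and `read_file` into `decide` keeps it testable without a real repository, so the five smoke cases (PASS, missing Evaluator, empty Evaluator, non-matching command, bypass) become plain function calls.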

Relationship to obra/superpowers: companion, not replacement. Upstream
PR #1627 proposes folding the Evaluator section directly into the
writing-plans SKILL.md. If/when that lands, this plugin becomes
redundant — users uninstall and rely on the upstream pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

None yet

Projects

None yet


2 participants