feat(writing-plans): add Evaluator section + self-review check by suresh2824 · Pull Request #1627 · obra/superpowers

suresh2824 · 2026-05-24T15:29:30Z

Summary

Adds a top-level `## Evaluator` section to the writing-plans skill so every plan declares the deterministic check that gates moving the plan from in-progress to complete. Without one, "done" defaults to vibes: a maintainer reads the implementation, decides it looks right, and moves the plan to `completed/`. Weeks later a regression surfaces with no record of what "passing" was supposed to mean.

This formalizes the Planner → Generator → Evaluator topology that the rest of the superpowers skillset already assumes: the writing-plans Planner produces the spec, executing-plans / subagent-driven-development Generators execute tasks, and the Evaluator returns pass/fail before lifecycle transition.

What's added

The new ## Evaluator section sits between ## Remember and ## Self-Review, enumerating acceptable Evaluator forms so plans in any tech stack can declare a fit-for-purpose check:

- Test command (`pytest path::test_function`, `npx playwright test`, `cargo test <name>`, `go test ./pkg/...`) that exits 0
- HTTP assertion (`curl -fsS <url>` with expected status + body shape)
- Database query asserting expected state
- `verify-*` skill invocation (project-local verification)
- Rubric pass-list for AI-system or design plans where a deterministic check isn't possible
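As a rough illustration of the "test command that exits 0" form, the sketch below treats any command as an Evaluator and maps its exit status to pass/fail. The function name `run_evaluator`, the timeout, and the stand-in commands are assumptions made for this example; they are not part of the PR or of the writing-plans skill.

```python
import subprocess
import sys


def run_evaluator(command: list[str], timeout_s: float = 300.0) -> bool:
    """Return True when the evaluator command exits 0, False otherwise."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung check counts as a failure, not a pass.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; a real plan would name
    # something like `pytest path::test_function` or `curl -fsS <url>`.
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print("PASS" if run_evaluator(passing) else "FAIL")
    print("PASS" if run_evaluator(failing) else "FAIL")
```

Treating launch errors and timeouts as failures keeps the gate conservative: a plan is only considered passing when the declared check actually ran and succeeded.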

Also includes a quadruple-backtick example to demonstrate the placement rule (`## Evaluator` heading above the terminal `## Rollback`/sign-off, grep-discoverable across the plan corpus), and adds Self-Review item #4, "Evaluator declared", so authors verify the section is present before considering the plan finished.

Why submit this

The pattern has been running in a production codebase (mw-vastra retail-ops platform, ~7 weeks) where every non-trivial plan declares an Evaluator. It catches the silent-completion failure mode at the plan-folder lifecycle boundary (partial/ → completed/), which is exactly the place where "looks done" diverges from "is done." Promoting it upstream means every Claude Code user gets the gate without per-project overlay.

Test plan

- `head skills/writing-plans/SKILL.md`: frontmatter intact
- Section order verified: new `## Evaluator` between `## Remember` and `## Self-Review`
- Markdown fences correct: quadruple-backtick outer + triple-backtick inner for the example (no nested-fence rendering bugs)
- No breaking changes: additive only; existing plan-writing workflow unchanged

Source

Pattern originally formalized in mw-vastra PR #488 (project-level rule), promoted upstream here so it's not Vastra-specific. Inspired by Anthropic's agentic-AI design guidance on full-stack Planner → Generator → Evaluator topologies.

🤖 Generated with Claude Code

Adds a top-level `## Evaluator` section to the writing-plans skill so every plan declares the deterministic check that gates moving the plan from in-progress to complete. Without one, "done" defaults to vibes: a maintainer reads the implementation, decides it looks right, and moves the plan to completed. Weeks later a regression surfaces with no record of what "passing" was supposed to mean.

This is the Planner → Generator → Evaluator topology that the rest of the superpowers skillset already assumes: the writing-plans Planner produces the spec, executing-plans / subagent-driven-development Generators execute the tasks, and the Evaluator returns pass/fail before lifecycle transition.

The new section enumerates acceptable Evaluator forms (test command, HTTP assertion, database query, skill invocation, or rubric pass-list) so plans in any tech stack (pytest, playwright, cargo, go test, curl, DB queries, project-local verify-* skills) can declare a fit-for-purpose check.

Placement: between the existing "Remember" and "Self-Review" sections. Self-Review gets a new item #4 to verify the Evaluator section is present before the plan is considered complete.

Inspired by the Planner → Generator → Evaluator pattern from Anthropic's agentic-AI design guidance. Source content originally lived per-project (mw-vastra/CLAUDE.md + docs/superpowers/plans/README.md); this PR promotes it upstream so every Claude Code user benefits without per-project overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

YOMXXX · 2026-05-24T15:47:04Z

Review/triage note:

This is a behavior-shaping change to skills/writing-plans/SKILL.md, not just documentation. Based on recent maintainer feedback on similar skill-content PRs, I think this needs more evidence before line-by-line review is useful.

Specific gaps I would expect reviewers to ask about:

- The PR targets `main`; current contribution flow appears to expect `dev`.
- The PR body does not use the current PR template sections.
- The problem statement is general and project-derived. It does not include a concrete Superpowers session where a generated plan lacked a completion evaluator and that caused a real bad outcome.
- There is no RED/GREEN behavioral eval showing current writing-plans omits this gate on representative prompts, and that the new wording reliably adds useful evaluator sections without bloating plans or conflicting with the existing per-task verification steps.
- The new section adds a mandatory lifecycle rule (may NOT move a plan to completed) but this repo's active workflow docs do not currently define that `partial/` -> `completed/` lifecycle as a core Superpowers behavior, so this may be importing a project-local convention into core.

The idea may be worth discussing, but I would treat it as a design/eval proposal first, not a ready-to-merge skill prose change.
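For concreteness, the RED/GREEN comparison mentioned in the gaps above could be tallied along these lines. This is only a sketch under stated assumptions: the function names, the `min_gain` threshold, and the idea of collecting plan outputs as plain strings are all hypothetical, and the repo's actual eval tooling may look nothing like this.

```python
import re

# Matches a top-level "## Evaluator" heading on its own line.
# Simplification: headings quoted inside fenced code blocks are also counted.
EVALUATOR_HEADING = re.compile(r"^## Evaluator\s*$", re.MULTILINE)


def evaluator_rate(plans: list[str]) -> float:
    """Fraction of generated plans that contain a '## Evaluator' heading."""
    if not plans:
        return 0.0
    hits = sum(1 for plan in plans if EVALUATOR_HEADING.search(plan))
    return hits / len(plans)


def red_green(baseline: list[str], candidate: list[str], min_gain: float = 0.5) -> dict:
    """Compare plans from the current skill (RED) and the edited skill (GREEN).

    min_gain is an arbitrary illustrative threshold, not a project standard.
    """
    red = evaluator_rate(baseline)
    green = evaluator_rate(candidate)
    return {"red": red, "green": green, "meaningful": green - red >= min_gain}
```

A tally like this only measures whether the section appears. It says nothing about plan bloat or conflicts with per-task verification steps, which would still need separate review.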

Ships the plan-evaluator-gate companion plugin for the superpowers:writing-plans skill. Three components:

1. `skills/plan-evaluator-gate/SKILL.md`: invoked when Claude is about to mark a plan complete. Greps the plan for a top-level `## Evaluator` section with at least one fenced code block; returns BLOCK if absent, PASS otherwise. Mirrors the acceptable-forms list from upstream PR obra/superpowers#1627.
2. `hooks/pretooluse-block-partial-move.sh`: opt-in PreToolUse hook for teams that want enforcement at the `git mv .../partial/ .../completed/` boundary (independent of Claude's discipline). Exit 2 + advisory stderr on block. Bypass via `PLAN_EVALUATOR_GATE_BYPASS=1`.
3. `tests/test_pretooluse_block_partial_move.sh`: 5 smoke subtests covering PASS / no-Evaluator BLOCK / empty-Evaluator BLOCK / non-matching-command pass-through / bypass-env honored. All green.

Relationship to obra/superpowers: companion, not replacement. Upstream PR #1627 proposes folding the Evaluator section directly into the writing-plans SKILL.md. If/when that lands, this plugin becomes redundant: users uninstall and rely on the upstream pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
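The gate logic described in this commit message can be approximated as follows. This is a Python sketch of the decision rules only, not the plugin's shell script: the regular expressions, the function names, and the `read_plan` callable (a hypothetical path-to-text loader) are assumptions for illustration. The fence handling follows the CommonMark rule that a closing fence must use the same character and be at least as long as the opening one, so a `## Evaluator` heading quoted inside a quadruple-backtick example is not mistaken for a real section.

```python
import os
import re

FENCE = re.compile(r"^(`{3,}|~{3,})")
# Assumed command shape: git mv <...partial/...> <...completed/...>
MOVE = re.compile(r"\bgit\s+mv\s+(\S*partial/\S+)\s+\S*completed/")


def has_evaluator_with_code(plan_text: str) -> bool:
    """True if a top-level '## Evaluator' section contains a fenced code block."""
    in_section = False
    open_fence = ""  # fence string that opened the current code block, if any
    for line in plan_text.splitlines():
        match = FENCE.match(line)
        if open_fence:
            # Inside a block: only a bare fence of the same character and at
            # least the same length closes it.
            if (
                match
                and match.group(1)[0] == open_fence[0]
                and len(match.group(1)) >= len(open_fence)
                and line.rstrip() == match.group(1)
            ):
                open_fence = ""
            continue
        if match:
            if in_section:
                return True
            open_fence = match.group(1)
            continue
        if line.startswith("# ") or line.startswith("## "):
            in_section = line.rstrip() == "## Evaluator"
    return False


def hook_exit_code(command: str, read_plan, env=None) -> int:
    """Return 0 to allow the command, 2 to block it."""
    env = os.environ if env is None else env
    if env.get("PLAN_EVALUATOR_GATE_BYPASS") == "1":
        return 0  # explicit bypass requested
    match = MOVE.search(command)
    if not match:
        return 0  # unrelated command: pass through
    plan_text = read_plan(match.group(1))
    return 0 if has_evaluator_with_code(plan_text) else 2
```

Passing `read_plan` in as a parameter keeps the sketch free of filesystem access, so the five cases listed for the smoke tests (pass, missing section, empty section, non-matching command, bypass) can each be exercised with in-memory strings.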

suresh2824 wants to merge 1 commit into obra:main from suresh2824:feat-writing-plans-evaluator-section
