From bc39362d327b1d95b41a4f08cab4ef25d0a4f1c5 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Wed, 18 Mar 2026 05:32:42 +0800 Subject: [PATCH 01/17] feat: generalize skill from web-only to any project type MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rename iterative-web-dev to iterative-dev with support for web, API, CLI, library, data pipeline, and mobile projects. The core autonomous loop is unchanged — only verification strategy and applicable standards swap based on a "type" field in feature_list.json. - Restructure references/ into core/, web/, and verification/ subdirs - Add 6 type-specific verification strategies (web, api, cli, library, data, mobile) - Update feature_list.json format with top-level type field and type-specific categories - Make subagent template, post-verification checks, and decision guidelines type-aware - Expand session-handoff and init-script docs to cover Go, Python, Rust, Node.js - Add MIT license - Bump to v2.0.0 Co-Authored-By: Claude Opus 4.6 (1M context) --- LICENSE | 2 +- README.md | 92 ++++ SKILL.md | 459 ++++++++++++++++++ package.json | 28 ++ references/core/code-quality.md | 48 ++ references/core/constitution-audit.md | 131 +++++ references/core/continue-workflow.md | 161 ++++++ references/core/feature-list-format.md | 214 ++++++++ references/core/gitignore-standards.md | 58 +++ references/core/init-script-template.md | 247 ++++++++++ references/core/session-handoff-standards.md | 79 +++ references/verification/api-verification.md | 145 ++++++ references/verification/cli-verification.md | 163 +++++++ references/verification/data-verification.md | 177 +++++++ .../verification/library-verification.md | 190 ++++++++ .../verification/mobile-verification.md | 158 ++++++ references/verification/web-verification.md | 148 ++++++ references/web/e2e-verification.md | 281 +++++++++++ references/web/frontend-design.md | 76 +++ references/web/ux-standards.md | 126 +++++ 20 files changed, 2982 
insertions(+), 1 deletion(-) create mode 100644 README.md create mode 100644 SKILL.md create mode 100644 package.json create mode 100644 references/core/code-quality.md create mode 100644 references/core/constitution-audit.md create mode 100644 references/core/continue-workflow.md create mode 100644 references/core/feature-list-format.md create mode 100644 references/core/gitignore-standards.md create mode 100644 references/core/init-script-template.md create mode 100644 references/core/session-handoff-standards.md create mode 100644 references/verification/api-verification.md create mode 100644 references/verification/cli-verification.md create mode 100644 references/verification/data-verification.md create mode 100644 references/verification/library-verification.md create mode 100644 references/verification/mobile-verification.md create mode 100644 references/verification/web-verification.md create mode 100644 references/web/e2e-verification.md create mode 100644 references/web/frontend-design.md create mode 100644 references/web/ux-standards.md diff --git a/LICENSE b/LICENSE index 276ed3b..9f91ba9 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) 2026 The Plant +Copyright (c) 2025 The Plant Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/README.md b/README.md new file mode 100644 index 0000000..108f447 --- /dev/null +++ b/README.md @@ -0,0 +1,92 @@ +# iterative-dev + +An AI skill for iterative development with AI agents. Works with **any project type** — web apps, APIs, CLI tools, libraries, data pipelines, and mobile apps. Supports **Claude Code** (with subagents) and **Windsurf**. + +## Installation + +```bash +npx skills add https://github.com/sunfmin/iterative-web-dev +``` + +## Overview + +This skill provides a complete workflow for AI agents working on long-running development projects across multiple sessions. 
It ensures **incremental, reliable progress** with proper handoffs between sessions. + +### Supported Project Types + +| Type | Verification Strategy | +|------|----------------------| +| **web** | Playwright E2E tests + screenshot visual review | +| **api** | Integration tests + endpoint/response validation | +| **cli** | Command execution tests + output/exit code validation | +| **library** | Unit tests + public API surface validation | +| **data** | Transformation tests + schema/data quality checks | +| **mobile** | Mobile E2E tests (Detox/XCTest/Flutter) + screenshot review | + +### Claude Code Features + +- **Subagent per feature** — Each feature is implemented in its own subagent using the Agent tool, keeping context clean and isolated +- **Autonomous loop** — The agent keeps working through ALL features without stopping, even if the human is away +- **Self-directed decisions** — Handles ambiguity, errors, and blockers autonomously using decision-making guidelines +- **Commit after each feature** — Every completed feature is committed independently for clean git history +- **Type-aware verification** — Automatically uses the right verification strategy for your project type + +## Core Principles + +1. **Incremental progress** — Work on ONE feature at a time. Finish, test, and commit before moving on. +2. **Feature list is sacred** — `feature_list.json` is the single source of truth. +3. **Git discipline** — Commit after every completed feature. +4. **Clean handoffs** — Every session ends with committed work and updated progress notes. +5. **Test before build** — Verify existing features work before implementing new ones. +6. **Autonomous execution** — Make all decisions yourself. Never stop to ask the human. +7. **Subagent isolation** — Each feature runs in its own subagent for clean context. 
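As a concrete illustration of principle 2, a minimal `feature_list.json` might look like this. The `type` and `passes` fields are described in this README; the remaining field names are assumptions, and the authoritative format lives in `references/core/feature-list-format.md`:

```json
{
  "type": "web",
  "features": [
    {
      "id": 1,
      "description": "User can log in with email and password",
      "category": "functional",
      "priority": 1,
      "steps": [
        "Open /login and submit valid credentials",
        "Assert redirect to the dashboard"
      ],
      "passes": false
    }
  ]
}
```

The continue workflow treats every entry with `"passes": false` as remaining work and only ends when none are left.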
+ +## Workflows + +| Workflow | Use When | +|----------|----------| +| **init-scope** | Starting a new scope, switching scopes, or setting up project structure | +| **continue** | Every session after init — implements ALL remaining features with verification built in | + +## Key Files + +- `spec.md` — Project specification (symlink to active scope) +- `feature_list.json` — Feature tracking with pass/fail status and project type +- `progress.txt` — Session progress log +- `init.sh` — Development environment setup script + +## How It Works (Claude Code) + +1. Agent reads `feature_list.json` to find incomplete features and project type +2. For each feature, launches a **subagent** (via Agent tool) with full context +3. Subagent implements the feature, runs type-appropriate verification, and commits +4. Parent agent confirms completion, then **loops back** to pick the next feature +5. Only stops when ALL features have `"passes": true` + +## Project Structure + +``` +references/ +├── core/ # All project types +│ ├── code-quality.md +│ ├── gitignore-standards.md +│ ├── feature-list-format.md +│ ├── session-handoff-standards.md +│ ├── constitution-audit.md +│ ├── init-script-template.md +│ └── continue-workflow.md +├── web/ # Web and mobile projects +│ ├── ux-standards.md +│ └── frontend-design.md +└── verification/ # One per project type + ├── web-verification.md + ├── api-verification.md + ├── cli-verification.md + ├── library-verification.md + ├── data-verification.md + └── mobile-verification.md +``` + +## License + +MIT diff --git a/SKILL.md b/SKILL.md new file mode 100644 index 0000000..c56d5b6 --- /dev/null +++ b/SKILL.md @@ -0,0 +1,459 @@ +--- +name: iterative-dev +description: Manage long-running AI agent development projects with incremental progress, scoped features, and verification. Works with any project type — web, API, CLI, library, data pipeline, mobile. 
Use this skill when working on multi-session projects, implementing features incrementally, running tests, initializing project scopes, or continuing work from previous sessions. Triggers on phrases like "continue working", "pick up where I left off", "next feature", "run tests", "verify", "initialize scope", "switch scope", "feature list", "incremental progress", or any multi-session development workflow. +--- + +# Iterative Development Workflow + +This skill provides a complete workflow for AI agents working on long-running development projects across multiple sessions. It ensures **incremental, reliable progress** with proper handoffs between sessions. It works with any project type — web apps, APIs, CLI tools, libraries, data pipelines, and mobile apps. + +## Core Principles + +1. **Incremental progress** — Work on ONE feature at a time. Finish, test, and commit before moving on. +2. **Feature list is sacred** — `feature_list.json` is the single source of truth. See `references/core/feature-list-format.md` for rules. +3. **Git discipline** — Commit after every completed feature. Never leave uncommitted work. +4. **Clean handoffs** — Every session ends meeting `references/core/session-handoff-standards.md`. +5. **Test before build** — Verify existing features work before implementing new ones. +6. **Autonomous execution** — Make all decisions yourself. Never stop to ask the human. The human may be asleep. +7. **Subagent per feature** — Each feature is implemented in its own subagent for isolation and parallelism safety. +8. **Refactor and unit test** — Actively extract logic into testable modules. See `references/core/code-quality.md`. +9. **Verification is non-negotiable** — Every feature MUST be verified using the strategy for its project type. See `references/verification/`. +10. **Standards are auditable** — Quality standards live in reference docs and are systematically verified, not just aspirational checklists. 
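As a sketch of how type detection (declared or auto-detected during scope init, as described below) might work in practice: the marker files and patterns here are illustrative assumptions, not the skill's authoritative heuristics.

```shell
#!/usr/bin/env sh
# Illustrative project-type detection. Each heuristic is an assumption;
# "api" has no reliable file marker, so ambiguous cases fall back to spec.md.
detect_type() {
  dir="$1"
  if [ -f "$dir/pubspec.yaml" ] || ls "$dir"/*.xcodeproj >/dev/null 2>&1; then
    echo mobile    # Flutter or Xcode project
  elif [ -f "$dir/package.json" ] && grep -Eq '"(react|vue|svelte)"' "$dir/package.json"; then
    echo web       # frontend framework in dependencies
  elif [ -f "$dir/dbt_project.yml" ] || [ -d "$dir/dags" ]; then
    echo data      # dbt project or Airflow DAG layout
  elif [ -d "$dir/cmd" ]; then
    echo cli       # Go-style CLI entry points
  else
    echo library   # default: a package/module with no obvious entry point
  fi
}
detect_type "${1:-.}"
```

A real init-scope pass would confirm the guess against `spec.md` before writing the `"type"` field.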
+ +## Project Types + +The skill adapts its verification strategy and applicable standards based on project type. The type is declared in `feature_list.json` or auto-detected during scope init. + +| Type | Verification Strategy | Extra Standards | +|------|----------------------|-----------------| +| **web** | E2E screenshots + visual review (Playwright) | `web/ux-standards.md`, `web/frontend-design.md` | +| **api** | Integration tests, endpoint validation, response schemas | — | +| **cli** | Command execution tests, output validation, exit codes | — | +| **library** | Unit tests, public API validation, type checking | — | +| **data** | Transformation tests, schema validation, data quality checks | — | +| **mobile** | E2E screenshots + visual review (Detox/XCTest/Flutter) | `web/ux-standards.md` (adapted) | + +## Standards Documents + +All verifiable quality standards are extracted into reference docs. These are used both as guidance during implementation and as audit targets for systematic verification. 
+ +### Core Standards (all project types) + +| Document | What it covers | +|----------|---------------| +| `references/core/code-quality.md` | File organization, testable architecture, unit testing, no duplication | +| `references/core/gitignore-standards.md` | Files that must never be committed | +| `references/core/feature-list-format.md` | Feature list structure, critical rules, priority order | +| `references/core/session-handoff-standards.md` | Clean codebase, git state, progress tracking — verified at session end | + +### Web-Specific Standards (type: web, mobile) + +| Document | What it covers | +|----------|---------------| +| `references/web/ux-standards.md` | Loading/empty/error states, responsive design, accessibility, forms, tables, navigation | +| `references/web/frontend-design.md` | Typography, color, spatial composition, micro-interactions, anti-patterns | + +### Verification Strategies (one per project type) + +| Document | For type | +|----------|----------| +| `references/verification/web-verification.md` | web | +| `references/verification/api-verification.md` | api | +| `references/verification/cli-verification.md` | cli | +| `references/verification/library-verification.md` | library | +| `references/verification/data-verification.md` | data | +| `references/verification/mobile-verification.md` | mobile | + +## When to Use Each Workflow + +| Workflow | Use When | +|----------|----------| +| **init-scope** | Starting a new scope, switching scopes, or setting up project structure | +| **continue** | Every session after init — picking up work, implementing ALL remaining features, and verifying each | + +--- + +## Workflow: Initialize Scope + +Use this to create a new development scope or switch between existing scopes. 
+ +### Concepts + +- **Scope**: A focused set of features (e.g., "auth", "video-editor", "phase-2") +- **Active Scope**: Currently active scope stored in `.active-scope` +- **Scope Files**: `specs/{scope}/spec.md` and `specs/{scope}/feature_list.json` +- **Project Type**: Declared in `feature_list.json` — determines verification strategy and applicable standards + +### Directory Structure + +``` +project-root/ +├── specs/ +│ ├── auth/ +│ │ ├── spec.md +│ │ └── feature_list.json +│ └── video-editor/ +│ ├── spec.md +│ └── feature_list.json +├── .active-scope +├── spec.md # Symlink to active scope +├── feature_list.json # Symlink to active scope +├── progress.txt +└── init.sh +``` + +### Steps + +1. **Check current state** + ```bash + ls -la specs/ 2>/dev/null || echo "No scopes yet" + cat .active-scope 2>/dev/null || echo "No active scope" + ``` + +2. **Create new scope** (if needed) + ```bash + mkdir -p specs/auth + # Create specs/auth/spec.md with specification + ``` + +3. **Switch to scope** + ```bash + echo "auth" > .active-scope + ln -sf specs/auth/spec.md spec.md + ln -sf specs/auth/feature_list.json feature_list.json + ``` + +4. **Determine project type** — detect or ask: + - Look at the codebase: Does it have a `src/` with React/Vue/Svelte? → `web` + - Is it a REST/GraphQL API with no frontend? → `api` + - Does it have a `main` with CLI arg parsing (cobra, clap, argparse, commander)? → `cli` + - Is it a package/module with no main entry point? → `library` + - Does it have ETL/pipeline code (pandas, spark, dbt, airflow)? → `data` + - Does it have React Native, Flutter, SwiftUI, or Kotlin/Android? → `mobile` + - If unclear, default to the most fitting type based on spec.md + +5. **Create feature list** — choose the right method: + + **If scope references a constitution / standards document** (e.g., "align with AGENTS.md", "refactor to follow standards"): + Use the **Constitution Audit Workflow** from `references/core/constitution-audit.md`. 
This is a multi-subagent process: + - Split the reference document into sections (~200 lines each) + - Launch parallel subagents to extract EVERY requirement from each section (read actual text, not summaries) + - Launch parallel subagents to verify each requirement against the actual codebase + - Generate features ONLY from verified violations + - This is NON-NEGOTIABLE for compliance scopes — ad-hoc auditing misses requirements + + **If scope is new feature development** (e.g., "build a PIM system", "add auth"): + Use the standard process from `references/core/feature-list-format.md` + + **Important:** Include the `"type"` field in feature_list.json (see feature-list-format.md). + +6. **Create/update init.sh** — see `references/core/init-script-template.md` + +7. **Commit and update progress log** + +--- + +## Workflow: Continue Session (Autonomous Feature Loop) + +This is the main workflow. It runs ALL remaining features to completion without stopping. + +**⚠️ CRITICAL NON-STOP RULE (NON-NEGOTIABLE) ⚠️** + +**You MUST keep looping until EVERY feature in `feature_list.json` has `"passes": true`. Do NOT stop after one feature. Do NOT stop after two features. Do NOT stop to report progress to the user. Do NOT ask the human what to do next. The human may be asleep.** + +**After EACH subagent completes, you MUST immediately launch the NEXT subagent for the next incomplete feature. The ONLY acceptable reasons to stop are:** +1. **ALL features have `"passes": true`** +2. **A truly unrecoverable error** (hardware failure, missing credentials that cannot be worked around) + +**Stopping to "report back" or "check in" with the user is a VIOLATION of this workflow. The user explicitly chose autonomous execution. KEEP GOING.** + +### Session Startup Sequence + +1. **Get bearings** + ```bash + pwd + cat progress.txt + cat feature_list.json + git log --oneline -20 + ``` + +2. **Determine project type** — read the `"type"` field from `feature_list.json` + +3. 
**Start environment** + ```bash + bash init.sh + ``` + +4. **Verify existing features** — Run all unit tests (fast) and only the tests for features completed in previous sessions (not this session's new work). Skip tests for features not yet implemented. + +### Autonomous Feature Loop + +After startup, enter the **feature loop**. This loop runs until ALL features pass: + +``` +features_completed_this_session = 0 + +WHILE there are features with "passes": false in feature_list.json: + 1. Read feature_list.json to find next incomplete feature (highest priority first) + 2. Launch a SUBAGENT to implement, test, verify, and commit + 3. After subagent completes, VERIFY output quality (see below) + 4. features_completed_this_session++ + 5. If features_completed_this_session % 5 == 0: run STANDARDS AUDIT (see below) + 6. CONTINUE to next feature — do NOT stop +END WHILE + +Run FINAL STANDARDS AUDIT before ending session +``` + +### Launching Feature Subagents (Claude Code) + +For each feature, use the **Agent tool** to launch a subagent. This keeps each feature's work isolated and prevents context window overflow. + +**Subagent prompt template:** + +``` +You are implementing a feature for a {type} project. Work autonomously — do NOT ask questions, make your best judgment on all decisions. 
+ +## Project Context +- Working directory: {pwd} +- Active scope: {scope from .active-scope} +- Project type: {type from feature_list.json} + +## Feature to Implement +- ID: {id} +- Description: {description} +- Category: {category} +- Priority: {priority} +- Test Steps: +{steps as bullet list} + +## Standards Documents +Read these reference docs and follow them during implementation: +- references/core/code-quality.md — Code organization, testability, unit testing rules +- references/core/gitignore-standards.md — Files that must never be committed +- references/verification/{type}-verification.md — Verification strategy for this project type +{IF type == "web" or type == "mobile":} +- references/web/ux-standards.md — UX quality requirements (loading/empty/error states, responsive, accessibility) +- references/web/frontend-design.md — Visual design principles (typography, color, composition) +{END IF} + +## Instructions + +### Phase 1: Implement +1. Read the relevant source files to understand the current codebase +2. Read the spec.md file for full project context +3. Read the standards documents listed above +4. Implement the feature following existing code patterns and the standards +5. Make sure the implementation is complete and production-quality + +### Phase 2: Refactor & Unit Test +Follow references/core/code-quality.md: +6. Extract pure functions out of components and handlers +7. Move business logic into testable utility/service modules +8. Eliminate duplication — reuse existing helpers or extract new shared ones +9. Write unit tests for all extracted logic. Run them until green. + +### Phase 3: Verification +Follow references/verification/{type}-verification.md: +10. Execute the verification strategy defined for {type} projects +11. Run all relevant tests — fix until green +12. MANDATORY: Perform the verification checks specified in the doc + Fix and re-run until all pass. 
+ +### Phase 4: Gitignore Review +Follow references/core/gitignore-standards.md: +13. Run `git status --short` and check every file against gitignore patterns +14. Add any missing patterns to `.gitignore`, remove from tracking if needed + +### Phase 5: Commit +15. Update feature_list.json — change "passes": false to "passes": true +16. Update progress.txt with what was done and current feature pass count +17. Commit all changes: + git add -A && git commit -m "feat: [description] — Implemented feature #[id]: [description]" + +## Key Rules +- Follow existing code patterns and the standards documents +- Keep changes focused on this feature only +- Do not break other features +- Make all decisions yourself, never ask for human input +- EVERY feature must be verified per the verification strategy — no exceptions +- BEFORE committing, review ALL files for .gitignore candidates +``` + +**How to launch the subagent:** + +Use the Agent tool with `subagent_type: "general-purpose"`. Example: + +``` +Agent tool call: + description: "Implement feature #3" + prompt: [filled template above] +``` + +### After Each Subagent Completes + +The subagent handles implementation, testing, verification, and committing. The parent agent MUST verify: + +1. **Confirm commit** — `git log --oneline -1` +2. **Confirm feature_list.json** — feature has `"passes": true` +3. **Verify output quality** — type-specific checks: + + **For `web` and `mobile` projects:** + - VERIFY SCREENSHOTS EXIST: + ```bash + ls e2e/screenshots/{scope}-feature-{id}-*.png 2>/dev/null | wc -l + ``` + If count is 0, launch a follow-up subagent to add screenshots and visual review. + - SPOT-CHECK one screenshot — Use the Read tool to open one screenshot. Evaluate against verification criteria. + - If quality is poor, launch a **polish subagent**. 
+ + **For `api` projects:** + - Verify integration tests exist and pass + - Check that error cases are tested (not just happy paths) + + **For `cli` projects:** + - Run a quick smoke test: `./bin/{tool} --help` or equivalent + - Verify error cases are tested + + **For `library` projects:** + - Verify all tests pass (including race detection if applicable) + - Check public API surface hasn't accidentally expanded + + **For `data` projects:** + - Verify transformation tests exist and pass + - Check edge cases (empty, null, duplicate) are tested + +4. If the subagent failed to complete, launch another subagent to fix and finish. +5. **Loop back IMMEDIATELY** — pick the next incomplete feature and launch a new subagent RIGHT NOW. Do NOT stop, do NOT report to the user, do NOT wait for instructions. KEEP GOING until ALL features pass. + +### Periodic Standards Audit + +**When to run:** Every 5 completed features AND at session end (before final commit). + +This uses the same audit pattern as `references/core/constitution-audit.md`, but applied to the project's own standards documents. The audit catches issues that individual subagents missed — self-review has blind spots. + +**Audit process:** + +1. Determine which standards apply based on project type: + - **All types:** `core/code-quality.md`, `core/gitignore-standards.md`, `core/session-handoff-standards.md` + - **web/mobile only:** `web/ux-standards.md`, `web/frontend-design.md` + +2. For EACH applicable standards document, launch a **verification subagent** that: + - Reads the standards document + - Reads the code/files changed since the last audit (use `git diff --name-only HEAD~5` or similar) + - Checks each standard against the actual code + - Reports: COMPLIANT or VIOLATION with specific file and line + +3. Collect all violations across subagents + +4. 
If violations found: + - Group related violations into fix batches + - Launch a **fix subagent** for each batch + - Each fix subagent commits its changes + - Re-verify the fixed code + +5. Log audit results in `progress.txt` + +**Subagent prompt template for standards audit:** + +``` +You are auditing recently changed code against a project standards document. + +## Standards Document +{paste the full content of the standards doc} + +## Files to Audit +{list of files changed since last audit} + +## Instructions +1. Read each file listed above +2. For EACH standard in the document, check if the code complies +3. Report findings as: + - COMPLIANT: {standard} — {brief evidence} + - VIOLATION: {standard} — {file}:{line} — {what's wrong} — {fix needed} +4. Be thorough — check every standard, don't skip "obvious" ones +``` + +### Decision Making Guidelines + +Since the human may be asleep, follow these rules for autonomous decisions: + +| Situation | Decision | +|-----------|----------| +| Ambiguous spec | Choose the simplest reasonable interpretation | +| Multiple implementation approaches | Pick the one matching existing patterns | +| Test is flaky | Add proper waits/retries, don't skip the test | +| Feature seems too large | Break into sub-tasks within the subagent | +| Dependency conflict | Use the version compatible with existing packages | +| Build error | Read the error, fix it, rebuild | +| Port conflict | Kill the conflicting process and restart | +| Database issue | Reset/reseed the database | +| Feature blocked by another | Skip to next feature, come back later | +| Missing dependency | Install it | +| Unclear file structure | Follow existing project conventions | +| **Web/mobile:** Unclear UI design | Follow references/web/frontend-design.md | +| **Web/mobile:** UI looks generic/plain | Add visual polish per references/web/ux-standards.md | +| **Web/mobile:** Subagent skipped screenshots | Launch follow-up subagent to add them | +| **API:** Unclear response 
format | Follow existing endpoint patterns, use consistent error format | +| **CLI:** Unclear output format | Match existing command output style | +| **Library:** Unclear public API | Keep it minimal, expose only what's needed | + +### Session End + +Only end the session when: +- **ALL features have `"passes": true`**, OR +- A truly unrecoverable error occurs (hardware failure, missing credentials, etc.) + +Before ending: +1. Run **final standards audit** (see Periodic Standards Audit above) — include `core/session-handoff-standards.md` +2. Run all unit tests +3. Run verification tests only for features completed in previous sessions (regression check) +4. Verify codebase meets `references/core/session-handoff-standards.md` +5. Commit any remaining changes + +--- + +## Critical Rules + +### Standards Enforcement +- All quality standards live in `references/` docs — subagents MUST read them +- Standards are verified both during implementation (by subagent) AND periodically (by audit) +- Audit violations MUST be fixed before session ends + +### Autonomous Operation (NON-NEGOTIABLE) +- NEVER stop to ask the human a question +- NEVER wait for human approval +- NEVER stop to "report progress" or "check in" — the user can see commits in git log +- NEVER output a summary and wait — immediately launch the next subagent +- After each subagent completes: verify → launch next subagent. That's it. No pausing. +- Make reasonable decisions based on existing patterns +- If blocked, try alternative approaches before giving up +- Keep working until ALL features are complete +- The continue workflow is a LOOP, not a single step. You are the loop controller. 
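The loop-controller contract can be sketched in shell, assuming `jq` is installed and a `features` array with numeric `priority` (both assumptions; see `references/core/feature-list-format.md` for the real layout):

```shell
# Sketch of the termination condition and next-feature selection only;
# the real loop launches an implementation subagent for each id it finds.
next_feature_id() {
  jq -r '[.features[] | select(.passes == false)]
         | sort_by(.priority) | first | .id // empty' feature_list.json 2>/dev/null
}
id=$(next_feature_id)
if [ -n "$id" ]; then
  echo "launch subagent for feature #$id"   # stand-in for the Agent tool call
else
  echo "all features pass: session may end"
fi
```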
+ +--- + +## Reference Files + +All standards, templates, and detailed processes: + +### Core (all project types) +- `references/core/code-quality.md` — Code organization, testability, and unit testing standards +- `references/core/gitignore-standards.md` — Gitignore patterns and review process +- `references/core/feature-list-format.md` — Feature list structure, critical rules, priority order +- `references/core/session-handoff-standards.md` — Clean codebase, git state, progress tracking +- `references/core/init-script-template.md` — init.sh template +- `references/core/continue-workflow.md` — Full continue workflow details +- `references/core/constitution-audit.md` — Systematic audit workflow for compliance/alignment scopes + +### Web-specific (type: web, mobile) +- `references/web/ux-standards.md` — UX quality standards and checklist +- `references/web/frontend-design.md` — Design principles for visual quality + +### Verification strategies (one per project type) +- `references/verification/web-verification.md` — Playwright E2E + screenshots +- `references/verification/api-verification.md` — Integration tests + endpoint validation +- `references/verification/cli-verification.md` — Command execution + output validation +- `references/verification/library-verification.md` — Unit tests + public API validation +- `references/verification/data-verification.md` — Transformation tests + data quality +- `references/verification/mobile-verification.md` — Mobile E2E + screenshots diff --git a/package.json b/package.json new file mode 100644 index 0000000..c47a537 --- /dev/null +++ b/package.json @@ -0,0 +1,28 @@ +{ + "name": "iterative-dev", + "version": "2.0.0", + "description": "Iterative development workflow for AI agents with incremental progress, scoped features, and type-aware verification. 
Supports web, API, CLI, library, data pipeline, and mobile projects.", + "keywords": [ + "windsurf", + "claude-code", + "skill", + "ai-agent", + "development", + "web-development", + "api", + "cli", + "library", + "data-pipeline", + "mobile", + "e2e-testing", + "incremental-development", + "subagent", + "autonomous" + ], + "author": "", + "license": "MIT", + "repository": { + "type": "git", + "url": "" + } +} diff --git a/references/core/code-quality.md b/references/core/code-quality.md new file mode 100644 index 0000000..1c46649 --- /dev/null +++ b/references/core/code-quality.md @@ -0,0 +1,48 @@ +# Code Quality Standards + +Every feature implementation must meet these standards. Code that works but is messy, duplicated, or untestable is NOT complete. + +## File Organization + +- Keep files under 300 lines — split if larger +- One component/module per file +- Group related files in directories (e.g., `services/`, `utils/`, `components/`) +- Follow existing project conventions for file naming and placement + +## Testable Architecture + +- **Extract pure functions** out of UI components and handlers +- Move business logic, validation, data transformation, and state calculations into separate utility/service modules +- These modules must be unit-testable without DOM, network, or framework dependencies +- UI components should orchestrate; logic modules should compute + +## Unit Testing + +- Write unit tests for all extracted logic: pure functions, validators, transformers, state calculations, business rules +- Use the project's existing test framework +- Do NOT unit test UI rendering or things better covered by E2E tests +- Unit tests are for logic; E2E tests are for behavior +- All unit tests must pass before committing + +## No Duplication + +- If you see duplicated logic (in your code or existing code you touched), extract shared helpers +- Don't duplicate what already exists elsewhere in the codebase +- Check for existing utilities before writing new ones +- 
Prefer composition over copy-paste + +## Code Style + +- Follow existing code patterns and architecture in the project +- Keep functions small and single-purpose +- Name things clearly — intent over implementation +- Prefer composition over deep nesting +- Use `data-testid` attributes for E2E test selectors + +## What NOT to Do + +- Don't leave debug code or `console.log` statements +- Don't leave commented-out code +- Don't leave TODO comments without associated feature list items +- Don't introduce new patterns that conflict with existing project conventions +- Don't over-engineer — solve the current problem, not hypothetical future ones diff --git a/references/core/constitution-audit.md b/references/core/constitution-audit.md new file mode 100644 index 0000000..7321349 --- /dev/null +++ b/references/core/constitution-audit.md @@ -0,0 +1,131 @@ +# Constitution Audit Workflow + +When a scope involves aligning a codebase with a reference document (constitution, style guide, AGENTS.md, coding standards, etc.), the init-scope workflow MUST use this systematic audit process instead of ad-hoc exploration. + +## When to Use + +Use this workflow when the user's scope description includes phrases like: +- "align with", "comply with", "follow", "match" +- "refactor to match AGENTS.md / constitution / standards" +- "audit against", "check compliance with" +- Any scope that references an external document as the source of truth + +## The Problem This Solves + +Ad-hoc auditing (reading the doc once and listing gaps from memory) misses requirements because: +1. Constitution documents are long (often 1000+ lines) with many specific rules +2. Rules are scattered across sections — a "Service Architecture" section may have rules about testing +3. Code examples contain implicit requirements (e.g., a code example showing `api.CreateProductReq` implies services must use generated types) +4. 
Some rules are stated as "MUST" / "CRITICAL" / "NON-NEGOTIABLE" but are easy to overlook in a single pass +5. A single agent can't hold the entire document + entire codebase in context simultaneously + +## Systematic Audit Process + +### Step 1: Extract Requirements (per-section subagents) + +Split the constitution document into logical sections. For EACH section, launch a **dedicated subagent** that: + +1. Reads ONLY that section of the constitution thoroughly +2. Extracts every concrete, testable requirement as a checklist item +3. For each requirement, identifies: + - The exact rule (quote the relevant text) + - What file(s) / pattern(s) to check in the codebase + - How to verify compliance (what to grep for, what to read, what to run) + +**Subagent prompt template for extraction:** + +``` +You are extracting requirements from a section of a project constitution document. + +## Section to Analyze +{paste the section text here — NOT a file path, paste the actual content} + +## Instructions +1. Read this section carefully — every sentence may contain a requirement +2. Extract EVERY concrete, testable requirement. Include: + - Requirements stated with MUST, CRITICAL, NON-NEGOTIABLE + - Requirements implied by code examples (e.g., if an example shows `cmp.Diff`, that means "tests MUST use cmp.Diff") + - Requirements about file locations, naming conventions, patterns + - Requirements about what NOT to do (anti-patterns) +3. For each requirement, output: + - rule: The exact requirement (quote or paraphrase) + - check: How to verify it in the codebase (file to read, grep pattern, command to run) + - section: Which constitution section it comes from + +Output as a numbered list. Be exhaustive — it's better to extract too many requirements than to miss one. +``` + +### Step 2: Verify Each Requirement Against Codebase + +For each extracted requirement, launch verification subagents (can batch related requirements together). Each subagent: + +1. 
Reads the specific files mentioned in the "check" field +2. Determines: COMPLIANT or VIOLATION +3. For violations: describes exactly what's wrong and what the fix would be + +**Subagent prompt template for verification:** + +``` +You are auditing a codebase against specific requirements from a project constitution. + +## Requirements to Verify +{numbered list of requirements with their check instructions} + +## Instructions +For each requirement: +1. Run the check (read file, grep, etc.) +2. Determine: COMPLIANT or VIOLATION +3. If VIOLATION: describe what's wrong and what the fix should be + +Output format: +- Requirement #N: COMPLIANT | VIOLATION + - Current: {what the code does now} + - Required: {what the constitution requires} + - Fix: {description of needed change} +``` + +### Step 3: Generate Feature List from Violations + +Group related violations into features. Each feature should: +- Fix ONE specific pattern or concern (not mix unrelated changes) +- Have concrete, verifiable test steps +- Include the exact constitution rule being addressed +- Be ordered: dependencies first (e.g., fix types before fixing code that uses those types) + +### Key Principles + +1. **Read the actual text, not summaries** — Subagents must receive the actual constitution text, not a summary. Summaries lose details. + +2. **One section per extraction pass** — Don't try to extract requirements from the entire document at once. Split into sections of ~200 lines max per subagent. + +3. **Code examples are requirements** — If the constitution shows a code example, every aspect of that example is a requirement. If it shows `NewService(db).WithLogger(log).Build()`, then: + - Services MUST have builder pattern + - Builder MUST accept db as constructor arg + - Builder MUST have WithLogger method + - Builder MUST have Build method + - Build MUST return an interface + +4. **Cross-reference sections** — Requirements in one section may affect code covered by another section. 
The verification step catches this because it checks actual code. + +5. **Don't skip "obvious" checks** — Even if something seems likely to be compliant, verify it. The whole point is that "obvious" assumptions cause missed requirements. + +## Example: Auditing Against AGENTS.md + +For a document like AGENTS.md with sections on Testing, Architecture, Error Handling, etc.: + +**Extraction subagents:** +- Agent 1: Extract requirements from "Testing Principles" section +- Agent 2: Extract requirements from "Service Architecture" section +- Agent 3: Extract requirements from "Error Handling" section +- Agent 4: Extract requirements from "OpenAPI/ogen Workflow" section +- Agent 5: Extract requirements from "Frontend Constitution" section +- Agent 6: Extract requirements from "Development Workflow" section + +**Verification subagents** (can run in parallel): +- Agent A: Verify testing requirements against backend/tests/ +- Agent B: Verify architecture requirements against backend/services/, handlers/ +- Agent C: Verify error handling against backend/handlers/error_*.go +- Agent D: Verify OpenAPI requirements against api/openapi/ and generated code +- Agent E: Verify frontend requirements against frontend/src/ and frontend/tests/ + +**Result:** A comprehensive feature list with zero missed requirements. diff --git a/references/core/continue-workflow.md b/references/core/continue-workflow.md new file mode 100644 index 0000000..384662a --- /dev/null +++ b/references/core/continue-workflow.md @@ -0,0 +1,161 @@ +# Continue Workflow — Full Details + +This is the primary workflow for every session after initialization. It runs **autonomously until ALL features are complete**. + +**CRITICAL: Do NOT stop after implementing one feature. Keep looping until every feature in `feature_list.json` has `"passes": true`. The human may be asleep — make all decisions yourself.** + +## Session Startup Sequence + +Every coding session should start by: + +1. 
`pwd` — Confirm working directory +2. Read `progress.txt` — Understand what previous sessions did +3. Read `feature_list.json` — See current feature status and project type +4. `git log --oneline -20` — See recent commits +5. Run `bash init.sh` — Start the dev environment +6. Quick verification — Make sure existing features work + +## Step-by-Step Process + +### Step 1: Get Your Bearings + +```bash +pwd +cat progress.txt +cat feature_list.json +git log --oneline -20 +``` + +Note the `"type"` field in feature_list.json — this determines which verification strategy and standards apply. + +### Step 2: Start the Development Environment + +```bash +bash init.sh +``` + +If `init.sh` doesn't exist or fails, check the project README or build files for how to start. Fix `init.sh` if needed. + +**Ensure all required services are running** (varies by project type): + +- **web**: Frontend dev server, backend server, database +- **api**: API server, database, any external service mocks +- **cli**: Build the tool binary +- **library**: No services needed — just ensure build works +- **data**: Database, data stores, pipeline dependencies +- **mobile**: Emulator/simulator, backend server + +### Step 3: Verify Existing Features (Regression Check) + +Before implementing anything new, **verify that existing passing features still work**. To save time, only run what's needed: + +1. **Run all unit tests** (fast): + ```bash + # Use the project's test command + npm test # Node.js + go test ./... # Go + pytest tests/ # Python + cargo test # Rust + ``` + +2. **Run verification tests only for features already passing** from previous sessions. Do NOT run tests for features that haven't been implemented yet. + +3. If anything is broken, **fix it first** + +### Step 4: Enter the Autonomous Feature Loop + +**This is the core loop. Do NOT exit until all features pass.** + +``` +WHILE there are features with "passes": false in feature_list.json: + 1. Read feature_list.json + 2. 
Find the highest-priority feature with "passes": false
+  3. Launch a SUBAGENT to implement, test, verify, and commit
+  4. After subagent completes: VERIFY output quality (Step 4c)
+  5. If quality fails: launch fix/polish subagent
+  6. LOOP BACK to step 1
+END WHILE
+```
+
+#### 4a: Pick the Next Feature
+
+From `feature_list.json`, find the **highest-priority feature** that has `"passes": false`.
+
+- Work on features in order of priority (high → medium → low)
+- Within the same priority, work in the order they appear in the file
+- If a feature is blocked, skip it and come back later
+
+#### 4b: Launch a Subagent for the Feature
+
+Use the **Agent tool** (Claude Code) to launch a subagent for each feature. The subagent handles the **full lifecycle**: implement, test, verify, and commit. This isolates each feature's work and prevents context window overflow.
+
+Use the subagent prompt template from SKILL.md. The template adapts based on project type — it includes the correct verification strategy and only includes web-specific standards for web/mobile projects.
+
+#### 4c: Verify Subagent Output (MANDATORY)
+
+After the subagent completes, the parent agent MUST verify:
+
+1. **Confirm commit** — `git log --oneline -1`
+2. **Confirm feature_list.json** — feature has `"passes": true`
+3. **Type-specific verification** — see SKILL.md "After Each Subagent Completes" section
+4. If the subagent failed to complete, launch another subagent to fix and finish.
+5. **Loop back** — pick the next incomplete feature and repeat.
+
+**Do NOT stop. Keep looping until all features pass.**
+
+### Step 5: Final Verification (When ALL Features Pass)
+
+Only when every feature has `"passes": true`:
+
+1. **Run all unit tests**
+2. **Run verification tests for features completed in previous sessions** (regression check)
+3. **Verify clean git status**
+   ```bash
+   git status
+   ```
+4. 
**Update progress.txt** with final session summary: + ``` + ## Session Complete — [DATE] + ### Summary: + - All [N] features implemented and passing + - Unit tests and regression tests green + - All features verified per {type} verification strategy + - Codebase clean and production-ready + ``` +5. **Final commit** if needed + +## Decision Making (Autonomous Mode) + +Since the human may be asleep, follow these rules: + +| Situation | Decision | +|-----------|----------| +| Ambiguous spec | Choose the simplest reasonable interpretation | +| Multiple approaches | Pick the one matching existing patterns | +| Flaky test | Add proper waits/retries, don't skip | +| Feature too large | Break into sub-tasks within the subagent | +| Dependency conflict | Use version compatible with existing packages | +| Build error | Read error, fix it, rebuild | +| Port conflict | Kill conflicting process, restart | +| Database issue | Reset/reseed the database | +| Feature blocked | Skip to next, come back later | +| Missing dependency | Install it | +| Unclear file structure | Follow existing project conventions | +| **Web/mobile:** Unclear UI design | Follow references/web/frontend-design.md | +| **Web/mobile:** UI looks generic | Add visual polish per references/web/ux-standards.md | +| **API:** Unclear response format | Follow existing endpoint patterns | +| **CLI:** Unclear output format | Match existing command output style | +| **Library:** Unclear public API | Keep it minimal | + +## What NOT To Do + +- Don't stop after one feature — keep going until ALL pass +- Don't ask the human what to do — decide yourself +- Don't try to one-shot the entire app +- Don't declare the project "done" prematurely — check feature_list.json +- Don't leave the codebase in a broken state +- Don't skip testing — verify features per the project's verification strategy +- Don't modify feature descriptions or test steps in feature_list.json +- Don't implement features out of priority order without good 
reason +- Don't wait for human approval between features +- Don't skip verification — it is MANDATORY for every feature diff --git a/references/core/feature-list-format.md b/references/core/feature-list-format.md new file mode 100644 index 0000000..ecfa422 --- /dev/null +++ b/references/core/feature-list-format.md @@ -0,0 +1,214 @@ +# Feature List Format + +The `feature_list.json` file is the single source of truth for project progress. + +## Structure + +```json +{ + "type": "web", + "features": [ + { + "id": 1, + "category": "functional", + "priority": "high", + "description": "Brief description of the feature", + "steps": [ + "Step 1: Perform action", + "Step 2: Verify expected result" + ], + "passes": false + } + ] +} +``` + +## Top-Level Fields + +| Field | Type | Description | +|-------|------|-------------| +| `type` | string | Project type: `"web"`, `"api"`, `"cli"`, `"library"`, `"data"`, or `"mobile"`. Determines verification strategy and applicable standards. | +| `features` | array | Array of feature objects | + +## Feature Fields + +| Field | Type | Description | +|-------|------|-------------| +| `id` | number | Unique numeric identifier within scope | +| `category` | string | Feature category (see below) | +| `priority` | string | "high", "medium", or "low" | +| `description` | string | Brief description of the feature | +| `steps` | array | Test steps to verify the feature | +| `passes` | boolean | Whether the feature passes all tests | + +## Categories + +Categories depend on project type: + +| Type | Common Categories | +|------|------------------| +| **web** | `"functional"`, `"style"`, `"accessibility"` | +| **api** | `"functional"`, `"validation"`, `"security"` | +| **cli** | `"functional"`, `"usability"`, `"error-handling"` | +| **library** | `"functional"`, `"api-design"`, `"performance"` | +| **data** | `"functional"`, `"data-quality"`, `"performance"` | +| **mobile** | `"functional"`, `"style"`, `"accessibility"` | + +You may use any 
category that makes sense for the project. + +## Requirements + +- Cover every feature in the scope's spec +- ALL features start with `"passes": false` +- Each feature has a unique numeric `id` (unique within scope) +- `type` field MUST be present at the top level + +## Critical Rules + +**NEVER:** +- Remove or edit feature descriptions +- Remove or edit test steps +- Weaken or delete tests +- Change a passing feature back to failing (unless genuine regression) + +**ONLY:** +- Change `"passes": false` to `"passes": true` after thorough verification + +## Priority Order + +Work on features in this order: +1. **high** priority first +2. **medium** priority second +3. **low** priority last +4. Within same priority, work in order they appear in the file + +## Best Practices for Test Steps + +### Write Verifiable Steps + +Every feature's test steps should be concrete and verifiable. The steps depend on project type: + +**Web projects:** +- "Step N: Verify loading skeleton appears while data loads" +- "Step N: Verify empty state shows icon, message, and CTA when no items exist" +- "Step N: Verify the page renders correctly at mobile width (375px)" + +**API projects:** +- "Step N: POST /api/products with valid body returns 201 and product object" +- "Step N: POST /api/products with missing name returns 400 with field error" +- "Step N: GET /api/products without auth returns 401" + +**CLI projects:** +- "Step N: Run `mytool list --format json` and verify JSON output" +- "Step N: Run `mytool` with no args and verify help text is shown" +- "Step N: Run `mytool process --input missing.txt` and verify error message" + +**Library projects:** +- "Step N: Call parse('valid input') and verify correct result" +- "Step N: Call parse('') and verify it returns descriptive error" +- "Step N: Verify Parse is exported from the public API" + +**Data projects:** +- "Step N: Run pipeline with sample input and verify output schema" +- "Step N: Run pipeline with empty input and verify empty 
output (not error)" +- "Step N: Verify aggregation totals match expected values" + +**Mobile projects:** +- "Step N: Tap login button and verify navigation to dashboard" +- "Step N: Verify loading indicator during API call" +- "Step N: Verify layout on small screen (iPhone SE)" + +## Examples + +### Web Project +```json +{ + "type": "web", + "features": [ + { + "id": 1, + "category": "functional", + "priority": "high", + "description": "User can register with email and password", + "steps": [ + "Step 1: Navigate to /register", + "Step 2: Verify registration form loads with proper layout", + "Step 3: Submit empty form and verify inline validation errors", + "Step 4: Fill in email and password fields", + "Step 5: Click Register and verify loading state on button", + "Step 6: Verify redirect to dashboard" + ], + "passes": false + } + ] +} +``` + +### API Project +```json +{ + "type": "api", + "features": [ + { + "id": 1, + "category": "functional", + "priority": "high", + "description": "Create product endpoint", + "steps": [ + "Step 1: POST /api/products with valid body returns 201", + "Step 2: Response contains id, name, price, created_at", + "Step 3: POST with missing required field returns 400 with field error", + "Step 4: POST with invalid price returns 400 with validation error", + "Step 5: GET /api/products/{id} returns the created product" + ], + "passes": false + } + ] +} +``` + +### CLI Project +```json +{ + "type": "cli", + "features": [ + { + "id": 1, + "category": "functional", + "priority": "high", + "description": "Init command creates project structure", + "steps": [ + "Step 1: Run `mytool init myproject` in empty directory", + "Step 2: Verify directory structure created (src/, tests/, config/)", + "Step 3: Verify config file has correct defaults", + "Step 4: Run `mytool init` without name and verify error message", + "Step 5: Run `mytool init myproject` again and verify idempotent behavior" + ], + "passes": false + } + ] +} +``` + +### Library Project 
+```json
+{
+  "type": "library",
+  "features": [
+    {
+      "id": 1,
+      "category": "functional",
+      "priority": "high",
+      "description": "Parse function handles all input formats",
+      "steps": [
+        "Step 1: parse('simple string') returns correct AST node",
+        "Step 2: parse('nested {value}') handles interpolation",
+        "Step 3: parse('') returns descriptive error",
+        "Step 4: parse(null) returns descriptive error without panic",
+        "Step 5: Verify Parse is exported in public API"
+      ],
+      "passes": false
+    }
+  ]
+}
+```
diff --git a/references/core/gitignore-standards.md b/references/core/gitignore-standards.md
new file mode 100644
index 0000000..526637d
--- /dev/null
+++ b/references/core/gitignore-standards.md
@@ -0,0 +1,58 @@
+# Gitignore Standards
+
+Before every commit, review ALL files that would be staged. Never commit files that should be gitignored.
+
+## Review Process
+
+1. Run `git status --short` to see all untracked and modified files
+2. Check each file against the patterns below
+3. If any file should be ignored:
+   a. Add the pattern to `.gitignore`
+   b. If already tracked, remove from tracking: `git rm --cached <file>`
+   c. 
Verify with `git status` that the file is now ignored + +## Patterns That MUST Be Gitignored + +### Build Artifacts +- `dist/`, `build/`, `.next/`, `out/`, `.output/` +- `*.tsbuildinfo` + +### Dependencies +- `node_modules/`, `vendor/`, `.pnp.*` + +### Environment & Secrets +- `.env`, `.env.local`, `.env.*.local` +- `*.pem`, `*.key`, `*.cert` +- `credentials.json`, `service-account.json` + +### IDE & Editor +- `.idea/`, `.vscode/`, `*.swp`, `*.swo` +- `.project`, `.classpath`, `.settings/` + +### OS Files +- `.DS_Store`, `Thumbs.db`, `desktop.ini` + +### Test Artifacts +- `test-results/`, `playwright-report/`, `coverage/`, `.nyc_output/` + +### Logs +- `*.log`, `npm-debug.log*`, `yarn-debug.log*`, `pnpm-debug.log*` + +### Cache +- `.cache/`, `.parcel-cache/`, `.turbo/`, `.eslintcache` +- `.sass-cache/` + +### Database Files +- `*.sqlite`, `*.db` + +### Generated Files +- `*.map` (source maps in production) + +## When in Doubt + +If a file is: +- Generated by a build tool → gitignore it +- Specific to your local environment → gitignore it +- Contains secrets or credentials → gitignore it +- Large binary that changes frequently → gitignore it +- Reproducible from source → probably gitignore it diff --git a/references/core/init-script-template.md b/references/core/init-script-template.md new file mode 100644 index 0000000..359d7d9 --- /dev/null +++ b/references/core/init-script-template.md @@ -0,0 +1,247 @@ +# init.sh Template + +The `init.sh` script sets up the development environment. It must be idempotent (safe to run multiple times). + +## Requirements + +1. **Kill existing processes** — Clean slate +2. **Clean old test artifacts** — Fresh test results +3. **Install/build dependencies** — Ensure latest code +4. **Start required services** — Servers, databases, etc. +5. 
**Be idempotent** — Safe to run multiple times
+
+## Templates by Project Type
+
+### Web Project (Frontend + Backend)
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== Web Development Environment ==="
+
+# 1. Kill existing servers
+echo "Stopping existing servers..."
+pkill -f 'go run' 2>/dev/null || true
+pkill -f 'vite' 2>/dev/null || true
+pkill -f 'node.*dev' 2>/dev/null || true
+sleep 1
+
+# 2. Delete old screenshots for fresh test results
+echo "Cleaning old test artifacts..."
+rm -rf e2e/screenshots/*.png 2>/dev/null || true
+rm -rf test-results 2>/dev/null || true
+mkdir -p e2e/screenshots
+
+# 3. Install/update dependencies
+echo "Installing dependencies..."
+cd frontend && npm install && cd ..
+cd backend && go mod download && cd ..
+
+# 4. Build backend
+echo "Building backend..."
+cd backend && go build -o backend . && cd ..
+
+# 5. Start database
+echo "Ensuring database is running..."
+brew services start postgresql@18 2>/dev/null || true
+
+# 6. Start backend
+echo "Starting backend on port 8082..."
+cd backend && ./backend &
+cd ..
+
+# 7. Start frontend
+echo "Starting frontend on port 3000..."
+cd frontend && npm run dev &
+cd ..
+
+# 8. Wait and verify
+sleep 3
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### API Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== API Development Environment ==="
+
+# 1. Kill existing servers (pkill -f takes an extended regex, so use an unescaped |)
+pkill -f 'go run' 2>/dev/null || true
+pkill -f 'node.*server' 2>/dev/null || true
+pkill -f 'uvicorn|gunicorn' 2>/dev/null || true
+sleep 1
+
+# 2. Clean test artifacts
+rm -rf test-results 2>/dev/null || true
+
+# 3. Install dependencies
+go mod download                    # Go
+# npm install                      # Node.js
+# pip install -r requirements.txt  # Python
+
+# 4. Start database
+docker-compose up -d db 2>/dev/null || true
+sleep 2
+
+# 5. Run migrations
+go run ./cmd/migrate up  # or equivalent
+
+# 6. 
Start API server
+go run ./cmd/server &
+# npm start &       # Node.js
+# uvicorn app:app & # Python
+
+sleep 2
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### CLI Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== CLI Development Environment ==="
+
+# 1. Clean old artifacts
+rm -rf bin/ 2>/dev/null || true
+rm -rf test-results 2>/dev/null || true
+
+# 2. Install dependencies
+go mod download   # Go
+# cargo build     # Rust
+# npm install     # Node.js
+# pip install -e . # Python
+
+# 3. Build the CLI tool
+mkdir -p bin
+go build -o bin/mytool ./cmd/mytool # Go
+# cargo build && cp target/debug/mytool bin/ # Rust
+# npm run build   # Node.js
+
+# 4. Verify build
+./bin/mytool --version || echo "Build may have failed"
+
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### Library Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== Library Development Environment ==="
+
+# 1. Clean old artifacts
+rm -rf dist/ build/ 2>/dev/null || true
+rm -rf test-results coverage/ 2>/dev/null || true
+
+# 2. Install dependencies
+go mod download   # Go
+# cargo build     # Rust
+# npm install     # Node.js
+# pip install -e ".[dev]" # Python
+
+# 3. Verify build
+go build ./...    # Go
+# cargo check     # Rust
+# npm run build   # Node.js
+# python -m py_compile src/*.py # Python
+
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### Data Pipeline Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== Data Pipeline Development Environment ==="
+
+# 1. Kill existing processes (pkill -f takes an extended regex, so use an unescaped |)
+pkill -f 'spark|airflow' 2>/dev/null || true
+
+# 2. Clean old artifacts
+rm -rf output/ test-results/ 2>/dev/null || true
+
+# 3. Install dependencies
+pip install -r requirements.txt
+# pip install -e ".[dev]"
+
+# 4. Start data services
+docker-compose up -d  # Database, message queue, etc.
+sleep 3
+
+# 5. 
Prepare test data
+python scripts/seed_test_data.py 2>/dev/null || true
+
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### Mobile Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== Mobile Development Environment ==="
+
+# 1. Kill existing processes (pkill -f takes an extended regex, so use an unescaped |)
+pkill -f 'metro|react-native' 2>/dev/null || true
+
+# 2. Clean old artifacts
+rm -rf test-results/ screenshots/ 2>/dev/null || true
+mkdir -p screenshots
+
+# 3. Install dependencies
+npm install
+# cd ios && pod install && cd .. # iOS
+
+# 4. Start backend (if needed)
+cd backend && npm start &
+cd ..
+
+# 5. Start Metro bundler (React Native)
+npx react-native start &
+# flutter pub get # Flutter
+
+sleep 3
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+## Customization
+
+Adapt whichever template best matches your project. The key is:
+1. Idempotent — safe to run repeatedly
+2. Clean artifacts — fresh test results each time
+3. All services started — everything needed to develop and test
+4. Active scope displayed — quick confirmation of current work
+
+## Verification Commands
+
+After running init.sh, verify services:
+
+```bash
+# Check what's running
+lsof -i :3000  # Frontend
+lsof -i :8080  # API server
+lsof -i :5432  # PostgreSQL
+
+# Test endpoints
+curl -s http://localhost:8080/health || echo "Server not responding"
+
+# Verify tool builds
+./bin/mytool --version 2>/dev/null || echo "CLI not built"
+```
diff --git a/references/core/session-handoff-standards.md b/references/core/session-handoff-standards.md
new file mode 100644
index 0000000..cef1cd1
--- /dev/null
+++ b/references/core/session-handoff-standards.md
@@ -0,0 +1,79 @@
+# Session Handoff Standards
+
+Before ending any session, the codebase must meet these standards. These are auditable — a verification subagent can check every item.
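
Because every item below is mechanical, much of the audit can be automated as a pure checker over file contents. The sketch below is illustrative only — the function name, pattern set, and messages are assumptions, not part of this skill — but it mirrors the per-language rules and grep commands later in this document:

```python
import re

# Debug-statement patterns by file extension, mirroring the per-language
# handoff rules in this document. Extend for your project's languages.
DEBUG_PATTERNS = {
    ".ts": re.compile(r"console\.(log|debug)|\bdebugger\b"),
    ".js": re.compile(r"console\.(log|debug)|\bdebugger\b"),
    ".py": re.compile(r"\bprint\(|pdb\.set_trace"),
    ".go": re.compile(r"fmt\.Print"),
    ".rs": re.compile(r"println!|dbg!"),
}

# Merge conflict markers at the start of a line.
CONFLICT = re.compile(r"^(<{7} |={7}$|>{7} )", re.MULTILINE)


def audit_source(path: str, text: str) -> list[str]:
    """Return handoff violations found in a single source file's text."""
    violations = []
    dot = path.rfind(".")
    pattern = DEBUG_PATTERNS.get(path[dot:]) if dot != -1 else None
    if pattern and pattern.search(text):
        violations.append(f"{path}: debug statement left in source")
    if CONFLICT.search(text):
        violations.append(f"{path}: merge conflict marker present")
    if re.search(r"\b(TODO|FIXME)\b", text):
        violations.append(f"{path}: TODO/FIXME without a feature list item")
    return violations
```

A handoff subagent could run a checker like this over every tracked, non-test source file and refuse to end the session while the violation list is non-empty; the grep commands at the end of this file remain the canonical definitions.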
+ +## Clean Codebase + +### No Debug Code +- No debug print/log statements left in source code (test files excluded) + - JavaScript/TypeScript: no `console.log`, `console.debug` + - Python: no `print()` used for debugging, no `pdb.set_trace()` + - Go: no `fmt.Println` used for debugging, no `log.Println` debug output + - Rust: no `println!` or `dbg!` used for debugging +- No `debugger` statements (JavaScript/TypeScript) +- No commented-out code blocks (small inline comments explaining "why" are fine) +- No `TODO` or `FIXME` comments without a corresponding feature list item + +### No Temporary Files +- No `.tmp`, `.bak`, or `.orig` files +- No editor swap files (`.swp`, `.swo`) +- No test output files left in source directories + +## Git State + +### Clean Working Tree +- `git status` shows clean working tree (no untracked, modified, or staged files) +- All work committed with descriptive commit messages +- No merge conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) in any file + +### Gitignore Compliance +- All patterns from `references/core/gitignore-standards.md` present in `.gitignore` +- No build artifacts, dependencies, secrets, or generated files tracked + +## Progress Tracking + +### progress.txt Updated +- Contains summary of what was done this session +- Lists features completed (IDs and descriptions) +- Shows current pass count (e.g., "12/20 features passing") +- Notes any issues encountered or features skipped + +### feature_list.json Accurate +- Every completed feature has `"passes": true` +- No feature marked passing that wasn't actually verified +- No features removed or descriptions edited + +## Verification Commands + +An audit subagent can verify these standards with: + +```bash +# Clean working tree +git status --porcelain | wc -l # Should be 0 + +# No debug statements (adapt patterns for your language) +# JavaScript/TypeScript: +grep -r "console\.\(log\|debug\)" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" 
--exclude-dir=node_modules --exclude-dir=test --exclude-dir=e2e --exclude-dir=__tests__ -l
+
+# Go:
+grep -r "fmt\.Print" --include="*.go" --exclude-dir=vendor --exclude="*_test.go" -l
+
+# Python:
+grep -r "print(" --include="*.py" --exclude-dir=__pycache__ --exclude-dir=tests --exclude="*_test.py" -l
+
+# Rust:
+grep -r "println!\|dbg!" --include="*.rs" --exclude-dir=target -l
+
+# No debugger statements
+grep -r "debugger" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" --exclude-dir=node_modules -l
+
+# No merge conflict markers
+grep -r "^<<<<<<< \|^=======$\|^>>>>>>> " --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" --include="*.go" --include="*.py" --include="*.rs" -l
+
+# No TODO without feature list item
+grep -r "TODO\|FIXME" --exclude-dir=node_modules --exclude-dir=vendor --exclude-dir=target --exclude-dir=__pycache__ -l
+
+# progress.txt exists and was recently updated
+ls -la progress.txt
+tail -20 progress.txt
+```
**Edge case coverage** — Invalid input, auth failures, not-found, rate limits + +## Process + +### Step 1: Ensure Environment is Running + +```bash +# Check API server is running (adjust port for your project) +lsof -i :8080 | head -2 +curl -s http://localhost:8080/health || echo "API not responding" +``` + +If not running, start with `bash init.sh`. + +### Step 2: Write Integration Tests + +Every feature MUST have integration tests covering: + +**Happy path:** +``` +- Valid request → correct status code + response body +- All required fields present in response +- Correct content-type header +``` + +**Error cases:** +``` +- Missing required fields → 400 with descriptive error +- Invalid field values → 400 with field-specific errors +- Unauthorized → 401 with error message +- Forbidden → 403 with error message +- Not found → 404 with error message +- Conflict/duplicate → 409 with error message +``` + +**Edge cases:** +``` +- Empty collections → 200 with empty array (not null) +- Pagination boundaries → correct page/total counts +- Large payloads → handled gracefully +- Concurrent requests → no race conditions +``` + +### Example Test Patterns + +#### Go (net/http/httptest) +```go +func TestCreateProduct(t *testing.T) { + srv := setupTestServer(t) + + resp, err := srv.Client().Post(srv.URL+"/api/products", + "application/json", + strings.NewReader(`{"name": "Test", "price": 9.99}`)) + require.NoError(t, err) + require.Equal(t, http.StatusCreated, resp.StatusCode) + + var product Product + json.NewDecoder(resp.Body).Decode(&product) + assert.Equal(t, "Test", product.Name) + assert.NotEmpty(t, product.ID) +} +``` + +#### Python (pytest + requests) +```python +def test_create_product(api_client): + resp = api_client.post("/api/products", json={"name": "Test", "price": 9.99}) + assert resp.status_code == 201 + data = resp.json() + assert data["name"] == "Test" + assert "id" in data +``` + +#### Node.js (vitest + supertest) +```typescript +test('POST /api/products 
creates a product', async () => { + const res = await request(app) + .post('/api/products') + .send({ name: 'Test', price: 9.99 }) + .expect(201); + + expect(res.body.name).toBe('Test'); + expect(res.body.id).toBeDefined(); +}); +``` + +### Step 3: Run Tests + +```bash +# Use the project's test command +go test ./... # Go +pytest tests/ # Python +npm test # Node.js +``` + +### Step 4: Verify Test Quality + +After tests pass, verify they are thorough: + +1. **Coverage check** — Are all endpoints tested? +2. **Error paths tested** — Not just happy paths? +3. **Response shape validated** — Not just status codes? +4. **Auth tested** — Protected endpoints reject unauthorized requests? +5. **Idempotency** — Can tests run multiple times without side effects? + +### Step 5: Document Results + +Record in the subagent's output: +- Endpoints tested (method + path) +- Status codes verified +- Error scenarios covered +- Any issues found and fixed + +## Verification Checklist + +For each API feature, verify: + +- [ ] All endpoints return correct status codes +- [ ] Response bodies match expected schema +- [ ] Error responses have consistent format (e.g., `{"error": "message", "details": [...]}`) +- [ ] Authentication/authorization enforced on protected endpoints +- [ ] Input validation rejects malformed data with helpful errors +- [ ] Pagination works correctly (page, limit, total, next/prev) +- [ ] Filters and search return correct subsets +- [ ] CRUD operations are complete (create, read, update, delete all work) +- [ ] Concurrent access doesn't cause data corruption + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. Confirm all tests pass: check test output or run a quick smoke test +2. Verify error handling is tested (not just happy paths) +3. 
If coverage seems thin, launch a follow-up subagent to add missing test cases diff --git a/references/verification/cli-verification.md b/references/verification/cli-verification.md new file mode 100644 index 0000000..92a9eac --- /dev/null +++ b/references/verification/cli-verification.md @@ -0,0 +1,163 @@ +# CLI Verification Strategy + +Verify CLI tool features through command execution tests, output validation, and exit code checks. + +**This is the verification strategy for project type: `cli`** + +## Overview + +CLI projects are verified through: +1. **Command execution tests** — Run commands with various arguments and flags +2. **Output validation** — Check stdout, stderr, and file output +3. **Exit code checks** — Correct exit codes for success and failure +4. **Edge case coverage** — Missing args, invalid input, permissions, large files + +## Process + +### Step 1: Ensure Tool is Built + +```bash +# Build the CLI tool (adjust for your project) +go build -o ./bin/mytool . # Go +cargo build # Rust +npm run build # Node.js +pip install -e . # Python +``` + +### Step 2: Write Integration Tests + +Every feature MUST have tests covering: + +**Happy path:** +``` +- Valid args → correct output + exit code 0 +- All flags/options work as documented +- Output format is correct (text, JSON, table, etc.) 
+```
+
+**Error cases:**
+```
+- Missing required args → helpful error message + exit code 1
+- Invalid flag values → descriptive error + exit code 1
+- File not found → clear error + exit code 1
+- Permission denied → clear error + exit code 1
+- Invalid input format → parse error with line/position info
+```
+
+**Edge cases:**
+```
+- Empty input → graceful handling (not crash)
+- Very large input → handles without OOM or hang
+- Stdin pipe → works with piped input
+- No TTY → works in non-interactive mode
+- Ctrl+C → clean shutdown
+```
+
+### Example Test Patterns
+
+#### Go (exec.Command)
+```go
+func TestListCommand(t *testing.T) {
+	cmd := exec.Command("./bin/mytool", "list", "--format", "json")
+	out, err := cmd.CombinedOutput()
+	require.NoError(t, err, "command failed: %s", string(out))
+
+	var items []Item
+	require.NoError(t, json.Unmarshal(out, &items))
+	assert.NotEmpty(t, items)
+}
+
+func TestInvalidFlag(t *testing.T) {
+	cmd := exec.Command("./bin/mytool", "--invalid-flag")
+	out, err := cmd.CombinedOutput()
+	assert.Error(t, err)
+	assert.Contains(t, string(out), "unknown flag")
+}
+```
+
+#### Rust (assert_cmd)
+```rust
+use assert_cmd::Command;
+
+#[test]
+fn test_list_command() {
+    Command::cargo_bin("mytool")
+        .unwrap()
+        .arg("list")
+        .arg("--format")
+        .arg("json")
+        .assert()
+        .success()
+        .stdout(predicates::str::contains("["));
+}
+```
+
+#### Python (subprocess)
+```python
+import json
+import subprocess
+
+def test_list_command():
+    result = subprocess.run(
+        ["python", "-m", "mytool", "list", "--format", "json"],
+        capture_output=True, text=True
+    )
+    assert result.returncode == 0
+    data = json.loads(result.stdout)
+    assert isinstance(data, list)
+```
+
+#### Node.js (execa)
+```typescript
+import { execa } from 'execa';
+
+test('list command outputs JSON', async () => {
+  const { stdout, exitCode } = await execa('./bin/mytool', ['list', '--format', 'json']);
+  expect(exitCode).toBe(0);
+  const data = JSON.parse(stdout);
+  
expect(Array.isArray(data)).toBe(true); +}); +``` + +### Step 3: Run Tests + +```bash +# Use the project's test command +go test ./... +cargo test +npm test +pytest tests/ +``` + +### Step 4: Verify Test Quality + +After tests pass, verify: + +1. **All subcommands tested** — Every command/subcommand has at least one test +2. **All flags tested** — Each flag is exercised in at least one test +3. **Help text correct** — `--help` output matches actual behavior +4. **Error messages helpful** — Errors tell the user what to do, not just what went wrong +5. **Exit codes consistent** — 0 for success, 1 for user error, 2 for system error + +## Verification Checklist + +For each CLI feature, verify: + +- [ ] Command produces correct output for valid input +- [ ] Exit code is 0 on success +- [ ] Exit code is non-zero on failure +- [ ] Error messages go to stderr (not stdout) +- [ ] Error messages are actionable (tell user how to fix) +- [ ] `--help` flag works and is accurate +- [ ] Flags have short and long forms where appropriate +- [ ] Output formats work (text, json, table, csv if supported) +- [ ] Piped input works (`echo "data" | mytool process`) +- [ ] File arguments handle missing/unreadable files gracefully +- [ ] Quiet/verbose modes work if supported + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. Confirm all tests pass +2. Run a quick smoke test: `./bin/mytool --help` or equivalent +3. Verify error cases are tested (not just happy paths) +4. If test coverage seems thin, launch a follow-up subagent diff --git a/references/verification/data-verification.md b/references/verification/data-verification.md new file mode 100644 index 0000000..298bce5 --- /dev/null +++ b/references/verification/data-verification.md @@ -0,0 +1,177 @@ +# Data Pipeline Verification Strategy + +Verify data pipeline features through input/output validation, transformation tests, and data quality checks. 
+ +**This is the verification strategy for project type: `data`** + +## Overview + +Data pipeline projects are verified through: +1. **Transformation tests** — Input data → expected output data +2. **Schema validation** — Output matches expected schema/types +3. **Data quality checks** — No nulls where unexpected, no duplicates, correct aggregations +4. **Edge case coverage** — Empty datasets, malformed records, schema evolution + +## Process + +### Step 1: Ensure Environment is Ready + +```bash +# Check database/data services are running (adjust for your project) +docker-compose ps # Docker services +psql -c "SELECT 1" 2>/dev/null # PostgreSQL +python -c "import pandas; print('OK')" # Python deps +``` + +If not running, start with `bash init.sh`. + +### Step 2: Write Pipeline Tests + +Every feature MUST have tests covering: + +**Happy path:** +``` +- Valid input data → correct transformed output +- Aggregations produce correct totals/counts +- Joins produce correct merged records +- Output schema matches specification +``` + +**Error cases:** +``` +- Malformed input records → skipped or logged (not crash) +- Missing required fields → clear error or default value +- Type mismatches → coercion or descriptive error +- Connection failures → retry or clear error +``` + +**Edge cases:** +``` +- Empty dataset → empty output (not crash or null) +- Single record → works correctly +- Very large dataset → completes within resource limits +- Duplicate records → handled per spec (dedupe, keep-all, etc.) 
+- Null/missing values → handled consistently +- Schema evolution → backward-compatible +``` + +### Example Test Patterns + +#### Python (pytest + pandas) +```python +def test_transform_sales_data(): + input_df = pd.DataFrame({ + 'date': ['2024-01-01', '2024-01-01', '2024-01-02'], + 'product': ['A', 'B', 'A'], + 'amount': [100, 200, 150] + }) + result = transform_sales(input_df) + + assert len(result) == 2 # Grouped by date + assert result.loc[result['date'] == '2024-01-01', 'total'].values[0] == 300 + assert result.loc[result['date'] == '2024-01-02', 'total'].values[0] == 150 + +def test_transform_handles_empty(): + empty_df = pd.DataFrame(columns=['date', 'product', 'amount']) + result = transform_sales(empty_df) + assert len(result) == 0 + assert list(result.columns) == ['date', 'total'] # Schema preserved + +def test_transform_handles_nulls(): + input_df = pd.DataFrame({ + 'date': ['2024-01-01', None], + 'product': ['A', 'B'], + 'amount': [100, None] + }) + result = transform_sales(input_df) + assert result['total'].isna().sum() == 0 # No nulls in output +``` + +#### SQL (dbt tests) +```yaml +# schema.yml +models: + - name: sales_summary + columns: + - name: date + tests: [not_null, unique] + - name: total + tests: [not_null] + tests: + - dbt_utils.expression_is_true: + expression: "total >= 0" +``` + +#### Spark (PySpark) +```python +def test_aggregate_orders(spark): + input_data = [("2024-01-01", "A", 100), ("2024-01-01", "B", 200)] + input_df = spark.createDataFrame(input_data, ["date", "product", "amount"]) + + result = aggregate_orders(input_df) + + assert result.count() == 1 + row = result.collect()[0] + assert row["total"] == 300 +``` + +### Step 3: Run Tests + +```bash +pytest tests/ -v # Python +dbt test # dbt +spark-submit --master local tests/ # Spark +go test ./pipeline/... # Go +``` + +### Step 4: Verify Data Quality + +After tests pass, verify: + +1. **Schema correct** — Output columns/fields match spec +2. 
**No data loss** — Row counts match expectations (input vs output) +3. **No duplicates** — Unless explicitly expected +4. **Aggregations correct** — Spot-check totals manually +5. **Null handling consistent** — Documented and tested +6. **Idempotent** — Running pipeline twice produces same result + +### Step 5: Validate with Sample Data + +Run the pipeline against a representative sample: + +```bash +# Run with test fixtures +python -m pipeline --input fixtures/sample_input.csv --output /tmp/output.csv + +# Verify output +python -c " +import pandas as pd +df = pd.read_csv('/tmp/output.csv') +print(f'Rows: {len(df)}') +print(f'Columns: {list(df.columns)}') +print(f'Nulls: {df.isnull().sum().to_dict()}') +print(df.head()) +" +``` + +## Verification Checklist + +For each data feature, verify: + +- [ ] Input → output transformation is correct +- [ ] Output schema matches specification +- [ ] Null/missing values handled consistently +- [ ] Empty input produces empty output (not error) +- [ ] Aggregations are mathematically correct +- [ ] No unintended data loss or duplication +- [ ] Pipeline is idempotent (safe to re-run) +- [ ] Error records are logged/quarantined (not silently dropped) +- [ ] Performance is acceptable for expected data volumes + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. Confirm all tests pass +2. Verify output schema matches spec +3. Check that edge cases (empty, null, duplicate) are tested +4. If data quality checks seem thin, launch a follow-up subagent diff --git a/references/verification/library-verification.md b/references/verification/library-verification.md new file mode 100644 index 0000000..bfe6622 --- /dev/null +++ b/references/verification/library-verification.md @@ -0,0 +1,190 @@ +# Library Verification Strategy + +Verify library features through unit tests, public API validation, and integration examples. 
+
+**This is the verification strategy for project type: `library`**
+
+## Overview
+
+Library projects are verified through:
+1. **Unit tests** — Thorough testing of all public functions/methods
+2. **Public API validation** — Exports, types, and interfaces are correct
+3. **Integration examples** — Real usage patterns work end-to-end
+4. **Edge case coverage** — Nil/null inputs, boundary values, concurrent access
+
+## Process
+
+### Step 1: Ensure Library Builds
+
+```bash
+# Build/compile the library (adjust for your project)
+go build ./...                  # Go
+cargo build                     # Rust
+npm run build                   # Node.js/TypeScript
+python -m py_compile src/*.py   # Python
+```
+
+### Step 2: Write Unit Tests
+
+Every public function/method MUST have tests covering:
+
+**Happy path:**
+```
+- Valid inputs → correct outputs
+- All overloads/variants work
+- Return types are correct
+```
+
+**Error cases:**
+```
+- Invalid inputs → clear error (not panic/crash)
+- Nil/null/undefined → handled gracefully
+- Out-of-range values → descriptive error
+- Type mismatches → compile-time or clear runtime error
+```
+
+**Edge cases:**
+```
+- Empty collections → correct behavior (not crash)
+- Boundary values → correct at min/max
+- Concurrent access → thread-safe if documented as such
+- Large inputs → handles without excessive memory/time
+```
+
+### Example Test Patterns
+
+#### Go
+```go
+func TestParse(t *testing.T) {
+	tests := []struct {
+		name    string
+		input   string
+		want    *Result
+		wantErr bool
+	}{
+		{"valid input", "hello", &Result{Value: "hello"}, false},
+		{"empty input", "", nil, true},
+		{"special chars", "a&b<c", &Result{Value: "a&b<c"}, false},
+	}
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got, err := Parse(tt.input)
+			if tt.wantErr {
+				require.Error(t, err)
+				return
+			}
+			require.NoError(t, err)
+			assert.Equal(t, tt.want, got)
+		})
+	}
+}
+```
+
+#### TypeScript (vitest)
+```typescript
+describe('parse', () => {
+  it('parses valid input', () => {
+    expect(parse('hello')).toEqual({ value: 'hello' });
+  });
+
+  it('throws on empty input', () => {
+    expect(() => parse('')).toThrow('Input cannot be empty');
+  });
+
+  it('handles special characters', () => {
+    expect(parse('a&b<c')).toEqual({ value: 'a&b<c' });
+  });
+});
+```
diff --git a/references/verification/mobile-verification.md b/references/verification/mobile-verification.md
new file mode 100644
--- /dev/null
+++ b/references/verification/mobile-verification.md
@@ -0,0 +1,158 @@
+# Mobile Verification Strategy
+
+### Example Test Patterns
+
+#### Detox (React Native)
+```javascript
+describe('login flow', () => {
+  it('should login successfully', async () => {
+    await device.takeScreenshot('login-initial');
+
+    await 
element(by.id('email-input')).typeText('test@example.com'); + await element(by.id('password-input')).typeText('password123'); + await element(by.id('login-button')).tap(); + + await expect(element(by.id('dashboard'))).toBeVisible(); + await device.takeScreenshot('dashboard-after-login'); + }); +}); +``` + +#### Flutter +```dart +testWidgets('login flow', (tester) async { + await tester.pumpWidget(MyApp()); + + // Screenshot: initial + await expectLater(find.byType(MyApp), matchesGoldenFile('login-initial.png')); + + await tester.enterText(find.byKey(Key('email')), 'test@example.com'); + await tester.enterText(find.byKey(Key('password')), 'password123'); + await tester.tap(find.byKey(Key('login-button'))); + await tester.pumpAndSettle(); + + // Screenshot: after login + await expectLater(find.byType(MyApp), matchesGoldenFile('dashboard.png')); +}); +``` + +### Step 3: Run Tests + +```bash +# Detox +npx detox test --configuration ios.sim.debug + +# Flutter +flutter test integration_test/ + +# XCTest +xcodebuild test -scheme MyApp -destination 'platform=iOS Simulator,name=iPhone 15' +``` + +### Step 4: Visual Review (MANDATORY) + +Use the Read tool to inspect EVERY screenshot. 
Evaluate: + +#### Layout +- Content fits screen without horizontal scrolling +- No elements clipped by safe area (notch, home indicator) +- Proper alignment and spacing + +#### Touch Targets +- All tappable elements at least 44x44 points +- Adequate spacing between touch targets + +#### Platform Conventions +- iOS: follows HIG (navigation bars, tab bars, system colors) +- Android: follows Material Design (app bars, FAB, bottom nav) +- Platform-appropriate gestures and transitions + +#### States +- Loading indicators present during async operations +- Empty states with helpful messaging +- Error states with recovery options +- Pull-to-refresh where appropriate + +#### Aesthetics +- Polished and platform-native feel +- Typography matches platform conventions +- Colors and theming consistent +- Smooth transitions between screens + +#### Device Sizes +- Works on small screens (iPhone SE / small Android) +- Works on large screens (iPhone Pro Max / tablet) +- Landscape orientation handled (if applicable) + +### Step 5: Fix Issues + +If screenshots reveal problems: +1. Fix layout/styling in the relevant component +2. Re-run tests to capture updated screenshots +3. Review again until all issues resolved + +## Verification Checklist + +For each mobile feature, verify: + +- [ ] E2E test passes on target platform(s) +- [ ] Screenshots captured at key states +- [ ] Touch targets are minimum 44x44 points +- [ ] Safe area respected (notch, home indicator) +- [ ] Loading states present for async operations +- [ ] Error states present with recovery options +- [ ] Works on small and large screen sizes +- [ ] Platform conventions followed (HIG/Material) +- [ ] Accessibility labels present on interactive elements + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. Confirm screenshots exist for this feature +2. Spot-check one screenshot with the Read tool +3. If quality is poor, launch a polish subagent +4. 
Verify platform conventions are followed diff --git a/references/verification/web-verification.md b/references/verification/web-verification.md new file mode 100644 index 0000000..b586a7d --- /dev/null +++ b/references/verification/web-verification.md @@ -0,0 +1,148 @@ +# Web Verification Strategy + +Verify web features using Playwright E2E tests with screenshot capture and visual review. + +**This is the verification strategy for project type: `web`** + +## Overview + +Web projects are verified through: +1. **E2E tests** — Playwright tests exercising user journeys +2. **Screenshots** — Captured at key states for visual review +3. **Visual review** — AI agent reviews every screenshot against quality criteria +4. **UX standards compliance** — Loading/empty/error states, responsive, accessible + +## Prerequisites + +```bash +npm install -D @playwright/test +npx playwright install +``` + +## Process + +### Step 1: Ensure Environment is Running + +```bash +# Check frontend and backend ports (adjust for your project) +lsof -i :3000 | head -2 # Frontend +lsof -i :8082 | head -2 # Backend +``` + +If not running, start them with `bash init.sh`. 
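
Rather than sleeping a fixed amount after `bash init.sh`, the port check above can be turned into a small readiness poll. A minimal sketch, assuming `curl` is available; the URL and retry budget are project-specific assumptions:

```bash
# Poll a URL until it responds with success, or give up after N one-second tries
wait_for() {
  url=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    curl -sf "$url" >/dev/null && return 0
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Example (hypothetical port): wait_for http://localhost:3000 30 || echo "frontend not up"
```

If the poll fails, re-run `bash init.sh` and check its logs before starting the test suite.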
+ +### Step 2: Write E2E Tests with Screenshots + +Every test MUST capture screenshots at key user journey points: + +```typescript +import { test, expect } from '@playwright/test'; + +test('user can login', async ({ page }) => { + await page.goto('/login'); + + // Screenshot: Initial state + await page.screenshot({ + path: `e2e/screenshots/{scope}-feature-{id}-step1-login-initial.png`, + fullPage: true + }); + + await page.getByLabel('Email').fill('test@example.com'); + await page.getByLabel('Password').fill('password123'); + await page.getByRole('button', { name: 'Login' }).click(); + + await expect(page).toHaveURL('/dashboard'); + + // Screenshot: After action + await page.screenshot({ + path: `e2e/screenshots/{scope}-feature-{id}-step2-dashboard-after-login.png`, + fullPage: true + }); +}); +``` + +### Step 3: Run Tests + +```bash +npx playwright test +``` + +### Step 4: Visual Review (MANDATORY) + +Use the Read tool to open and visually inspect EVERY screenshot. Evaluate: + +#### Layout +- Content fits without overflow or clipping +- Proper alignment (grid, flex) + +#### Spacing +- Consistent spacing patterns (4/8/16/24/32px scale) +- Not too cramped or sparse + +#### Visual Hierarchy +- Most important action is obvious +- Page title > section title > body text size hierarchy + +#### States +- Loading state present (skeleton or spinner) +- Empty state present (icon + message + CTA) +- Error state present and styled + +#### Aesthetics +- Polished and intentional, not generic/prototype-level +- Typography is distinctive and hierarchical +- Color palette is cohesive +- Visual depth: appropriate shadows, borders + +#### Consistency +- Similar screens use same patterns +- Colors consistent with theme + +### Step 5: Fix Issues + +If screenshots reveal problems: +1. Locate the relevant component file +2. Make targeted CSS/layout changes +3. Re-run tests to capture updated screenshots +4. Review again until all issues resolved + +**Priority order:** +1. 
Broken layout (overflow, clipping, misalignment) +2. Missing states (loading, empty, error) +3. Accessibility issues (contrast, focus rings, labels) +4. Visual polish (shadows, transitions, typography) +5. Consistency issues (spacing, colors) + +## Screenshot Naming Convention + +Format: `{scope}-feature-{id}-step{N}-{description}.png` + +Examples: +- `auth-feature-17-step3-modal-open.png` +- `core-feature-7-step6-project-in-list.png` + +## Playwright Configuration + +```typescript +export default defineConfig({ + timeout: 10000, + expect: { timeout: 3000 }, + reporter: [ + ['list'], + ['json', { outputFile: 'e2e/test-results/results.json' }], + ], + use: { + actionTimeout: 5000, + navigationTimeout: 10000, + screenshot: 'on', + trace: 'retain-on-failure', + }, +}); +``` + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. Confirm screenshots exist: `ls e2e/screenshots/{scope}-feature-{id}-*.png 2>/dev/null | wc -l` +2. Spot-check one screenshot with the Read tool +3. If quality is poor, launch a polish subagent diff --git a/references/web/e2e-verification.md b/references/web/e2e-verification.md new file mode 100644 index 0000000..9a8e9ef --- /dev/null +++ b/references/web/e2e-verification.md @@ -0,0 +1,281 @@ +# E2E Screenshot Verification — Full Details + +Verify features work correctly using Playwright E2E tests with screenshot capture and visual review. + +**CRITICAL: Screenshots are MANDATORY for every feature. They are the primary evidence of correct implementation and UI quality. Skipping screenshots means the feature is NOT verified.** + +## Prerequisites + +Ensure Playwright is set up: + +```bash +npm install -D @playwright/test +npx playwright install +``` + +## Step-by-Step Process + +### Step 1: Ensure Environment is Running + +```bash +lsof -i :3000 | head -2 # Frontend +lsof -i :8082 | head -2 # Backend +``` + +If not running, start them with `bash init.sh`. 
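
Before clearing screenshots and running the suite, a quick preflight can confirm the services actually answer HTTP requests, not just that the ports are bound. A sketch; the local URLs are project-specific assumptions:

```bash
# Report whether each service answers an HTTP request
check() {
  if curl -sf "$1" >/dev/null; then
    echo "up: $1"
  else
    echo "DOWN: $1"
  fi
}

# Hypothetical local URLs; adjust to the project's ports
check http://localhost:3000
check http://localhost:8082/health
```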
+ +### Step 2: Clear Old Screenshots + +```bash +rm -rf e2e/screenshots/*.png e2e/screenshots/**/*.png 2>/dev/null || true +rm -rf test-results/**/*.png 2>/dev/null || true +``` + +### Step 3: Run E2E Tests + +```bash +# Run all tests +npx playwright test + +# Or run specific test file +npx playwright test e2e/auth.spec.ts + +# Or run tests matching a pattern +npx playwright test --grep "login" +``` + +### Step 4: Check Test Results + +If tests fail, check error context: + +```bash +# Find error context files +find test-results -name "error-context.md" 2>/dev/null | head -5 + +# Find failure screenshots +find test-results -name "*.png" -type f | sort +``` + +Common failure causes: +- Backend not running +- Database not seeded +- Port conflicts +- Stale selectors + +### Step 5: List All Screenshots + +```bash +find e2e/screenshots -name "*.png" -type f 2>/dev/null | sort +find test-results -name "*.png" -type f 2>/dev/null | sort +``` + +### Step 6: Review Each Screenshot (MANDATORY) + +**CRITICAL**: Use the Read tool to open and visually inspect EVERY screenshot. For each screenshot, explicitly evaluate: + +#### Layout +- ✓/✗ Content fits without overflow? +- ✓/✗ No clipping or cut-off elements? +- ✓/✗ Proper alignment (grid, flex)? + +#### Spacing +- ✓/✗ Appropriate padding/margins? +- ✓/✗ Not too cramped or sparse? +- ✓/✗ Consistent spacing patterns (follows 4/8/16/24/32px scale)? + +#### Touch Targets +- ✓/✗ Buttons/inputs at least 44px? +- ✓/✗ Clickable areas visually obvious? + +#### Visual Hierarchy +- ✓/✗ Most important action is obvious? +- ✓/✗ Disabled states clearly distinguishable? +- ✓/✗ Focus states visible? +- ✓/✗ Page title > section title > body text size hierarchy? + +#### States +- ✓/✗ Loading state present (skeleton or spinner)? +- ✓/✗ Empty state present (icon + message + CTA)? +- ✓/✗ Error state present and styled (red text, red borders)? 
+ +#### Aesthetics (follow /frontend-design principles) +- ✓/✗ Looks polished and intentional, not generic/prototype-level? +- ✓/✗ Typography is distinctive and hierarchical? +- ✓/✗ Color palette is cohesive? +- ✓/✗ Visual depth: appropriate shadows, borders, or textures? +- ✓/✗ Micro-interactions: hover/focus transitions visible? + +#### Data Display +- ✓/✗ Shows real data, not placeholders? +- ✓/✗ Numbers right-aligned in tables? +- ✓/✗ Status badges have colored backgrounds with text? + +#### Consistency +- ✓/✗ Similar screens use same patterns? +- ✓/✗ Colors consistent with theme? +- ✓/✗ Icons consistent in style and size? +- ✓/✗ Spacing consistent with other pages? + +### Step 7: Fix Issues Found + +If screenshots reveal problems: + +1. Locate the relevant component file +2. Make targeted CSS/layout changes +3. Prefer Tailwind utilities over custom CSS +4. Keep all `data-testid` attributes intact +5. Re-run tests to capture updated screenshots +6. Review again until all issues resolved +7. Focus on the biggest visual impact first + +**Priority order for fixes:** +1. Broken layout (overflow, clipping, misalignment) +2. Missing states (loading, empty, error) +3. Accessibility issues (contrast, focus rings, labels) +4. Visual polish (shadows, transitions, typography) +5. Consistency issues (spacing, colors) + +### Step 8: Document Verification + +After successful verification, note: +- Which features were verified +- Any UX improvements made +- Screenshots reviewed (count) +- Visual quality assessment + +## Writing Good E2E Tests + +### Key Principles + +1. **Use data-testid** for stable selectors +2. **EVERY test MUST capture at least one screenshot** — no exceptions +3. **Wait for conditions**, not timeouts +4. **Test at multiple viewports** for responsive features +5. 
**Mock external APIs** when needed + +### Example Test with Screenshots (REQUIRED PATTERN) + +```typescript +import { test, expect } from '@playwright/test'; + +test('user can login', async ({ page }) => { + await page.goto('/login'); + + // Screenshot: Login page initial state + await page.screenshot({ + path: `e2e/screenshots/auth-feature-1-step1-login-initial.png`, + fullPage: true + }); + + await page.getByLabel('Email').fill('test@example.com'); + await page.getByLabel('Password').fill('password123'); + await page.getByRole('button', { name: 'Login' }).click(); + + await expect(page).toHaveURL('/dashboard'); + + // Screenshot: Dashboard after login + await page.screenshot({ + path: `e2e/screenshots/auth-feature-1-step2-dashboard-after-login.png`, + fullPage: true + }); +}); +``` + +### Screenshot Rules (MANDATORY) + +- **Every test MUST have at least one `page.screenshot()` call** +- Name screenshots descriptively with scope prefix +- Use `fullPage: true` to capture complete page state +- Capture at key user journey points (before action, after action, error state) +- Include error states and empty states in screenshots +- Capture responsive breakpoints if the feature involves responsive behavior: + ```typescript + // Desktop screenshot + await page.setViewportSize({ width: 1280, height: 720 }); + await page.screenshot({ + path: 'e2e/screenshots/scope-feature-1-step1-desktop.png', + fullPage: true + }); + + // Mobile screenshot + await page.setViewportSize({ width: 375, height: 812 }); + await page.screenshot({ + path: 'e2e/screenshots/scope-feature-1-step1-mobile.png', + fullPage: true + }); + ``` + +### Screenshot Naming Convention + +Format: `{scope}-feature-{id}-step{N}-{description}.png` + +The scope name comes from `.active-scope` file (e.g., "auth", "core", "video-editor"). 
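
The scope prefix can be read mechanically rather than hard-coded in each test. A sketch; the `core` fallback is an assumption, not part of the convention:

```bash
# Derive the screenshot name prefix from .active-scope, defaulting to "core"
scope=$(cat .active-scope 2>/dev/null || echo core)
scope=${scope:-core}
echo "${scope}-feature-17-step3-modal-open.png"
```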
+
+Examples:
+- `auth-feature-17-step3-modal-open.png`
+- `core-feature-7-step6-project-in-list.png`
+- `video-editor-feature-15-complete-flow.png`
+- `pim-feature-4-step2-validation-errors.png`
+
+## Playwright Configuration
+
+Optimize for AI agent consumption:
+
+```typescript
+export default defineConfig({
+  // Short timeouts - fail fast
+  timeout: 10000, // 10s max per test
+  expect: {
+    timeout: 3000, // 3s max for assertions
+  },
+
+  // AI-readable output format
+  reporter: [
+    ['list'], // Simple pass/fail list
+    ['json', { outputFile: 'e2e/test-results/results.json' }],
+  ],
+
+  use: {
+    actionTimeout: 5000, // 5s max for clicks/fills
+    navigationTimeout: 10000,
+    screenshot: 'on', // Keep ALL screenshots
+    trace: 'retain-on-failure',
+  },
+});
+```
+
+**Why keep ALL screenshots:**
+- AI agents need to review UI for UX issues, not just failures
+- Success screenshots enable visual regression detection
+- Human reviewers can audit AI's work quality
+- Screenshots are the primary evidence of correct implementation
+
+**Why short timeouts:**
+- Long waits waste tokens and time
+- Missing elements should fail immediately
+- Fast feedback enables rapid iteration
+- AI can read JSON results directly
+
+## Troubleshooting
+
+### Tests Timeout
+- Increase timeout in playwright.config.ts
+- Check if backend is responding
+- Look for infinite loading states
+
+### Flaky Tests
+- Use `await expect()` instead of raw assertions
+- Wait for network idle: `await page.waitForLoadState('networkidle')`
+- Add retries in CI
+
+### Screenshots Blank or Wrong
+- Ensure page fully loaded before screenshot
+- Check viewport size
+- Verify correct URL navigation
+- Add `await page.waitForLoadState('networkidle')` before screenshot
+
+### UI Looks Generic in Screenshots
+- Review references/web/frontend-design.md and references/web/ux-standards.md
+- Check for: distinctive typography, cohesive colors, proper shadows/depth
+- Verify loading/empty/error states are polished, not bare text
+- 
Add micro-interactions: hover transitions, focus effects
diff --git a/references/web/frontend-design.md b/references/web/frontend-design.md
new file mode 100644
index 0000000..7a58d40
--- /dev/null
+++ b/references/web/frontend-design.md
@@ -0,0 +1,76 @@
+# Frontend Design Principles
+
+> Adapted from the `/frontend-design` skill. These principles guide all UI implementation in iterative-dev projects.
+
+## Core Philosophy
+
+Create distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Every UI decision should be intentional, not default.
+
+## Design Thinking (Before Coding)
+
+Before implementing any UI, pause and consider:
+- **Purpose**: What problem does this interface solve? Who uses it?
+- **Tone**: Pick a direction: brutally minimal, luxury/refined, playful, editorial/magazine, industrial/utilitarian, soft/pastel, etc. Admin dashboards often benefit from refined minimalism or industrial clarity.
+- **Constraints**: Technical requirements (framework, performance, accessibility)
+- **Differentiation**: What makes this memorable? What's the signature detail?
+
+**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work — the key is intentionality, not intensity. 
+ +## Aesthetic Guidelines + +### Typography +- Choose fonts that are beautiful, unique, and interesting +- **NEVER** use: Inter, Roboto, Arial, system fonts, or other generic defaults +- Opt for distinctive choices that elevate the interface +- Pair a display font with a refined body font +- Use font size contrast to create hierarchy (title vs body vs caption) + +### Color & Theme +- Commit to a cohesive aesthetic — don't scatter colors randomly +- Use CSS variables for consistency across the app +- Dominant colors with sharp accents outperform timid, evenly-distributed palettes +- Dark themes and light themes both work — choose intentionally +- Avoid cliched AI color schemes (particularly purple gradients on white) + +### Motion & Micro-interactions +- Use animations for high-impact moments: page load reveals, state transitions +- Prioritize CSS-only solutions for performance +- Focus on: hover states that surprise, smooth transitions between states, staggered reveals +- One well-orchestrated animation creates more delight than scattered micro-interactions +- Button feedback, loading spinners, and toast slide-ins should all feel intentional + +### Spatial Composition +- Unexpected layouts > cookie-cutter grids +- Asymmetry, overlap, diagonal flow, grid-breaking elements can all work +- Generous negative space OR controlled density — pick one and commit +- For admin/dashboard UIs: clean grid with clear visual hierarchy usually works best + +### Backgrounds & Visual Details +- Create atmosphere and depth rather than solid white/gray backgrounds +- Consider: subtle gradients, noise textures, geometric patterns, layered transparencies +- Dramatic shadows, decorative borders, grain overlays — use sparingly but intentionally +- For admin UIs: subtle texture or gradient in sidebar, clean content area + +## Anti-Patterns (NEVER Do These) + +- Generic font stacks (Inter, Roboto, Arial, system-ui) +- Purple gradients on white backgrounds +- Predictable layouts with no 
visual interest +- Cookie-cutter component patterns with default styles +- Bare, unstyled HTML elements +- Empty pages with just text "No items found" +- Forms with no visual grouping or hierarchy +- Tables with no hover states or alignment +- Buttons with no feedback on interaction +- Dialogs with no backdrop or transitions + +## Practical Application for Admin/Dashboard UIs + +Admin interfaces need to be **functional AND beautiful**. This means: + +1. **Clean but not bland** — Use subtle visual interest: card shadows, section dividers, icon accents +2. **Data-dense but readable** — Use typography hierarchy, proper spacing, zebra striping +3. **Efficient but not ugly** — Forms should be organized with sections, not a wall of inputs +4. **Professional but not generic** — Choose a color palette, font pairing, and spacing system that has character + +Remember: Claude is capable of extraordinary creative work. Don't hold back — show what can truly be created when committing fully to a distinctive vision, even for "boring" admin UIs. diff --git a/references/web/ux-standards.md b/references/web/ux-standards.md new file mode 100644 index 0000000..953ab33 --- /dev/null +++ b/references/web/ux-standards.md @@ -0,0 +1,126 @@ +# UX Standards for Production-Quality Apps + +Every feature implemented by a subagent must meet these standards. A feature that works but looks like a prototype is NOT complete. + +## Non-Negotiable Standards (every page must have these) + +### Loading States +- Use skeleton screens for initial page load (preferred over spinners) +- Show inline spinner for actions (save, delete, bulk operations) +- Button text changes during action: "Save" → "Saving..." 
with disabled state +- Never show a blank page while data loads + +### Empty States +- Icon + heading + description + CTA button +- Example: `[inbox icon] "No products yet" / "Create your first product to get started" / [Add Product button]` +- Empty search results: "No results for 'X'" with a "Clear filters" link +- Never show just an empty table or blank area + +### Error States +- Inline errors below form fields (red text + red border on field) +- Toast notifications for action errors (red/destructive variant) +- Full-page error boundary for crashes with retry option +- Never show raw error messages or stack traces to users + +### Responsive Design +- **375px (Mobile)**: Single column, hamburger nav, stacked cards, horizontal scroll for tables +- **768px (Tablet)**: 2 columns, condensed sidebar or top nav +- **1280px (Desktop)**: Full layout with permanent sidebar +- Tables must either scroll horizontally on mobile OR collapse to card layout +- Touch targets minimum 44px on mobile +- Test at all three breakpoints + +### Accessibility +- All interactive elements: aria-label or associated visible label +- Focus ring visible on keyboard navigation (focus-visible) +- Color is never the only indicator (always add text or icon) +- Minimum contrast ratio: 4.5:1 for text +- Modals must trap focus and close on Escape +- Form inputs must have associated labels + +## Visual Design Standards + +### Typography Hierarchy +- Page title: `text-2xl font-bold` or larger +- Section title: `text-lg font-semibold` +- Body text: `text-sm` or `text-base` +- Caption/label: `text-xs text-muted-foreground` +- Choose distinctive fonts — avoid generic defaults like Inter, Arial, system-ui +- Pair a display font with a complementary body font + +### Color & Theme +- Commit to a cohesive palette — don't use random colors +- Define CSS variables for consistency +- Dominant color with sharp accent outperforms evenly-distributed palettes +- Status colors: green=success/active, 
yellow/amber=warning/draft, red=error/destructive, gray=neutral/archived +- Status badges must have both colored background AND text (not color alone) + +### Spacing Scale +Use a consistent scale throughout the app: +- `4px` (p-1) — tight inline spacing +- `8px` (p-2) — compact elements +- `12px` (p-3) — standard inline padding +- `16px` (p-4) — standard section padding +- `24px` (p-6) — generous section spacing +- `32px` (p-8) — major section breaks +- `48px` (p-12) — page-level spacing + +### Shadows & Depth +- Cards: `shadow-sm` at rest, `shadow-md` on hover +- Modals/dialogs: `shadow-lg` +- Dropdowns: `shadow-md` +- Always add transition: `transition-shadow duration-200` + +### Transitions & Micro-interactions +- Hover effects: `transition-colors duration-150` or `transition-all duration-200` +- Button press feedback: slight scale or color change +- Page elements: subtle fade-in on mount +- Sidebar/menu open: slide transition with backdrop +- Toast notifications: slide-in from edge +- Never change state abruptly — always transition + +## Feature-Specific Standards + +### Forms +- Group related fields with section headers and visual dividers +- Required fields marked with asterisk (*) +- Help text below non-obvious fields (smaller, muted color) +- Auto-generation feedback (e.g., slug auto-generates as user types name) +- Submit button shows loading state, disables during submit +- Cancel navigates back without side effects +- Unsaved changes: consider confirm-before-leave + +### Tables +- Column headers: bold, uppercase or semi-bold, with sort indicators +- Zebra striping: alternating row backgrounds (subtle, `even:bg-muted/50`) +- Hover highlighting: `hover:bg-muted transition-colors` +- Text alignment: text left, numbers right, status centered +- Actions column: icon buttons with tooltips +- Pagination: show current page, total pages, and per-page count + +### Cards / Grid Views +- Consistent card sizing within a grid +- Rounded corners (`rounded-lg`) +- 
Border or shadow for visual separation +- Hover effect for clickable cards +- Image aspect ratio maintained + +### Navigation +- Active link clearly distinguished (background color, font weight, or indicator) +- Breadcrumbs on nested pages (e.g., Products > Edit MacBook Pro) +- Mobile: hamburger menu with slide-in overlay + backdrop +- Keyboard accessible: Tab through links, Enter to activate + +### Dialogs / Modals +- Backdrop overlay (semi-transparent black) +- Centered with max-width appropriate to content +- Close button (X) in top-right corner +- Close on Escape key and backdrop click +- Focus trapped inside dialog +- Destructive actions: red/destructive button variant + +### Toast Notifications +- Success: green variant, auto-dismiss after 4s +- Error: red/destructive variant, longer display or manual dismiss +- Position: bottom-right or top-right, consistent throughout app +- Include relevant context (e.g., "Product 'MacBook Pro' deleted") From 6f88e11e7b815fcb0f3900e62b0a730a869f3e42 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Wed, 18 Mar 2026 05:42:59 +0800 Subject: [PATCH 02/17] Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --- package.json | 7 +------ references/verification/web-verification.md | 4 ++-- references/web/e2e-verification.md | 2 +- references/web/frontend-design.md | 2 +- 4 files changed, 5 insertions(+), 10 deletions(-) diff --git a/package.json b/package.json index c47a537..2841f51 100644 --- a/package.json +++ b/package.json @@ -19,10 +19,5 @@ "subagent", "autonomous" ], - "author": "", - "license": "MIT", - "repository": { - "type": "git", - "url": "" - } + "license": "MIT" } diff --git a/references/verification/web-verification.md b/references/verification/web-verification.md index b586a7d..f35bd3d 100644 --- a/references/verification/web-verification.md +++ b/references/verification/web-verification.md @@ -43,7 +43,7 @@ test('user can login', async ({ 
page }) => { // Screenshot: Initial state await page.screenshot({ - path: `e2e/screenshots/{scope}-feature-{id}-step1-login-initial.png`, + path: `e2e/screenshots/${scope}-feature-${id}-step1-login-initial.png`, fullPage: true }); @@ -55,7 +55,7 @@ test('user can login', async ({ page }) => { // Screenshot: After action await page.screenshot({ - path: `e2e/screenshots/{scope}-feature-{id}-step2-dashboard-after-login.png`, + path: `e2e/screenshots/${scope}-feature-${id}-step2-dashboard-after-login.png`, fullPage: true }); }); diff --git a/references/web/e2e-verification.md b/references/web/e2e-verification.md index 9a8e9ef..253a53e 100644 --- a/references/web/e2e-verification.md +++ b/references/web/e2e-verification.md @@ -275,7 +275,7 @@ export default defineConfig({ - Add `await page.waitForLoadState('networkidle')` before screenshot ### UI Looks Generic in Screenshots -- Review references/frontend-design.md and references/ux-standards.md +- Review references/web/frontend-design.md and references/web/ux-standards.md - Check for: distinctive typography, cohesive colors, proper shadows/depth - Verify loading/empty/error states are polished, not bare text - Add micro-interactions: hover transitions, focus effects diff --git a/references/web/frontend-design.md b/references/web/frontend-design.md index 7a58d40..c882ca2 100644 --- a/references/web/frontend-design.md +++ b/references/web/frontend-design.md @@ -1,6 +1,6 @@ # Frontend Design Principles -> Adapted from the `/frontend-design` skill. These principles guide all UI implementation in iterative-web-dev projects. +> Adapted from the `/frontend-design` skill. These principles guide all UI implementation across iterative-dev projects of all types. 
## Core Philosophy From 9d6e6f72726b512c6110f0837fe41ea5c2eed7b6 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Wed, 18 Mar 2026 05:50:12 +0800 Subject: [PATCH 03/17] docs: add how-to-use examples covering all usage patterns Show 5 cases: manual spec, agent-generated spec, scope switching, compliance audits, and continuing multi-session projects. Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 118 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 117 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 108f447..c8aefd8 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ An AI skill for iterative development with AI agents. Works with **any project t ## Installation ```bash -npx skills add https://github.com/sunfmin/iterative-web-dev +npx skills add https://github.com/theplant/iterative-dev ``` ## Overview @@ -63,6 +63,122 @@ This skill provides a complete workflow for AI agents working on long-running de 4. Parent agent confirms completion, then **loops back** to pick the next feature 5. Only stops when ALL features have `"passes": true` +## How to Use + +### Case 1: Write spec.md yourself, then initialize + +Best when you have a clear vision of what to build. Write the spec first, then let the agent set up the scope and generate the feature list. + +**Step 1 — Write your spec:** + +Create `specs/auth/spec.md` (or any scope name) with your project specification: + +```markdown +# Auth System + +Build a JWT-based authentication system with: +- User registration with email/password +- Login endpoint returning JWT tokens +- Password reset via email +- Role-based access control (admin, user) +- Rate limiting on auth endpoints +``` + +**Step 2 — Initialize the scope:** + +``` +> Initialize scope "auth" using the spec I wrote in specs/auth/spec.md +``` + +The agent will read your spec, detect the project type, generate `feature_list.json`, create `init.sh`, and commit. 
+ +**Step 3 — Continue (every subsequent session):** + +``` +> Continue working +``` + +The agent picks up where it left off and implements all remaining features autonomously. + +--- + +### Case 2: Describe what you want, let the agent generate spec.md + +Best for brainstorming or when you want the agent to help shape the spec. Just describe your idea in the prompt. + +``` +> Initialize a new scope called "dashboard". I want a real-time analytics dashboard +> with charts for user signups, revenue, and API usage. It should have date range +> filters, CSV export, and a dark mode toggle. Use React + Recharts. +``` + +The agent will: +1. Create `specs/dashboard/spec.md` from your description +2. Detect project type (web) +3. Generate `feature_list.json` with prioritized features +4. Create `init.sh` with the right dev environment setup +5. Commit everything + +Then continue in subsequent sessions: + +``` +> Continue working +``` + +--- + +### Case 3: Switch between existing scopes + +When you have multiple scopes and want to switch context: + +``` +> Switch to scope "video-editor" +``` + +The agent updates `.active-scope` and symlinks `spec.md` / `feature_list.json` to the selected scope. + +--- + +### Case 4: Compliance / standards alignment scope + +When your scope is about aligning code with a reference document (not building new features): + +``` +> Initialize a new scope called "standards-alignment" to align our codebase +> with the requirements in AGENTS.md +``` + +The agent uses the **Constitution Audit Workflow** — it systematically extracts every requirement from the reference document, verifies each against your code, and generates features only from verified violations. 
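The scope switch in Case 3 is small enough to sketch in shell. The snippet below is an illustrative approximation (the scope name and file set are examples; it mirrors the `.active-scope` plus symlink layout described in this README, not the skill's exact script):

```shell
# Sketch of the "Switch to scope" mechanics (illustrative paths, run in a throwaway dir).
set -e
workdir=$(mktemp -d) && cd "$workdir"           # disposable project root for the demo
mkdir -p specs/video-editor
touch specs/video-editor/spec.md specs/video-editor/feature_list.json

scope="video-editor"
echo "$scope" > .active-scope                    # record which scope is active
ln -sf "specs/$scope/spec.md" spec.md            # root files become symlinks into the scope dir
ln -sf "specs/$scope/feature_list.json" feature_list.json

readlink spec.md                                 # prints: specs/video-editor/spec.md
```

Because the root files are symlinks, "Continue working" in any later session reads whichever scope `.active-scope` points at, with no copying between scopes.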
+ +--- + +### Case 5: Continue a multi-session project + +Every session after the first, just say: + +``` +> Continue working +> Pick up where I left off +> Next feature +``` + +The agent reads `feature_list.json` and `progress.txt`, runs regression tests, then implements all remaining features in a loop — committing after each one. It won't stop until everything passes. + +--- + +### Typical workflow timeline + +``` +Session 1: "Initialize scope 'my-app' — here's what I want to build: ..." + → Agent creates spec.md, feature_list.json, init.sh + +Session 2: "Continue working" + → Agent implements features #1–#5, commits each + +Session 3: "Continue working" + → Agent implements features #6–#12, all pass, scope complete +``` + ## Project Structure ``` From 5317f678d3d49c27037ce935b6bdcb5a0ead4739 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Wed, 18 Mar 2026 05:55:33 +0800 Subject: [PATCH 04/17] refactor: make progress.txt per-scope and symlinked progress.txt is scope-specific, so it belongs inside specs/{scope}/ alongside spec.md and feature_list.json, symlinked to the project root. 
Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 2 +- SKILL.md | 11 +++++++---- references/core/session-handoff-standards.md | 2 +- 3 files changed, 9 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index c8aefd8..edd2286 100644 --- a/README.md +++ b/README.md @@ -52,7 +52,7 @@ This skill provides a complete workflow for AI agents working on long-running de - `spec.md` — Project specification (symlink to active scope) - `feature_list.json` — Feature tracking with pass/fail status and project type -- `progress.txt` — Session progress log +- `progress.txt` — Session progress log (symlink to active scope) - `init.sh` — Development environment setup script ## How It Works (Claude Code) diff --git a/SKILL.md b/SKILL.md index c56d5b6..487ae18 100644 --- a/SKILL.md +++ b/SKILL.md @@ -91,14 +91,16 @@ project-root/ ├── specs/ │ ├── auth/ │ │ ├── spec.md -│ │ └── feature_list.json +│ │ ├── feature_list.json +│ │ └── progress.txt │ └── video-editor/ │ ├── spec.md -│ └── feature_list.json +│ ├── feature_list.json +│ └── progress.txt ├── .active-scope ├── spec.md # Symlink to active scope -├── feature_list.json # Symlink to active scope -├── progress.txt +├── feature_list.json # Symlink to active scope +├── progress.txt # Symlink to active scope └── init.sh ``` @@ -121,6 +123,7 @@ project-root/ echo "auth" > .active-scope ln -sf specs/auth/spec.md spec.md ln -sf specs/auth/feature_list.json feature_list.json + ln -sf specs/auth/progress.txt progress.txt ``` 4. 
**Determine project type** — detect or ask: diff --git a/references/core/session-handoff-standards.md b/references/core/session-handoff-standards.md index cef1cd1..05d3747 100644 --- a/references/core/session-handoff-standards.md +++ b/references/core/session-handoff-standards.md @@ -73,7 +73,7 @@ grep -r "^<<<<<<< \|^=======$\|^>>>>>>> " --include="*.ts" --include="*.tsx" --i # No TODO without feature list item grep -rn "TODO\|FIXME" --exclude-dir=node_modules --exclude-dir=vendor --exclude-dir=target --exclude-dir=__pycache__ -l -# progress.txt exists and was recently updated +# progress.txt exists (symlink to active scope) and was recently updated ls -la progress.txt tail -20 progress.txt ``` From 9ebb230df0a1043d2703998482fe8e289bffc952 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Wed, 18 Mar 2026 07:23:47 +0800 Subject: [PATCH 05/17] docs: enforce absolute reference paths and strengthen screenshot verification Subagents could not find reference docs because paths were relative to the skill install directory, not the project. This resolves all reference paths to absolute paths in subagent prompts. Also inlines screenshot capture instructions into the subagent template and adds a non-negotiable screenshot gate in the parent agent's post-verification flow. Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 125 ++++++++++++++++---- references/core/init-script-template.md | 8 +- references/verification/web-verification.md | 8 +- references/web/e2e-verification.md | 16 +++ 4 files changed, 128 insertions(+), 29 deletions(-) diff --git a/SKILL.md b/SKILL.md index 487ae18..de33c7d 100644 --- a/SKILL.md +++ b/SKILL.md @@ -212,6 +212,8 @@ Run FINAL STANDARDS AUDIT before ending session For each feature, use the **Agent tool** to launch a subagent. This keeps each feature's work isolated and prevents context window overflow. +**IMPORTANT — Reference doc paths:** The `references/` directory lives inside this skill's install directory, NOT in the project. 
When building subagent prompts, you MUST resolve paths to absolute paths. Use: `{skill_base_dir}/references/...` where `{skill_base_dir}` is the "Base directory for this skill" shown at the top of this prompt. For example, if the skill base is `/Users/alice/.claude/skills/iterative-dev`, then the path is `/Users/alice/.claude/skills/iterative-dev/references/core/code-quality.md`. + **Subagent prompt template:** ``` @@ -232,12 +234,12 @@ You are implementing a feature for a {type} project. Work autonomously — do NO ## Standards Documents Read these reference docs and follow them during implementation: -- references/core/code-quality.md — Code organization, testability, unit testing rules -- references/core/gitignore-standards.md — Files that must never be committed -- references/verification/{type}-verification.md — Verification strategy for this project type +- {skill_base_dir}/references/core/code-quality.md — Code organization, testability, unit testing rules +- {skill_base_dir}/references/core/gitignore-standards.md — Files that must never be committed +- {skill_base_dir}/references/verification/{type}-verification.md — Verification strategy for this project type {IF type == "web" or type == "mobile":} -- references/web/ux-standards.md — UX quality requirements (loading/empty/error states, responsive, accessibility) -- references/web/frontend-design.md — Visual design principles (typography, color, composition) +- {skill_base_dir}/references/web/ux-standards.md — UX quality requirements (loading/empty/error states, responsive, accessibility) +- {skill_base_dir}/references/web/frontend-design.md — Visual design principles (typography, color, composition) {END IF} ## Instructions @@ -245,33 +247,70 @@ Read these reference docs and follow them during implementation: ### Phase 1: Implement 1. Read the relevant source files to understand the current codebase 2. Read the spec.md file for full project context -3. Read the standards documents listed above +3. 
Read the standards documents listed above (use the ABSOLUTE paths provided) 4. Implement the feature following existing code patterns and the standards 5. Make sure the implementation is complete and production-quality ### Phase 2: Refactor & Unit Test -Follow references/core/code-quality.md: +Follow {skill_base_dir}/references/core/code-quality.md: 6. Extract pure functions out of components and handlers 7. Move business logic into testable utility/service modules 8. Eliminate duplication — reuse existing helpers or extract new shared ones 9. Write unit tests for all extracted logic. Run them until green. ### Phase 3: Verification -Follow references/verification/{type}-verification.md: +Follow {skill_base_dir}/references/verification/{type}-verification.md: 10. Execute the verification strategy defined for {type} projects 11. Run all relevant tests — fix until green 12. MANDATORY: Perform the verification checks specified in the doc Fix and re-run until all pass. +{IF type == "web" or type == "mobile":} +### Phase 3b: Screenshot Capture (NON-NEGOTIABLE for web/mobile) + +Screenshots are MANDATORY for every UI feature. They are the primary evidence of correct implementation and UI quality. A feature without screenshots is NOT verified. + +**Screenshot directory:** `{screenshots_dir}` (provided by parent agent — this is the absolute path to where screenshots are stored, e.g., `/path/to/project/frontend/e2e/screenshots` for a monorepo or `/path/to/project/e2e/screenshots` for a standalone frontend). + +13. Write or update a Playwright test file that captures screenshots at key states: + - Use `page.screenshot({ path: '{screenshots_dir}/{scope}-feature-{id}-step{N}-{description}.png', fullPage: true })` + - Capture BEFORE action, AFTER action, error states, and empty states + - Every test MUST have at least one `page.screenshot()` call + +14. Run the Playwright tests: + ```bash + npx playwright test + ``` + +15. 
Verify screenshots were generated: + ```bash + ls {screenshots_dir}/{scope}-feature-{id}-*.png + ``` + If no screenshots exist, the verification has FAILED. Fix and re-run. + +16. Use the Read tool to open and visually review EVERY screenshot. Check: + - Layout: content fits, no overflow/clipping, proper alignment + - Spacing: consistent padding/margins (4/8/16/24/32px scale) + - Visual hierarchy: important actions obvious, proper text size hierarchy + - States: loading skeleton/spinner, empty state (icon + message + CTA), error state + - Aesthetics: polished and intentional, cohesive colors, proper shadows/depth + - Data display: real data shown, numbers right-aligned in tables, status badges colored + +17. If screenshots reveal problems, fix the UI and re-capture until quality is acceptable. + +**Screenshot naming convention:** `{scope}-feature-{id}-step{N}-{description}.png` +Examples: `pim-feature-9-step1-product-list.png`, `pim-feature-9-step2-empty-state.png` +{END IF} + ### Phase 4: Gitignore Review -Follow references/core/gitignore-standards.md: -13. Run `git status --short` and check every file against gitignore patterns -14. Add any missing patterns to `.gitignore`, remove from tracking if needed +Follow {skill_base_dir}/references/core/gitignore-standards.md: +18. Run `git status --short` and check every file against gitignore patterns +19. Add any missing patterns to `.gitignore`, remove from tracking if needed ### Phase 5: Commit -15. Update feature_list.json — change "passes": false to "passes": true -16. Update progress.txt with what was done and current feature pass count -17. Commit all changes: +20. Update feature_list.json — change "passes": false to "passes": true +21. Update progress.txt with what was done and current feature pass count +22. 
Commit all changes: git add -A && git commit -m "feat: [description] — Implemented feature #[id]: [description]" ## Key Rules @@ -281,6 +320,10 @@ Follow references/core/gitignore-standards.md: - Make all decisions yourself, never ask for human input - EVERY feature must be verified per the verification strategy — no exceptions - BEFORE committing, review ALL files for .gitignore candidates +{IF type == "web" or type == "mobile":} +- SCREENSHOTS ARE NON-NEGOTIABLE — do not skip or defer them +- If the app/server is not running for screenshots, start it (check init.sh or start manually) +{END IF} ``` **How to launch the subagent:** @@ -299,16 +342,39 @@ The subagent handles implementation, testing, verification, and committing. The 1. **Confirm commit** — `git log --oneline -1` 2. **Confirm feature_list.json** — feature has `"passes": true` -3. **Verify output quality** — type-specific checks: - - **For `web` and `mobile` projects:** - - VERIFY SCREENSHOTS EXIST: - ```bash - ls e2e/screenshots/{scope}-feature-{id}-*.png 2>/dev/null | wc -l - ``` - If count is 0, launch a follow-up subagent to add screenshots and visual review. - - SPOT-CHECK one screenshot — Use the Read tool to open one screenshot. Evaluate against verification criteria. - - If quality is poor, launch a **polish subagent**. +3. **Verify output quality (NON-NEGOTIABLE GATE)** — type-specific checks. You MUST run these checks. Do NOT skip them even if the subagent reported success. + + **For `web` and `mobile` projects — SCREENSHOT GATE (NON-NEGOTIABLE):** + + This gate MUST be executed for EVERY UI feature. It is the primary quality control for visual output. Skipping this gate means the feature is NOT verified. 
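The gate itself reduces to a small shell check. This is a hedged sketch: `screenshots_dir`, `scope`, and `id` are stand-ins for the values the parent agent resolves as described in this section, and the `touch` line only simulates a captured screenshot so the demo has something to count:

```shell
# Sketch of the screenshot gate: block the feature when no screenshots exist.
set -e
screenshots_dir=$(mktemp -d)    # stand-in for the resolved absolute path
scope="pim"; id="9"
touch "$screenshots_dir/$scope-feature-$id-step1-product-list.png"  # simulated capture

count=$(ls "$screenshots_dir/$scope-feature-$id-"*.png 2>/dev/null | wc -l)
if [ "$count" -eq 0 ]; then
  echo "BLOCK: feature #$id has no screenshots; launch follow-up subagent"
else
  echo "OK: $count screenshot(s) found; proceed to spot-check"
fi
```

The `-eq 0` branch is where the parent agent launches the follow-up subagent; the other branch leads into the spot-check with the Read tool.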
+ + **Determine `{screenshots_dir}`:** The screenshot directory depends on project structure: + - **Monorepo** (frontend in a subdirectory like `frontend/`): `{pwd}/frontend/e2e/screenshots` + - **Standalone frontend** (frontend at project root): `{pwd}/e2e/screenshots` + - Auto-detect: look for `playwright.config.ts` — screenshots live in `e2e/screenshots/` relative to that config file's directory. + - You MUST pass this resolved absolute path as `{screenshots_dir}` when building subagent prompts. + + a. **CHECK screenshots exist:** + ```bash + ls {screenshots_dir}/{scope}-feature-{id}-*.png 2>/dev/null | wc -l + ``` + b. **If count is 0: BLOCK.** The feature is NOT complete. Launch a follow-up subagent specifically to capture screenshots: + ``` + Prompt: "You need to add screenshot capture for feature #{id} ({description}). + The feature is already implemented and committed. Your ONLY job is: + 1. Start the dev server if not running (check with lsof, start with init.sh if needed) + 2. Write/update a Playwright test that navigates to the feature and captures screenshots + 3. Screenshots MUST be saved to: {screenshots_dir}/{scope}-feature-{id}-step{N}-{description}.png + 4. Run the test: npx playwright test + 5. Verify screenshots exist: ls {screenshots_dir}/{scope}-feature-{id}-*.png + 6. Use the Read tool to visually review each screenshot + 7. Commit the screenshots and test file" + ``` + c. **If count > 0: SPOT-CHECK.** Use the Read tool to open one screenshot. Evaluate: + - Layout correct? Content fits, no overflow? + - Real data shown, not empty/broken? + - Polished appearance, not prototype-level? + - If quality is poor, launch a **polish subagent** to fix UI issues and recapture. 
**For `api` projects:** - Verify integration tests exist and pass @@ -419,10 +485,19 @@ Before ending: ## Critical Rules ### Standards Enforcement -- All quality standards live in `references/` docs — subagents MUST read them +- All quality standards live in `references/` docs within this skill's base directory — subagents MUST read them using absolute paths +- **CRITICAL**: Reference doc paths are relative to THIS SKILL's install directory (shown as "Base directory for this skill" at the top of this prompt), NOT the project working directory. Always resolve to absolute paths before passing to subagents. - Standards are verified both during implementation (by subagent) AND periodically (by audit) - Audit violations MUST be fixed before session ends +### Screenshot Enforcement (web/mobile projects — NON-NEGOTIABLE) +- Every UI feature MUST have screenshots in `{screenshots_dir}/{scope}-feature-{id}-*.png` +- `{screenshots_dir}` is determined by project structure: `{pwd}/frontend/e2e/screenshots` for monorepos, `{pwd}/e2e/screenshots` for standalone frontends. Auto-detect by finding `playwright.config.ts`. +- The parent agent MUST check for screenshots after EVERY subagent that implements a UI feature +- If screenshots are missing, the parent MUST launch a follow-up subagent — the feature is NOT done +- Screenshots are the primary evidence of UI quality — without them, visual bugs go undetected +- The subagent prompt template includes inlined screenshot instructions so subagents know what to do without needing to find external docs + ### Autonomous Operation (NON-NEGOTIABLE) - NEVER stop to ask the human a question - NEVER wait for human approval diff --git a/references/core/init-script-template.md b/references/core/init-script-template.md index 359d7d9..18ecbb8 100644 --- a/references/core/init-script-template.md +++ b/references/core/init-script-template.md @@ -28,10 +28,14 @@ pkill -f 'node.*dev' 2>/dev/null || true sleep 1 # 2. 
Delete old screenshots for fresh test results +# Note: screenshot dir is e2e/screenshots/ relative to playwright.config.ts +# For monorepo (frontend/ subdir): clean frontend/e2e/screenshots/ +# For standalone frontend (root): clean e2e/screenshots/ echo "Cleaning old test artifacts..." -rm -rf e2e/screenshots/*.png 2>/dev/null || true +SCREENSHOT_DIR="e2e/screenshots" # adjust to "frontend/e2e/screenshots" for monorepos +rm -rf "$SCREENSHOT_DIR"/*.png 2>/dev/null || true rm -rf test-results 2>/dev/null || true -mkdir -p e2e/screenshots +mkdir -p "$SCREENSHOT_DIR" # 3. Install/update dependencies echo "Installing dependencies..." diff --git a/references/verification/web-verification.md b/references/verification/web-verification.md index f35bd3d..0c147ff 100644 --- a/references/verification/web-verification.md +++ b/references/verification/web-verification.md @@ -33,7 +33,9 @@ If not running, start them with `bash init.sh`. ### Step 2: Write E2E Tests with Screenshots -Every test MUST capture screenshots at key user journey points: +Every test MUST capture screenshots at key user journey points. + +**Screenshot directory:** Screenshots are stored in `e2e/screenshots/` relative to the directory containing `playwright.config.ts`. In a monorepo with `frontend/`, this is `frontend/e2e/screenshots/`. In a standalone frontend project, this is `e2e/screenshots/` at the project root. The parent agent resolves this to an absolute path and passes it as `{screenshots_dir}` in the subagent prompt. 
```typescript import { test, expect } from '@playwright/test'; @@ -42,6 +44,7 @@ test('user can login', async ({ page }) => { await page.goto('/login'); // Screenshot: Initial state + // Path is relative to the Playwright project root (where playwright.config.ts lives) await page.screenshot({ path: `e2e/screenshots/${scope}-feature-${id}-step1-login-initial.png`, fullPage: true @@ -143,6 +146,7 @@ export default defineConfig({ ## Parent Agent Post-Verification After subagent completes, parent MUST: -1. Confirm screenshots exist: `ls e2e/screenshots/{scope}-feature-{id}-*.png 2>/dev/null | wc -l` +1. Confirm screenshots exist: `ls {screenshots_dir}/{scope}-feature-{id}-*.png 2>/dev/null | wc -l` + (`{screenshots_dir}` = absolute path to `e2e/screenshots/` relative to `playwright.config.ts`) 2. Spot-check one screenshot with the Read tool 3. If quality is poor, launch a polish subagent diff --git a/references/web/e2e-verification.md b/references/web/e2e-verification.md index 253a53e..b597f82 100644 --- a/references/web/e2e-verification.md +++ b/references/web/e2e-verification.md @@ -4,6 +4,19 @@ Verify features work correctly using Playwright E2E tests with screenshot captur **CRITICAL: Screenshots are MANDATORY for every feature. They are the primary evidence of correct implementation and UI quality. Skipping screenshots means the feature is NOT verified.** +## Screenshot Directory + +Screenshots are stored in `e2e/screenshots/` **relative to the directory containing `playwright.config.ts`**. This varies by project structure: + +| Project Structure | `playwright.config.ts` location | Screenshot directory | +|---|---|---| +| Monorepo (`frontend/` subdir) | `frontend/playwright.config.ts` | `frontend/e2e/screenshots/` | +| Standalone frontend (root) | `playwright.config.ts` | `e2e/screenshots/` | + +In Playwright test code, always use the **relative** path `e2e/screenshots/...` — Playwright resolves it from its config directory. 
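Under the assumptions in the table above, the absolute directory can be derived mechanically from wherever `playwright.config.ts` lives. A shell sketch (the `frontend/` layout is simulated here for the demo; a real project may nest the config differently):

```shell
# Sketch: resolve the absolute screenshot directory from playwright.config.ts location.
set -e
workdir=$(mktemp -d)
mkdir -p "$workdir/frontend"                     # simulate a monorepo layout
touch "$workdir/frontend/playwright.config.ts"

config=$(find "$workdir" -name playwright.config.ts -not -path '*/node_modules/*' | head -n 1)
screenshots_dir="$(cd "$(dirname "$config")" && pwd)/e2e/screenshots"
mkdir -p "$screenshots_dir"

echo "$screenshots_dir"                          # ends with: frontend/e2e/screenshots
```

The resulting path is what the parent agent passes as `{screenshots_dir}` in subagent prompts.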
+ +For parent agent verification, resolve to an **absolute** path: find `playwright.config.ts`, then append `e2e/screenshots/`. + ## Prerequisites Ensure Playwright is set up: @@ -26,7 +39,10 @@ If not running, start them with `bash init.sh`. ### Step 2: Clear Old Screenshots +The screenshot directory is `e2e/screenshots/` relative to the directory containing `playwright.config.ts`. In a monorepo (e.g., `frontend/`), run commands from that subdirectory. In a standalone project, run from the project root. + ```bash +# Run from the directory containing playwright.config.ts rm -rf e2e/screenshots/*.png e2e/screenshots/**/*.png 2>/dev/null || true rm -rf test-results/**/*.png 2>/dev/null || true ``` From 87f1fdffd4cd0d2e003fad0586108fdc5bc8652b Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Wed, 18 Mar 2026 07:39:11 +0800 Subject: [PATCH 06/17] docs: enforce self-contained features with embedded test steps Features must include their own test/verification steps instead of deferring testing to separate features. This prevents false progress where features appear "done" but are unverified. Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 3 + references/core/constitution-audit.md | 3 +- references/core/feature-list-format.md | 84 +++++++++++++++++++------- 3 files changed, 66 insertions(+), 24 deletions(-) diff --git a/SKILL.md b/SKILL.md index de33c7d..7b36503 100644 --- a/SKILL.md +++ b/SKILL.md @@ -150,6 +150,9 @@ project-root/ **Important:** Include the `"type"` field in feature_list.json (see feature-list-format.md). + **CRITICAL — Self-Contained Features (NON-NEGOTIABLE):** + Every feature MUST include its own test and verification steps. NEVER create separate "testing" or "verification" features (e.g., "Write integration tests", "Add E2E tests for all pages"). Each feature's `steps` array must contain both implementation AND verification steps so the feature can be independently verified when completed. 
See `references/core/feature-list-format.md` for the "Self-Contained Features" rule and examples. + 6. **Create/update init.sh** — see `references/core/init-script-template.md` 7. **Commit and update progress log** diff --git a/references/core/constitution-audit.md b/references/core/constitution-audit.md index 7321349..9694645 100644 --- a/references/core/constitution-audit.md +++ b/references/core/constitution-audit.md @@ -88,9 +88,10 @@ Output format: Group related violations into features. Each feature should: - Fix ONE specific pattern or concern (not mix unrelated changes) -- Have concrete, verifiable test steps +- Have concrete, verifiable test steps **included in the feature itself** (NOT as separate testing features) - Include the exact constitution rule being addressed - Be ordered: dependencies first (e.g., fix types before fixing code that uses those types) +- **NEVER create standalone "testing" or "verification" features** — each feature's `steps` must include both the fix AND the tests that verify it. A feature is not done until it is verified within its own steps. ### Key Principles diff --git a/references/core/feature-list-format.md b/references/core/feature-list-format.md index ecfa422..1e242d5 100644 --- a/references/core/feature-list-format.md +++ b/references/core/feature-list-format.md @@ -70,10 +70,38 @@ You may use any category that makes sense for the project. - Remove or edit test steps - Weaken or delete tests - Change a passing feature back to failing (unless genuine regression) +- **Create separate "testing" or "verification" features** — testing and verification MUST be embedded as steps within the feature they verify (see Self-Contained Features rule below) **ONLY:** - Change `"passes": false` to `"passes": true` after thorough verification +## Self-Contained Features (NON-NEGOTIABLE) + +Every feature MUST be independently verifiable. This means: + +1. 
**Each feature includes its own test/verification steps** — the `steps` array MUST contain steps that implement the feature AND steps that verify it (run tests, check types, validate behavior) +2. **NO separate "testing" or "verification" features** — never create features like "Write integration tests for X" or "Add E2E tests for all pages" as standalone features. Tests are part of the feature they test. +3. **NO deferred testing** — do not push testing to the end of the feature list. When a feature is marked `"passes": true`, it means the feature is implemented AND tested AND verified. +4. **A feature is not done until it is verified** — the subagent implementing each feature runs the verification strategy for the project type (see `references/verification/`) as part of that feature's implementation. + +**Why:** When testing is a separate feature at the end, it creates a false sense of progress — features appear "done" but are unverified. It also makes the test-writing disconnected from the implementation context. Each feature must stand on its own: implemented, tested, and verified before moving on. 
+ +**Anti-pattern (WRONG):** +```json +{"id": 5, "description": "Product CRUD backend service", "steps": ["Implement create", "Implement list", "Implement update", "Implement delete"]}, +{"id": 13, "description": "Backend integration tests for all services", "steps": ["Write tests for categories", "Write tests for products", "Run full suite"]} +``` + +**Correct pattern:** +```json +{"id": 5, "description": "Product CRUD backend service", "steps": [ + "Implement ProductService with Create, List, GetByID, Update, Delete", + "Write integration tests: create product, verify response matches fixture", + "Write integration tests: list with pagination, filter by category/status", + "Write integration tests: update product, delete product, duplicate SKU rejection", + "Run go test -v -race ./tests/ and verify all pass" +]} +``` + ## Priority Order Work on features in this order: @@ -120,7 +148,9 @@ Every feature's test steps should be concrete and verifiable. The steps depend o ## Examples -### Web Project +Note: Every example below shows features that are **self-contained** — each feature includes implementation AND test/verification steps. There are no separate "write tests" features. + +### Web Project (Full-Stack) ```json { "type": "web", "features": [ @@ -129,14 +159,18 @@ Every feature's test steps should be concrete and verifiable. 
The steps depend o "id": 1, "category": "functional", "priority": "high", - "description": "User can register with email and password", + "description": "User registration with email and password", "steps": [ - "Step 1: Navigate to /register", - "Step 2: Verify registration form loads with proper layout", - "Step 3: Submit empty form and verify inline validation errors", - "Step 4: Fill in email and password fields", - "Step 5: Click Register and verify loading state on button", - "Step 6: Verify redirect to dashboard" + "Implement registration API endpoint (POST /api/register)", + "Write backend integration test: valid registration returns 201 with user object", + "Write backend integration test: duplicate email returns 409", + "Write backend integration test: missing fields return 400 with validation errors", + "Run go test -v -race ./tests/ and verify backend tests pass", + "Implement registration form UI with React Hook Form + Zod validation", + "Handle loading, error, and success states in the form", + "Write E2E test: navigate to /register, submit empty form, verify inline validation errors", + "Write E2E test: fill valid data, submit, verify redirect to dashboard", + "Run pnpm tsc --noEmit and pnpm test, verify all pass" ], "passes": false } @@ -155,11 +189,12 @@ Every feature's test steps should be concrete and verifiable. 
The steps depend o "priority": "high", "description": "Create product endpoint", "steps": [ - "Step 1: POST /api/products with valid body returns 201", - "Step 2: Response contains id, name, price, created_at", - "Step 3: POST with missing required field returns 400 with field error", - "Step 4: POST with invalid price returns 400 with validation error", - "Step 5: GET /api/products/{id} returns the created product" + "Implement POST /api/products handler with validation", + "Write integration test: POST with valid body returns 201 with id, name, price, created_at", + "Write integration test: POST with missing required field returns 400 with field error", + "Write integration test: POST with invalid price returns 400 with validation error", + "Write integration test: GET /api/products/{id} returns the created product", + "Run go test -v -race ./tests/ and verify all pass" ], "passes": false } @@ -178,11 +213,12 @@ Every feature's test steps should be concrete and verifiable. The steps depend o "priority": "high", "description": "Init command creates project structure", "steps": [ - "Step 1: Run `mytool init myproject` in empty directory", - "Step 2: Verify directory structure created (src/, tests/, config/)", - "Step 3: Verify config file has correct defaults", - "Step 4: Run `mytool init` without name and verify error message", - "Step 5: Run `mytool init myproject` again and verify idempotent behavior" + "Implement init command with directory creation and config generation", + "Write test: `mytool init myproject` in empty directory creates src/, tests/, config/", + "Write test: verify config file has correct defaults", + "Write test: `mytool init` without name shows error message with usage hint", + "Write test: `mytool init myproject` again is idempotent (no error, no overwrite)", + "Run all tests and verify they pass" ], "passes": false } @@ -201,11 +237,13 @@ Every feature's test steps should be concrete and verifiable. 
The steps depend o "priority": "high", "description": "Parse function handles all input formats", "steps": [ - "Step 1: parse('simple string') returns correct AST node", - "Step 2: parse('nested {value}') handles interpolation", - "Step 3: parse('') returns descriptive error", - "Step 4: parse(null) returns descriptive error without panic", - "Step 5: Verify Parse is exported in public API" + "Implement parse() function for string, interpolation, and edge case inputs", + "Write unit test: parse('simple string') returns correct AST node", + "Write unit test: parse('nested {value}') handles interpolation", + "Write unit test: parse('') returns descriptive error", + "Write unit test: parse(null) returns descriptive error without panic", + "Verify Parse is exported in public API", + "Run all tests and verify they pass" ], "passes": false } From f0aeed5ae3adcaff6323dd26eeea27359171bdd2 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Wed, 18 Mar 2026 09:21:26 +0800 Subject: [PATCH 07/17] docs: require screenshot capture and visual review steps for UI features Web/mobile UI features must now include explicit screenshot capture, Playwright verification, and visual review steps. This ensures visual quality is verified within each feature rather than deferred to a separate gate. Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 13 ++++++++ references/core/feature-list-format.md | 44 +++++++++++++++++++++++++- 2 files changed, 56 insertions(+), 1 deletion(-) diff --git a/SKILL.md b/SKILL.md index 7b36503..39440e0 100644 --- a/SKILL.md +++ b/SKILL.md @@ -153,6 +153,19 @@ project-root/ **CRITICAL — Self-Contained Features (NON-NEGOTIABLE):** Every feature MUST include its own test and verification steps. NEVER create separate "testing" or "verification" features (e.g., "Write integration tests", "Add E2E tests for all pages"). 
Each feature's `steps` array must contain both implementation AND verification steps so the feature can be independently verified when completed. See `references/core/feature-list-format.md` for the "Self-Contained Features" rule and examples. + **CRITICAL — Screenshot & Visual Review Steps for UI Features (web/mobile — NON-NEGOTIABLE):** + For `web` and `mobile` project types, every feature that produces or modifies UI MUST include **screenshot capture and visual review** steps in its `steps` array. These are NOT optional and MUST NOT be deferred to a separate feature. A UI feature without screenshot steps will be implemented without visual verification, which defeats the purpose of the screenshot gate. + + Every UI feature's `steps` array MUST end with these steps (adapted to the feature): + ``` + "Capture screenshots: write/update Playwright test that takes fullPage screenshots at key states (list view, empty state, form, after action)", + "Run Playwright tests and verify screenshots are generated in e2e/screenshots/", + "Visually review each screenshot: verify layout, spacing, hierarchy, loading/empty/error states, data display, and overall polish", + "Fix any visual issues found and re-capture until quality is acceptable" + ``` + + **How to determine if a feature is a UI feature:** If the feature creates or modifies files in `src/routes/`, `src/components/`, `src/features/`, or any file that renders user-visible HTML/JSX, it is a UI feature and MUST have screenshot steps. Backend-only features (services, models, API endpoints without frontend) do NOT need screenshot steps. + 6. **Create/update init.sh** — see `references/core/init-script-template.md` 7. 
**Commit and update progress log** diff --git a/references/core/feature-list-format.md b/references/core/feature-list-format.md index 1e242d5..c4365bf 100644 --- a/references/core/feature-list-format.md +++ b/references/core/feature-list-format.md @@ -86,6 +86,44 @@ Every feature MUST be independently verifiable. This means: **Why:** When testing is a separate feature at the end, it creates a false sense of progress — features appear "done" but are unverified. It also makes the test-writing disconnected from the implementation context. Each feature must stand on its own: implemented, tested, and verified before moving on. +## Screenshot & Visual Review Steps (web/mobile — NON-NEGOTIABLE) + +For `web` and `mobile` project types, every feature that produces or modifies UI MUST include **screenshot capture and visual review** as explicit steps in its `steps` array. Without these steps, the subagent will implement the UI but skip visual verification — and the parent agent's screenshot gate becomes the only safety net (which is too late and easy to miss). + +**Rule:** If a feature creates or modifies any file that renders user-visible HTML/JSX (routes, components, pages, layouts), it is a UI feature and its `steps` MUST include: + +1. A step to **capture screenshots** via Playwright at key states (list view, empty state, form, after action, error state) +2. A step to **run Playwright tests** and verify screenshots are generated +3. A step to **visually review** each screenshot for layout, spacing, hierarchy, states, and polish +4. 
A step to **fix visual issues** and re-capture until acceptable + +**Anti-pattern (WRONG) — UI feature without screenshot steps:** +```json +{"id": 9, "description": "Category management pages", "steps": [ + "Create category list page with data table", + "Create category form with validation", + "Write E2E test: create category, verify it appears", + "Run pnpm test — all pass" +]} +``` + +**Correct pattern — UI feature WITH screenshot steps:** +```json +{"id": 9, "description": "Category management pages", "steps": [ + "Create category list page with data table, empty state, loading skeleton", + "Create category form with React Hook Form + Zod validation", + "Write E2E test: seed data via API, verify list displays seeded data", + "Write E2E test: create category via form, verify it appears in list", + "Run pnpm tsc --noEmit and pnpm test — all pass", + "Capture screenshots: list with data, empty state, create form, edit form, delete confirmation", + "Run Playwright screenshot tests and verify PNGs are generated in e2e/screenshots/", + "Visually review each screenshot: layout, spacing, hierarchy, loading/empty/error states, polish", + "Fix any visual issues found in screenshots and re-capture until quality is acceptable" +]} +``` + +**Backend-only features** (services, models, API endpoints, migrations) do NOT need screenshot steps. 
+ **Anti-pattern (WRONG):** ```json {"id": 5, "description": "Product CRUD backend service", "steps": ["Implement create", "Implement list", "Implement update", "Implement delete"]}, @@ -170,7 +208,11 @@ Note: Every example below shows features that are **self-contained** — each fe "Handle loading, error, and success states in the form", "Write E2E test: navigate to /register, submit empty form, verify inline validation errors", "Write E2E test: fill valid data, submit, verify redirect to dashboard", - "Run pnpm tsc --noEmit and pnpm test, verify all pass" + "Run pnpm tsc --noEmit and pnpm test, verify all pass", + "Capture screenshots: registration form empty, form with validation errors, form submitting (loading), successful redirect to dashboard", + "Run Playwright screenshot tests, verify PNGs generated in e2e/screenshots/", + "Visually review each screenshot: layout, spacing, form field alignment, error message styling, loading state, overall polish", + "Fix any visual issues found and re-capture until quality is acceptable" ], "passes": false } From 69a33a6169949dd3f8d7509e0be4410cd82f0458 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Wed, 18 Mar 2026 15:04:51 +0800 Subject: [PATCH 08/17] fix: add full-stack integration smoke test to prevent CORS and route prefix bugs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When a web project connects its frontend to a real backend API, TypeScript compilation alone cannot catch two common infrastructure issues: 1. Route prefix mismatch — code generators (ogen, openapi-generator) register routes without the OpenAPI servers.url prefix, so the backend serves at /products instead of /api/v1/products 2. 
Missing CORS headers — browsers silently block cross-origin requests, causing the frontend to show loading spinners forever Changes: - web-verification.md: Added "Full-Stack Integration Smoke Test" section with step-by-step process, fail-fast criteria, and common root causes table - SKILL.md: Added integration smoke test gate to parent agent post-verification, added conditional instructions to subagent prompt for API-connecting features, added 3 new entries to Decision Making Guidelines table - init-script-template.md: Added CORS and route prefix verification steps to the web project template with warning messages Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 37 ++++++++++ references/core/init-script-template.md | 31 +++++++++ references/verification/web-verification.md | 76 +++++++++++++++++++++ 3 files changed, 144 insertions(+) diff --git a/SKILL.md b/SKILL.md index 39440e0..a27deeb 100644 --- a/SKILL.md +++ b/SKILL.md @@ -340,6 +340,16 @@ Follow {skill_base_dir}/references/core/gitignore-standards.md: - SCREENSHOTS ARE NON-NEGOTIABLE — do not skip or defer them - If the app/server is not running for screenshots, start it (check init.sh or start manually) {END IF} +{IF feature connects frontend to real backend API (replaces mocks, changes fetch config):} +### Full-Stack Integration Verification (NON-NEGOTIABLE) +This feature connects the frontend to a real backend. You MUST verify the connection works end-to-end: +1. **Start both servers** — backend with a real database, frontend with VITE_API_BASE_URL pointing to backend +2. **Verify route prefix** — `curl` the backend API at the URL the frontend will use (e.g., `/api/v1/...`). If 404, the route prefix is wrong. Code generators often omit the OpenAPI `servers.url` prefix — mount the handler under the correct prefix. +3. **Verify CORS** — `curl -I -X OPTIONS` with an `Origin` header matching the frontend port. If no `Access-Control-Allow-Origin` header, add CORS middleware. 
This is the #1 reason frontends silently fail to load data. +4. **Seed data and screenshot** — Seed 2-3 records, take Playwright screenshots of all pages, and verify they show REAL DATA (not loading skeletons or empty states). +5. **Check browser console** — Run Playwright with console error capture. Any CORS or fetch errors mean the integration is broken. +Do NOT mark this feature as passing based only on `tsc --noEmit`. TypeScript cannot catch CORS or route mismatches. +{END IF} ``` **How to launch the subagent:** @@ -392,6 +402,30 @@ The subagent handles implementation, testing, verification, and committing. The - Polished appearance, not prototype-level? - If quality is poor, launch a **polish subagent** to fix UI issues and recapture. + **For `web` full-stack projects — INTEGRATION SMOKE TEST GATE (NON-NEGOTIABLE):** + + This gate MUST be executed for ANY feature that connects the frontend to a real backend API (replacing mocks, changing fetch config, modifying backend routes/middleware). This is the **#1 source of silent failures** — TypeScript compiles clean but the app shows loading spinners forever because of CORS or route prefix issues. + + After the subagent commits, the parent agent MUST: + + a. **Start both servers** (backend with real database, frontend pointing to backend) + b. **Verify backend routes respond** (not 404): + ```bash + curl -s http://localhost:{backend_port}/api/v1/{any_resource} | head -3 + ``` + If 404: route prefix mismatch. Code generators (ogen, openapi-generator) often register routes without the OpenAPI `servers.url` prefix. Fix by mounting the generated handler under `/api/v1` with `http.StripPrefix` or equivalent. + c. **Verify CORS headers**: + ```bash + curl -s -I -X OPTIONS http://localhost:{backend_port}/api/v1/{any_resource} \ + -H 'Origin: http://localhost:{frontend_port}' | grep -i 'access-control' + ``` + If missing: add CORS middleware to the backend. Without it, browsers silently block all frontend API requests. + d. 
**Seed test data** via API (at least 2-3 records) + e. **Run Playwright screenshots** against all major pages + f. **Verify screenshots show REAL DATA** — not loading skeletons, not empty states. If data is missing, diagnose using the common root causes table in `references/verification/web-verification.md`. + + If any check fails, launch a fix subagent before moving to the next feature. + **For `api` projects:** - Verify integration tests exist and pass - Check that error cases are tested (not just happy paths) @@ -479,6 +513,9 @@ Since the human may be asleep, follow these rules for autonomous decisions: | **Web/mobile:** Unclear UI design | Follow references/web/frontend-design.md | | **Web/mobile:** UI looks generic/plain | Add visual polish per references/web/ux-standards.md | | **Web/mobile:** Subagent skipped screenshots | Launch follow-up subagent to add them | +| **Web full-stack:** Frontend shows loading forever | Check CORS headers and route prefix — see `references/verification/web-verification.md` Integration Smoke Test | +| **Web full-stack:** curl works but browser doesn't | CORS issue — add `Access-Control-Allow-Origin` middleware to backend | +| **Web full-stack:** Backend returns 404 for /api/v1/... | Code generator omitted server URL prefix — mount handler under `/api/v1` | | **API:** Unclear response format | Follow existing endpoint patterns, use consistent error format | | **CLI:** Unclear output format | Match existing command output style | | **Library:** Unclear public API | Keep it minimal, expose only what's needed | diff --git a/references/core/init-script-template.md b/references/core/init-script-template.md index 18ecbb8..fcb22a1 100644 --- a/references/core/init-script-template.md +++ b/references/core/init-script-template.md @@ -62,6 +62,18 @@ cd .. # 8. Wait and verify sleep 3 + +# 9. Verify backend API and CORS (for full-stack projects) +echo "Verifying backend API..." 
+API_RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8082/api/v1/ 2>/dev/null || echo "000") +if [ "$API_RESPONSE" = "404" ]; then + echo "⚠️ WARNING: Backend returns 404 for /api/v1/ — route prefix may be misconfigured" +fi +CORS_HEADER=$(curl -s -I -X OPTIONS http://localhost:8082/api/v1/ -H 'Origin: http://localhost:3000' 2>/dev/null | grep -i 'access-control-allow-origin' || echo "") +if [ -z "$CORS_HEADER" ]; then + echo "⚠️ WARNING: No CORS headers detected — frontend requests will be blocked by browser" +fi + echo "" echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')" ``` @@ -249,3 +261,22 @@ curl -s http://localhost:8080/health || echo "Server not responding" # Verify tool builds ./bin/mytool --version 2>/dev/null || echo "CLI not built" ``` + +## Web Project: CORS and Route Prefix Verification (IMPORTANT) + +For full-stack web projects where the frontend and backend run on different ports, **always verify CORS and route prefixes** after starting services. These are the #1 and #2 most common causes of "frontend can't load data" bugs. + +```bash +# 1. Verify backend API responds (not 404) +# If using an API prefix like /api/v1, test the full path: +curl -s http://localhost:8080/api/v1/health || curl -s http://localhost:8080/api/v1/ | head -3 +# If 404: the backend route registration doesn't include the prefix. +# Common with code generators (ogen, openapi-generator) that register routes +# without the OpenAPI servers.url prefix. Fix by mounting under the prefix. + +# 2. Verify CORS headers are set +curl -s -I -X OPTIONS http://localhost:8080/api/v1/ \ + -H 'Origin: http://localhost:3000' | grep -i 'access-control' +# If no Access-Control-Allow-Origin header: add CORS middleware to the backend. +# Without CORS headers, browsers block all requests from the frontend. 
+``` diff --git a/references/verification/web-verification.md b/references/verification/web-verification.md index 0c147ff..605047b 100644 --- a/references/verification/web-verification.md +++ b/references/verification/web-verification.md @@ -143,6 +143,81 @@ export default defineConfig({ }); ``` +## Full-Stack Integration Smoke Test (NON-NEGOTIABLE for web projects with backend) + +When a feature connects the frontend to a real backend API (e.g., replacing mock data with real API calls), a **live integration smoke test** MUST be performed. This catches issues that TypeScript compilation alone cannot detect — CORS, route prefix mismatches, response envelope mismatches, and authentication failures. + +### When to Run + +Run this smoke test for ANY feature that: +- Replaces mock/stub data with real API calls +- Changes the API base URL, fetch wrapper, or custom client +- Modifies backend route registration or middleware +- Is the first feature to connect a previously-mocked frontend to the real backend + +### Process + +**Step 1: Start both servers** + +```bash +# Start backend (with real database) +cd backend && DATABASE_URL="..." go run ./cmd/api/ & +sleep 3 + +# Start frontend (pointing to backend) +cd frontend && VITE_API_BASE_URL=http://localhost:8080 pnpm dev & +sleep 3 +``` + +**Step 2: Verify backend responds to API calls** + +```bash +# Test a list endpoint directly (bypasses CORS — this tests the backend alone) +curl -s http://localhost:8080/api/v1/ | head -5 +``` + +If this returns 404, the route prefix is wrong (common with code generators like ogen that don't include the OpenAPI `servers.url` prefix in generated routes). Fix by mounting the generated server under the correct prefix (e.g., `http.StripPrefix("/api/v1", server)`). 
+ +**Step 3: Verify CORS headers** + +```bash +curl -s -I -X OPTIONS http://localhost:8080/api/v1/ \ + -H 'Origin: http://localhost:5173' | grep -i 'access-control' +``` + +If no `Access-Control-Allow-Origin` header is present, the browser will block all frontend requests. Add CORS middleware to the backend. This is the **#1 most common cause** of "frontend shows loading forever" bugs in full-stack web projects. + +**Step 4: Seed test data and take screenshots** + +```bash +# Seed at least 2-3 records via API +curl -X POST http://localhost:8080/api/v1/ -H 'Content-Type: application/json' -d '...' +``` + +Then run Playwright screenshot tests against all major pages and **visually verify** that: +- Pages show **real data** (not loading skeletons or empty states) +- Data matches what was seeded (correct names, counts, values) +- No console errors in the browser (especially CORS or fetch failures) + +**Step 5: Fail-fast criteria** + +The integration smoke test FAILS if any of these are true: +- Backend returns 404 for known API endpoints → route prefix mismatch +- CORS headers are missing → add CORS middleware +- Screenshots show loading skeletons that never resolve → API calls failing silently +- Screenshots show empty states despite seeded data → response envelope mismatch +- Browser console shows fetch/network errors → connectivity or CORS issue + +### Common Root Causes + +| Symptom | Root Cause | Fix | +|---------|-----------|-----| +| Backend returns 404 for /api/v1/... | Code generator (ogen, openapi-generator) registers routes without server URL prefix | Mount generated handler under `/api/v1` with `http.StripPrefix` or equivalent | +| Frontend shows loading forever | CORS: browser blocks cross-origin requests | Add CORS middleware (`Access-Control-Allow-Origin: *` for dev) | +| Frontend shows empty despite seeded data | Response envelope mismatch: frontend expects `{ data: ... 
}` but backend returns flat response, or vice versa | Align envelope handling in fetch wrapper or backend | +| API works via curl but not from browser | CORS (curl bypasses CORS, browsers enforce it) | Add CORS middleware | +| OPTIONS requests return 404 | Backend doesn't handle preflight requests | CORS middleware must handle OPTIONS with 204 No Content | + ## Parent Agent Post-Verification After subagent completes, parent MUST: @@ -150,3 +225,4 @@ After subagent completes, parent MUST: (`{screenshots_dir}` = absolute path to `e2e/screenshots/` relative to `playwright.config.ts`) 2. Spot-check one screenshot with the Read tool 3. If quality is poor, launch a polish subagent +4. **For full-stack features**: verify screenshots show **real data**, not loading skeletons or empty states. If data is missing, run the integration smoke test above to diagnose. From 5317f71742f6380c7fece19e043485b3bed9b9f2 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Wed, 18 Mar 2026 15:48:46 +0800 Subject: [PATCH 09/17] fix: resolve universality issues — make skill technology-agnostic and outcome-oriented MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - e2e-verification.md: reframe screenshots as secondary to interaction tests, add subagent reference note distinguishing it from web-verification.md - init-script-template.md: generalize CORS/route-prefix section to "cross-component connectivity" with example framing instead of hardcoded paths - ux-standards.md: replace Tailwind-specific classes (text-2xl, shadow-sm, p-1, etc.)
with technology-agnostic descriptions - code-quality.md: replace data-testid assumption with project-type-appropriate stable test selectors - SKILL.md: project type detection now asks about user interaction (browser, terminal, import/call) instead of specific frameworks; init-scope screenshot emphasis reframed with outcome-proving tests as primary verification - feature-list-format.md: add outcome-oriented features rule, rewrite verification steps to prove user outcomes not component existence - web-verification.md: add interaction test emphasis as primary verification, add correct/wrong test examples, reorder parent post-verification checks Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 38 +-- references/core/code-quality.md | 2 +- references/core/feature-list-format.md | 262 +++++++++++++------- references/core/init-script-template.md | 31 ++- references/verification/web-verification.md | 69 +++++- references/web/e2e-verification.md | 6 +- references/web/ux-standards.md | 38 +-- 7 files changed, 290 insertions(+), 156 deletions(-) diff --git a/SKILL.md b/SKILL.md index a27deeb..dd8c91f 100644 --- a/SKILL.md +++ b/SKILL.md @@ -126,13 +126,9 @@ project-root/ ln -sf specs/auth/progress.txt progress.txt ``` -4. **Determine project type** — detect or ask: - - Look at the codebase: Does it have a `src/` with React/Vue/Svelte? → `web` - - Is it a REST/GraphQL API with no frontend? → `api` - - Does it have a `main` with CLI arg parsing (cobra, clap, argparse, commander)? → `cli` - - Is it a package/module with no main entry point? → `library` - - Does it have ETL/pipeline code (pandas, spark, dbt, airflow)? → `data` - - Does it have React Native, Flutter, SwiftUI, or Kotlin/Android? → `mobile` +4. **Determine project type** — based on how users interact with the deliverable: + - What does the user interact with? **Browser** → `web`. **Terminal** → `cli`. **Import/call** → `library`. **HTTP requests** → `api`. **Phone/tablet** → `mobile`. 
**Data outputs** → `data`. + - Confirm by examining the codebase structure (e.g., frontend frameworks suggest `web`, CLI entry points suggest `cli`, no main entry point suggests `library`) - If unclear, default to the most fitting type based on spec.md 5. **Create feature list** — choose the right method: @@ -153,15 +149,18 @@ project-root/ **CRITICAL — Self-Contained Features (NON-NEGOTIABLE):** Every feature MUST include its own test and verification steps. NEVER create separate "testing" or "verification" features (e.g., "Write integration tests", "Add E2E tests for all pages"). Each feature's `steps` array must contain both implementation AND verification steps so the feature can be independently verified when completed. See `references/core/feature-list-format.md` for the "Self-Contained Features" rule and examples. - **CRITICAL — Screenshot & Visual Review Steps for UI Features (web/mobile — NON-NEGOTIABLE):** - For `web` and `mobile` project types, every feature that produces or modifies UI MUST include **screenshot capture and visual review** steps in its `steps` array. These are NOT optional and MUST NOT be deferred to a separate feature. A UI feature without screenshot steps will be implemented without visual verification, which defeats the purpose of the screenshot gate. + **CRITICAL — Verification Steps for UI Features (web/mobile — NON-NEGOTIABLE):** + For `web` and `mobile` project types, every feature that produces or modifies UI MUST include **interaction test and screenshot** steps in its `steps` array. These are NOT optional and MUST NOT be deferred to a separate feature. + + **Outcome-proving tests (interaction, integration, unit) are the PRIMARY verification.** Tests must perform real user actions and verify observable outcomes — they prove the feature actually works. **Screenshots are SECONDARY** — they verify visual quality and catch layout/styling issues that interaction tests cannot detect. Both are required. 
Every UI feature's `steps` array MUST end with these steps (adapted to the feature): ``` - "Capture screenshots: write/update Playwright test that takes fullPage screenshots at key states (list view, empty state, form, after action)", - "Run Playwright tests and verify screenshots are generated in e2e/screenshots/", + "Write interaction tests: Playwright tests that perform user actions (click, fill, submit, navigate) and verify outcomes (data appears, page navigates, state changes)", + "Capture screenshots: add fullPage screenshots at key states (list view, empty state, form, after action)", + "Run Playwright tests and verify all pass and screenshots are generated in e2e/screenshots/", "Visually review each screenshot: verify layout, spacing, hierarchy, loading/empty/error states, data display, and overall polish", - "Fix any visual issues found and re-capture until quality is acceptable" + "Fix any issues found (broken behavior or visual problems) and re-run until quality is acceptable" ``` **How to determine if a feature is a UI feature:** If the feature creates or modifies files in `src/routes/`, `src/components/`, `src/features/`, or any file that renders user-visible HTML/JSX, it is a UI feature and MUST have screenshot steps. Backend-only features (services, models, API endpoints without frontend) do NOT need screenshot steps. @@ -284,7 +283,7 @@ Follow {skill_base_dir}/references/verification/{type}-verification.md: {IF type == "web" or type == "mobile":} ### Phase 3b: Screenshot Capture (NON-NEGOTIABLE for web/mobile) -Screenshots are MANDATORY for every UI feature. They are the primary evidence of correct implementation and UI quality. A feature without screenshots is NOT verified. +Interaction tests (Phase 3) are the PRIMARY verification that features work. Screenshots are SECONDARY but MANDATORY — they verify visual quality and catch layout/styling issues that interaction tests cannot detect. 
A feature without both interaction tests and screenshots is NOT fully verified. **Screenshot directory:** `{screenshots_dir}` (provided by parent agent — this is the absolute path to where screenshots are stored, e.g., `/path/to/project/frontend/e2e/screenshots` for a monorepo or `/path/to/project/e2e/screenshots` for a standalone frontend). @@ -401,6 +400,7 @@ The subagent handles implementation, testing, verification, and committing. The - Real data shown, not empty/broken? - Polished appearance, not prototype-level? - If quality is poor, launch a **polish subagent** to fix UI issues and recapture. + d. **OUTCOME VERIFICATION CHECK:** Verify the subagent wrote tests that **prove the feature works from the user's perspective** — not just screenshot-only tests. Tests must perform user actions (click, fill, submit, navigate) and verify outcomes (data appears, page navigates, state changes). If the feature has interactive elements and the only tests are screenshots, the feature is NOT verified. Launch a follow-up subagent to add interaction tests. See `references/verification/web-verification.md` Step 2. 
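The outcome-verification check above can be partially automated. A lightweight sketch of how a parent agent might do it — `isScreenshotOnly` and its regexes are illustrative assumptions invented here, not part of the skill or the Playwright API:

```typescript
// Hypothetical heuristic for the parent agent's outcome-verification check:
// a spec that captures screenshots but never performs a user action is
// screenshot-only, so the feature it covers is NOT behaviorally verified.
function isScreenshotOnly(specSource: string): boolean {
  const interacts = /\.(click|fill|press|check|selectOption|dragTo)\s*\(/.test(specSource);
  const screenshots = /\.screenshot\s*\(/.test(specSource);
  return screenshots && !interacts;
}

// Illustrative spec fragments the heuristic would classify:
const screenshotOnlySpec =
  "await page.goto('/products'); await page.screenshot({ path: 'p.png' });";
const interactionSpec =
  "await page.getByRole('button', { name: 'Edit' }).click(); await page.screenshot({ path: 'p.png' });";
```

A parent agent could run such a heuristic over `e2e/*.spec.ts` before marking a UI feature done, launching the follow-up subagent whenever a spec file is flagged.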
**For `web` full-stack projects — INTEGRATION SMOKE TEST GATE (NON-NEGOTIABLE):** @@ -543,13 +543,13 @@ Before ending: - Standards are verified both during implementation (by subagent) AND periodically (by audit) - Audit violations MUST be fixed before session ends -### Screenshot Enforcement (web/mobile projects — NON-NEGOTIABLE) -- Every UI feature MUST have screenshots in `{screenshots_dir}/{scope}-feature-{id}-*.png` +### Verification Enforcement (web/mobile projects — NON-NEGOTIABLE) +- **Interaction tests proving user outcomes are PRIMARY verification** — every UI feature MUST have tests that perform user actions and verify results +- Every UI feature MUST also have screenshots in `{screenshots_dir}/{scope}-feature-{id}-*.png` — screenshots are SECONDARY for visual quality - `{screenshots_dir}` is determined by project structure: `{pwd}/frontend/e2e/screenshots` for monorepos, `{pwd}/e2e/screenshots` for standalone frontends. Auto-detect by finding `playwright.config.ts`. -- The parent agent MUST check for screenshots after EVERY subagent that implements a UI feature -- If screenshots are missing, the parent MUST launch a follow-up subagent — the feature is NOT done -- Screenshots are the primary evidence of UI quality — without them, visual bugs go undetected -- The subagent prompt template includes inlined screenshot instructions so subagents know what to do without needing to find external docs +- The parent agent MUST check for both interaction tests AND screenshots after EVERY subagent that implements a UI feature +- If either is missing, the parent MUST launch a follow-up subagent — the feature is NOT done +- The subagent prompt template includes inlined verification instructions so subagents know what to do without needing to find external docs ### Autonomous Operation (NON-NEGOTIABLE) - NEVER stop to ask the human a question diff --git a/references/core/code-quality.md b/references/core/code-quality.md index 1c46649..0e7986c 100644 --- 
a/references/core/code-quality.md +++ b/references/core/code-quality.md @@ -37,7 +37,7 @@ Every feature implementation must meet these standards. Code that works but is m - Keep functions small and single-purpose - Name things clearly — intent over implementation - Prefer composition over deep nesting -- Use `data-testid` attributes for E2E test selectors +- Use stable test selectors appropriate to your project type (e.g., `data-testid` for web, accessibility identifiers for mobile, named exports for libraries) ## What NOT to Do diff --git a/references/core/feature-list-format.md b/references/core/feature-list-format.md index c4365bf..ae62c32 100644 --- a/references/core/feature-list-format.md +++ b/references/core/feature-list-format.md @@ -75,6 +75,41 @@ You may use any category that makes sense for the project. **ONLY:** - Change `"passes": false` to `"passes": true` after thorough verification +## Outcome-Oriented Features (NON-NEGOTIABLE) + +### The Problem This Solves + +The #1 cause of features that "pass" but don't work is **component-level feature definition**. When features are defined as UI components ("category list page", "category form", "delete dialog"), each component gets verified in isolation — but nobody verifies the user can actually complete the journey across components. The Edit button may exist on the list page, but if it navigates to a broken route, or the form doesn't submit, or the submission doesn't update the list, the feature is marked "passes: true" anyway because each component *looks* correct in its screenshot. + +### The Rule + +**Features MUST be defined as user outcomes, not implementation components.** + +Ask: "What can the user (or caller) DO when this feature is done?" — not "What UI component (or module) exists?" 
+ +This applies universally to all project types: +- **Web/Mobile:** "User can manage categories" — not "Category list page" + "Category form" + "Delete dialog" +- **API:** "Client can manage products via REST" — not "POST endpoint" + "GET endpoint" + "PUT endpoint" +- **CLI:** "User can initialize and configure a project" — not "Init command" + "Config file generation" +- **Library:** "Caller can parse, transform, and serialize data" — not "Parse function" + "Transform function" + "Serialize function" +- **Data:** "Pipeline ingests, transforms, and outputs daily reports" — not "Ingestion step" + "Transform step" + "Output step" + +### Why This Works + +When a feature is an outcome ("user can manage categories"), the verification naturally covers the full journey: +- Can the user see the list? (list renders with data) +- Can the user create one? (form works, submission saves, new item appears in list) +- Can the user edit one? (edit loads existing data, changes persist) +- Can the user delete one? (confirmation works, item removed) + +When a feature is a component ("category list page"), the verification only covers that component: +- Does the page render? ✓ (but Edit button may be broken) +- Does it look nice? ✓ (but clicking anything may fail) + +### Infrastructure / Scaffolding Exception + +Some features are genuinely infrastructure with no user-facing outcome: project setup, database migration, code generation, CI/CD configuration. These are fine as component-level features. The rule applies to features that deliver **user-facing or caller-facing functionality**. + ## Self-Contained Features (NON-NEGOTIABLE) Every feature MUST be independently verifiable. This means: @@ -86,59 +121,57 @@ Every feature MUST be independently verifiable. This means: **Why:** When testing is a separate feature at the end, it creates a false sense of progress — features appear "done" but are unverified. It also makes the test-writing disconnected from the implementation context. 
Each feature must stand on its own: implemented, tested, and verified before moving on. -## Screenshot & Visual Review Steps (web/mobile — NON-NEGOTIABLE) +## Verification Must Prove the Outcome (NON-NEGOTIABLE) -For `web` and `mobile` project types, every feature that produces or modifies UI MUST include **screenshot capture and visual review** as explicit steps in its `steps` array. Without these steps, the subagent will implement the UI but skip visual verification — and the parent agent's screenshot gate becomes the only safety net (which is too late and easy to miss). +This is the universal verification principle that applies to ALL project types: -**Rule:** If a feature creates or modifies any file that renders user-visible HTML/JSX (routes, components, pages, layouts), it is a UI feature and its `steps` MUST include: +**Verification must prove the user/caller can achieve the outcome described in the feature, not just that the code exists or compiles.** -1. A step to **capture screenshots** via Playwright at key states (list view, empty state, form, after action, error state) -2. A step to **run Playwright tests** and verify screenshots are generated -3. A step to **visually review** each screenshot for layout, spacing, hierarchy, states, and polish -4. 
A step to **fix visual issues** and re-capture until acceptable +| Project Type | WRONG verification | RIGHT verification | +|-------------|-------------------|-------------------| +| **Web** | Screenshot of a page that renders | Playwright test: user clicks, fills, submits, and sees result | +| **API** | Code compiles, handler function exists | Integration test: HTTP request returns correct response | +| **CLI** | Binary builds successfully | Run the command, verify output matches expected | +| **Library** | Types compile, function exists | Unit test: call function with input, verify output | +| **Data** | Pipeline script has no syntax errors | Run pipeline on sample data, verify output schema and values | +| **Mobile** | Screenshot of initial screen render | Interaction test: tap, swipe, verify navigation and state changes | -**Anti-pattern (WRONG) — UI feature without screenshot steps:** -```json -{"id": 9, "description": "Category management pages", "steps": [ - "Create category list page with data table", - "Create category form with validation", - "Write E2E test: create category, verify it appears", - "Run pnpm test — all pass" -]} +### How to Write Verification Steps + +For each feature, ask: **"If I were a user/caller, how would I prove this works?"** Then write steps that do exactly that. 
+ +**Bad steps** (prove code exists): +``` +"Create the product list component" +"Add the edit form route" +"Run tsc --noEmit" +"Take a screenshot" ``` -**Correct pattern — UI feature WITH screenshot steps:** -```json -{"id": 9, "description": "Category management pages", "steps": [ - "Create category list page with data table, empty state, loading skeleton", - "Create category form with React Hook Form + Zod validation", - "Write E2E test: seed data via API, verify list displays seeded data", - "Write E2E test: create category via form, verify it appears in list", - "Run pnpm tsc --noEmit and pnpm test — all pass", - "Capture screenshots: list with data, empty state, create form, edit form, delete confirmation", - "Run Playwright screenshot tests and verify PNGs are generated in e2e/screenshots/", - "Visually review each screenshot: layout, spacing, hierarchy, loading/empty/error states, polish", - "Fix any visual issues found in screenshots and re-capture until quality is acceptable" -]} +**Good steps** (prove outcome works): +``` +"Seed 3 products via API, navigate to /products, verify all 3 are visible with correct names and prices" +"Click Edit on a product, verify form loads with existing data, change the name, submit, verify the updated name appears in the list" +"Click Delete, confirm in dialog, verify the product is removed from the list" +"Run all tests and verify they pass" ``` -**Backend-only features** (services, models, API endpoints, migrations) do NOT need screenshot steps. +The difference: bad steps verify the code was written. Good steps verify the feature works from the user's perspective. 
-**Anti-pattern (WRONG):** -```json -{"id": 5, "description": "Product CRUD backend service", "steps": ["Implement create", "Implement list", "Implement update", "Implement delete"]}, -{"id": 13, "description": "Backend integration tests for all services", "steps": ["Write tests for categories", "Write tests for products", "Run full suite"]} -``` +## Screenshot & Visual Review Steps (web/mobile — NON-NEGOTIABLE) -**Correct pattern:** -```json -{"id": 5, "description": "Product CRUD backend service", "steps": [ - "Implement ProductService with Create, List, GetByID, Update, Delete", - "Write integration tests: create product, verify response matches fixture", - "Write integration tests: list with pagination, filter by category/status", - "Write integration tests: update product, delete product, duplicate SKU rejection", - "Run go test -v -race ./tests/ and verify all pass" -]} +For `web` and `mobile` project types, every feature that produces or modifies UI MUST include **screenshot capture and visual review** as explicit steps in its `steps` array. Screenshots are the secondary verification layer — they catch visual/design issues that interaction tests don't (spacing, alignment, colors, polish). + +**Rule:** If a feature creates or modifies any file that renders user-visible HTML/JSX, its `steps` MUST include: + +1. A step to **capture screenshots** via Playwright at key states (after completing user flows, at empty/loading/error states) +2. A step to **run Playwright tests** and verify screenshots are generated +3. A step to **visually review** each screenshot for layout, spacing, hierarchy, states, and polish +4. A step to **fix visual issues** and re-capture until acceptable + +**IMPORTANT:** Screenshots supplement interaction tests — they do NOT replace them. A feature that has screenshots but no interaction tests is NOT verified. A feature that has interaction tests but no screenshots is functionally verified but not visually verified. Both are required. 
+ +**Backend-only features** (services, models, API endpoints, migrations) do NOT need screenshot steps. ## Priority Order @@ -152,41 +185,42 @@ Work on features in this order: ### Write Verifiable Steps -Every feature's test steps should be concrete and verifiable. The steps depend on project type: +Every feature's test steps should be concrete and verifiable — they should describe **what the user/caller does and what they see/get back**, not what the developer builds. **Web projects:** -- "Step N: Verify loading skeleton appears while data loads" -- "Step N: Verify empty state shows icon, message, and CTA when no items exist" -- "Step N: Verify the page renders correctly at mobile width (375px)" +- "Seed 2 items via API, navigate to /items, verify both items visible with correct data" +- "Click 'New Item', fill the form, submit, verify new item appears in the list" +- "Click Edit on an item, verify form has existing data, change a field, submit, verify change persists" +- "Delete an item, verify it's removed from the list" **API projects:** -- "Step N: POST /api/products with valid body returns 201 and product object" -- "Step N: POST /api/products with missing name returns 400 with field error" -- "Step N: GET /api/products without auth returns 401" +- "POST /api/products with valid body returns 201 and product object" +- "POST /api/products with missing name returns 400 with field error" +- "GET /api/products returns list including the created product" **CLI projects:** -- "Step N: Run `mytool list --format json` and verify JSON output" -- "Step N: Run `mytool` with no args and verify help text is shown" -- "Step N: Run `mytool process --input missing.txt` and verify error message" +- "Run `mytool init myproject`, verify directory structure created" +- "Run `mytool init` without name, verify helpful error message shown" +- "Run `mytool init myproject` twice, verify idempotent (no error)" **Library projects:** -- "Step N: Call parse('valid input') and verify 
correct result" -- "Step N: Call parse('') and verify it returns descriptive error" -- "Step N: Verify Parse is exported from the public API" +- "Call parse('valid input') and verify correct result" +- "Call parse('') and verify it returns descriptive error" +- "Verify Parse is exported in public API" **Data projects:** -- "Step N: Run pipeline with sample input and verify output schema" -- "Step N: Run pipeline with empty input and verify empty output (not error)" -- "Step N: Verify aggregation totals match expected values" +- "Run pipeline with sample input and verify output schema" +- "Run pipeline with empty input and verify empty output (not error)" +- "Verify aggregation totals match expected values" **Mobile projects:** -- "Step N: Tap login button and verify navigation to dashboard" -- "Step N: Verify loading indicator during API call" -- "Step N: Verify layout on small screen (iPhone SE)" +- "Tap login button, verify navigation to dashboard" +- "Fill search field, verify results filter in real-time" +- "Pull to refresh, verify data updates" ## Examples -Note: Every example below shows features that are **self-contained** — each feature includes implementation AND test/verification steps. There are no separate "write tests" features. +Note: Every example below defines features as **user outcomes** with verification steps that **prove the outcome works**. Features are NOT split into component-level pieces. 
### Web Project (Full-Stack) ```json @@ -197,22 +231,57 @@ Note: Every example below shows features that are **self-contained** — each fe "id": 1, "category": "functional", "priority": "high", - "description": "User registration with email and password", + "description": "Project scaffolding and shared infrastructure", + "steps": [ + "Initialize frontend (React, Vite, Router, UI library) and backend (Go, framework) projects", + "Create shared OpenAPI spec, generate types for both sides", + "Create root layout with navigation", + "Verify frontend compiles and dev server starts", + "Verify backend compiles" + ], + "passes": false + }, + { + "id": 2, + "category": "functional", + "priority": "high", + "description": "User can manage categories (create, view list, edit, delete)", + "steps": [ + "Implement backend: category CRUD endpoints with validation and error handling", + "Write backend integration tests: create, list, get, update, delete, duplicate slug rejection", + "Run backend tests and verify all pass", + "Implement frontend: category list page, create form, edit form, delete confirmation", + "Write E2E test: seed category via API, navigate to list, verify it's visible", + "Write E2E test: click New, fill form, submit, verify new category in list", + "Write E2E test: click Edit on a category, verify form has existing data, change name, submit, verify updated name in list", + "Write E2E test: click Delete, confirm, verify category removed from list", + "Run all tests, verify all pass", + "Capture screenshots of list, create form, edit form, delete dialog, empty state", + "Visually review screenshots for layout and polish", + "Fix any issues and re-run until all tests pass and screenshots look good" + ], + "passes": false + }, + { + "id": 3, + "category": "functional", + "priority": "high", + "description": "User can manage products (create, view list with filters, edit, delete, bulk status change)", "steps": [ - "Implement registration API endpoint (POST 
/api/register)", - "Write backend integration test: valid registration returns 201 with user object", - "Write backend integration test: duplicate email returns 409", - "Write backend integration test: missing fields return 400 with validation errors", - "Run go test -v -race ./tests/ and verify backend tests pass", - "Implement registration form UI with React Hook Form + Zod validation", - "Handle loading, error, and success states in the form", - "Write E2E test: navigate to /register, submit empty form, verify inline validation errors", - "Write E2E test: fill valid data, submit, verify redirect to dashboard", - "Run pnpm tsc --noEmit and pnpm test, verify all pass", - "Capture screenshots: registration form empty, form with validation errors, form submitting (loading), successful redirect to dashboard", - "Run Playwright screenshot tests, verify PNGs generated in e2e/screenshots/", - "Visually review each screenshot: layout, spacing, form field alignment, error message styling, loading state, overall polish", - "Fix any visual issues found and re-capture until quality is acceptable" + "Implement backend: product CRUD + bulk status + filtering endpoints", + "Write backend integration tests: all CRUD ops, filters, bulk update, edge cases", + "Run backend tests and verify all pass", + "Implement frontend: product list with filters/search/pagination, create form, edit form, delete dialog, bulk actions", + "Write E2E test: seed products, navigate to list, verify data visible with correct prices and statuses", + "Write E2E test: create product with category selection, verify in list", + "Write E2E test: edit a product, verify changes persist", + "Write E2E test: delete a product, verify removed", + "Write E2E test: select multiple products, bulk change status, verify statuses updated", + "Write E2E test: filter by category, verify only matching products shown", + "Run all tests, verify all pass", + "Capture screenshots of list, filters active, bulk selection, forms, 
dialogs", + "Visually review screenshots", + "Fix any issues" ], "passes": false } @@ -229,14 +298,16 @@ Note: Every example below shows features that are **self-contained** — each fe "id": 1, "category": "functional", "priority": "high", - "description": "Create product endpoint", + "description": "Client can manage products via REST API (CRUD + validation)", "steps": [ - "Implement POST /api/products handler with validation", - "Write integration test: POST with valid body returns 201 with id, name, price, created_at", - "Write integration test: POST with missing required field returns 400 with field error", - "Write integration test: POST with invalid price returns 400 with validation error", - "Write integration test: GET /api/products/{id} returns the created product", - "Run go test -v -race ./tests/ and verify all pass" + "Implement all product endpoints: POST, GET list, GET by ID, PUT, DELETE", + "Write integration test: POST with valid body returns 201 with product", + "Write integration test: POST with missing required field returns 400", + "Write integration test: GET list returns created products with pagination", + "Write integration test: PUT updates product, GET returns updated data", + "Write integration test: DELETE removes product, GET returns 404", + "Write integration test: POST with duplicate SKU returns 409", + "Run all tests and verify they pass" ], "passes": false } @@ -253,13 +324,14 @@ Note: Every example below shows features that are **self-contained** — each fe "id": 1, "category": "functional", "priority": "high", - "description": "Init command creates project structure", + "description": "User can initialize and configure a new project", "steps": [ "Implement init command with directory creation and config generation", - "Write test: `mytool init myproject` in empty directory creates src/, tests/, config/", - "Write test: verify config file has correct defaults", - "Write test: `mytool init` without name shows error message with usage 
hint", - "Write test: `mytool init myproject` again is idempotent (no error, no overwrite)", + "Write test: `mytool init myproject` creates expected directory structure", + "Write test: `mytool init myproject` generates config with correct defaults", + "Write test: `mytool init` without name shows helpful error", + "Write test: running init twice is idempotent", + "Write test: `mytool init --template api` uses API template", "Run all tests and verify they pass" ], "passes": false @@ -277,13 +349,13 @@ Note: Every example below shows features that are **self-contained** — each fe "id": 1, "category": "functional", "priority": "high", - "description": "Parse function handles all input formats", + "description": "Caller can parse all supported input formats into AST", "steps": [ - "Implement parse() function for string, interpolation, and edge case inputs", - "Write unit test: parse('simple string') returns correct AST node", - "Write unit test: parse('nested {value}') handles interpolation", + "Implement parse() for strings, interpolation, and nested expressions", + "Write unit test: parse('simple string') returns correct AST", + "Write unit test: parse('Hello {name}') handles interpolation", "Write unit test: parse('') returns descriptive error", - "Write unit test: parse(null) returns descriptive error without panic", + "Write unit test: parse(null) returns error without panic", "Verify Parse is exported in public API", "Run all tests and verify they pass" ], diff --git a/references/core/init-script-template.md b/references/core/init-script-template.md index fcb22a1..94704fa 100644 --- a/references/core/init-script-template.md +++ b/references/core/init-script-template.md @@ -63,13 +63,15 @@ cd .. # 8. Wait and verify sleep 3 -# 9. Verify backend API and CORS (for full-stack projects) +# 9. Verify cross-component connectivity (for full-stack projects) +# Adapt the URL and port to match your project's API prefix and backend port echo "Verifying backend API..." 
-API_RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8082/api/v1/ 2>/dev/null || echo "000") +API_URL="http://localhost:8082" # adjust to your backend URL and API prefix +API_RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" "$API_URL/" 2>/dev/null || echo "000") if [ "$API_RESPONSE" = "404" ]; then - echo "⚠️ WARNING: Backend returns 404 for /api/v1/ — route prefix may be misconfigured" + echo "⚠️ WARNING: Backend returns 404 — route prefix may be misconfigured" fi -CORS_HEADER=$(curl -s -I -X OPTIONS http://localhost:8082/api/v1/ -H 'Origin: http://localhost:3000' 2>/dev/null | grep -i 'access-control-allow-origin' || echo "") +CORS_HEADER=$(curl -s -I -X OPTIONS "$API_URL/" -H 'Origin: http://localhost:3000' 2>/dev/null | grep -i 'access-control-allow-origin' || echo "") if [ -z "$CORS_HEADER" ]; then echo "⚠️ WARNING: No CORS headers detected — frontend requests will be blocked by browser" fi @@ -262,21 +264,24 @@ curl -s http://localhost:8080/health || echo "Server not responding" ./bin/mytool --version 2>/dev/null || echo "CLI not built" ``` -## Web Project: CORS and Route Prefix Verification (IMPORTANT) +## Cross-Component Connectivity Verification (IMPORTANT) -For full-stack web projects where the frontend and backend run on different ports, **always verify CORS and route prefixes** after starting services. These are the #1 and #2 most common causes of "frontend can't load data" bugs. +For projects where components run on different ports or domains (e.g., frontend + backend, microservices, API gateway + services), **always verify cross-component connectivity** after starting services. 
The most common failures are: +- **Requests blocked by CORS** — browsers enforce cross-origin restrictions that tools like `curl` bypass +- **Routes not matching between client and server** — route prefixes, path mismatches, or code generators omitting URL prefixes +- **Auth tokens not forwarded** — credentials or headers dropped between components + +Verify connectivity by testing the actual paths your components use to communicate. For example, in a web project with a frontend on port 3000 and backend on port 8080: ```bash -# 1. Verify backend API responds (not 404) -# If using an API prefix like /api/v1, test the full path: +# Example: Verify backend API responds at the path the frontend expects curl -s http://localhost:8080/api/v1/health || curl -s http://localhost:8080/api/v1/ | head -3 -# If 404: the backend route registration doesn't include the prefix. -# Common with code generators (ogen, openapi-generator) that register routes -# without the OpenAPI servers.url prefix. Fix by mounting under the prefix. +# If 404: the route prefix may be misconfigured between client and server. -# 2. Verify CORS headers are set +# Example: Verify CORS headers are present (web projects only) curl -s -I -X OPTIONS http://localhost:8080/api/v1/ \ -H 'Origin: http://localhost:3000' | grep -i 'access-control' # If no Access-Control-Allow-Origin header: add CORS middleware to the backend. -# Without CORS headers, browsers block all requests from the frontend. ``` + +Adapt these checks to your project's architecture — the principle is the same regardless of language or framework: verify that each component can reach the others at the expected paths with the expected headers. 
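If the CORS probe above comes back empty, the fix is middleware on the backend. One framework-agnostic sketch of the decision logic — the function name, allowed origins, and header defaults are illustrative assumptions, so adapt them to your stack:

```typescript
// Illustrative CORS decision logic, independent of any HTTP framework:
// given the request's Origin and method, compute the headers to attach and
// whether to short-circuit a preflight with 204 No Content.
type Cors = { headers: Record<string, string>; preflightStatus: number | null };

function corsFor(origin: string | undefined, method: string, allowedOrigins: string[]): Cors {
  const headers: Record<string, string> = {};
  if (origin !== undefined && allowedOrigins.includes(origin)) {
    headers['Access-Control-Allow-Origin'] = origin;
    headers['Access-Control-Allow-Methods'] = 'GET,POST,PUT,DELETE,OPTIONS';
    headers['Access-Control-Allow-Headers'] = 'Content-Type,Authorization';
  }
  // Preflight (OPTIONS) must be answered directly; letting it fall through to the
  // router is the "OPTIONS requests return 404" failure mode described earlier.
  return { headers, preflightStatus: method === 'OPTIONS' ? 204 : null };
}
```

Wired into any framework, the middleware attaches `headers` to every response and answers OPTIONS with the preflight status instead of dispatching it to a route.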
diff --git a/references/verification/web-verification.md b/references/verification/web-verification.md index 605047b..26b5686 100644 --- a/references/verification/web-verification.md +++ b/references/verification/web-verification.md @@ -7,11 +7,13 @@ Verify web features using Playwright E2E tests with screenshot capture and visua ## Overview Web projects are verified through: -1. **E2E tests** — Playwright tests exercising user journeys +1. **Interaction tests** — Playwright tests that **perform user actions** (click, fill, submit, navigate) and verify outcomes — this is the PRIMARY verification that features actually work 2. **Screenshots** — Captured at key states for visual review 3. **Visual review** — AI agent reviews every screenshot against quality criteria 4. **UX standards compliance** — Loading/empty/error states, responsive, accessible +**CRITICAL DISTINCTION:** Screenshots verify APPEARANCE. Interaction tests verify BEHAVIOR. Both are required, but interaction tests are MORE important — a feature that looks perfect but doesn't work is worse than a feature that looks rough but works correctly. + ## Prerequisites ```bash @@ -31,9 +33,61 @@ lsof -i :8082 | head -2 # Backend If not running, start them with `bash init.sh`. -### Step 2: Write E2E Tests with Screenshots +### Step 2: Write Tests That Prove the Outcome Works (PRIMARY VERIFICATION) + +Every feature MUST have Playwright tests that **perform the actions a user would perform** and **verify the results the user would expect**. This is the primary verification — it proves the feature actually works, not just that it renders. + +**The universal principle:** Ask "what can the user DO when this feature is done?" Then write a test that does exactly that and checks the result. 
+ +Tests MUST: +- **Perform real user actions** — click buttons, fill forms, navigate links, select options +- **Verify observable outcomes** — text appears, page navigates, data changes, notifications show +- **Cover the complete flow** — not just "page loads" but "user completes the task from start to finish" + +Tests MUST NOT: +- Only navigate to a page and take a screenshot (proves rendering, not behavior) +- Only check that a component exists without interacting with it +- Skip verifying the result of an action (e.g., submit a form but don't check if data was saved) + +```typescript +// CORRECT: Test proves the user can complete the task +test('user can edit a product', async ({ page, request }) => { + // Setup: seed data via API + const res = await request.post('/api/v1/products', { data: { name: 'Original', sku: 'TEST-001', price: 10 } }); + const product = (await res.json()).data; + + // Act: perform the user journey + await page.goto('/products'); + await page.getByRole('button', { name: `Actions for Original` }).click(); + await page.getByRole('menuitem', { name: 'Edit' }).click(); + await expect(page.getByLabel('Name')).toHaveValue('Original'); + await page.getByLabel('Name').fill('Updated Name'); + await page.getByRole('button', { name: 'Save' }).click(); + + // Assert: verify the outcome + await expect(page).toHaveURL('/products'); + await expect(page.getByText('Updated Name')).toBeVisible(); +}); + +// WRONG: Only proves the page renders, not that any feature works +test('products page', async ({ page }) => { + await page.goto('/products'); + await page.screenshot({ path: 'screenshot.png', fullPage: true }); + // Edit could be broken, Delete could crash, Filters could be no-ops +}); +``` + +This principle applies beyond CRUD. 
For any feature: +- **Search/filter:** Type a query → verify results change → clear → verify results reset +- **Navigation:** Click a link → verify destination page loads with correct content +- **Settings:** Change a setting → verify the change takes effect → reload → verify it persisted +- **Workflow:** Start a process → advance through steps → verify completion state +- **Upload:** Select file → upload → verify file appears in list +- **Auth:** Login → verify access to protected page → logout → verify redirect to login + +### Step 3: Capture Screenshots at Key States -Every test MUST capture screenshots at key user journey points. +In addition to interaction tests, capture screenshots for visual review. **Screenshot directory:** Screenshots are stored in `e2e/screenshots/` relative to the directory containing `playwright.config.ts`. In a monorepo with `frontend/`, this is `frontend/e2e/screenshots/`. In a standalone frontend project, this is `e2e/screenshots/` at the project root. The parent agent resolves this to an absolute path and passes it as `{screenshots_dir}` in the subagent prompt. @@ -221,8 +275,9 @@ The integration smoke test FAILS if any of these are true: ## Parent Agent Post-Verification After subagent completes, parent MUST: -1. Confirm screenshots exist: `ls {screenshots_dir}/{scope}-feature-{id}-*.png 2>/dev/null | wc -l` +1. **Confirm interaction tests exist and pass** — for CRUD features, check that the subagent wrote tests that exercise user flows (create, edit, delete), not just screenshot-only tests. If tests only take screenshots without clicking/submitting, the feature is NOT verified. +2. Confirm screenshots exist: `ls {screenshots_dir}/{scope}-feature-{id}-*.png 2>/dev/null | wc -l` (`{screenshots_dir}` = absolute path to `e2e/screenshots/` relative to `playwright.config.ts`) -2. Spot-check one screenshot with the Read tool -3. If quality is poor, launch a polish subagent -4. 
**For full-stack features**: verify screenshots show **real data**, not loading skeletons or empty states. If data is missing, run the integration smoke test above to diagnose. +3. Spot-check one screenshot with the Read tool — verify it shows **real data and completed states** (e.g., edit form with pre-filled data, not just an empty form) +4. If quality is poor, launch a polish subagent +5. **For full-stack features**: verify screenshots show **real data**, not loading skeletons or empty states. If data is missing, run the integration smoke test above to diagnose. diff --git a/references/web/e2e-verification.md b/references/web/e2e-verification.md index b597f82..9a175e9 100644 --- a/references/web/e2e-verification.md +++ b/references/web/e2e-verification.md @@ -1,8 +1,10 @@ # E2E Screenshot Verification — Full Details +> **Subagent reference:** This file is inlined in subagent prompts for quick reference on screenshot mechanics. For the FULL verification process (including interaction tests), see `references/verification/web-verification.md`. + Verify features work correctly using Playwright E2E tests with screenshot capture and visual review. -**CRITICAL: Screenshots are MANDATORY for every feature. They are the primary evidence of correct implementation and UI quality. Skipping screenshots means the feature is NOT verified.** +**Interaction tests that prove user outcomes are the PRIMARY verification.** Tests must perform real user actions (click, fill, submit, navigate) and verify observable results (data appears, page navigates, state changes). Screenshots are SECONDARY — they verify visual quality and catch layout/styling issues that interaction tests cannot detect. Both are required, but a feature that passes interaction tests with rough visuals is closer to done than one with perfect screenshots but broken behavior. 
## Screenshot Directory @@ -264,7 +266,7 @@ export default defineConfig({ - AI agents need to review UI for UX issues, not just failures - Success screenshots enable visual regression detection - Human reviewers can audit AI's work quality -- Screenshots are the primary evidence of correct implementation +- Screenshots supplement interaction tests by catching visual regressions **Why short timeouts:** - Long waits waste tokens and time diff --git a/references/web/ux-standards.md b/references/web/ux-standards.md index 953ab33..52f1b8a 100644 --- a/references/web/ux-standards.md +++ b/references/web/ux-standards.md @@ -41,10 +41,10 @@ Every feature implemented by a subagent must meet these standards. A feature tha ## Visual Design Standards ### Typography Hierarchy -- Page title: `text-2xl font-bold` or larger -- Section title: `text-lg font-semibold` -- Body text: `text-sm` or `text-base` -- Caption/label: `text-xs text-muted-foreground` +- Page title: large and bold (e.g., 24px+ bold) +- Section title: medium and semi-bold (e.g., 18px semi-bold) +- Body text: standard size (e.g., 14-16px) +- Caption/label: small and muted (e.g., 12px, secondary color) - Choose distinctive fonts — avoid generic defaults like Inter, Arial, system-ui - Pair a display font with a complementary body font @@ -57,22 +57,22 @@ Every feature implemented by a subagent must meet these standards. 
A feature tha ### Spacing Scale Use a consistent scale throughout the app: -- `4px` (p-1) — tight inline spacing -- `8px` (p-2) — compact elements -- `12px` (p-3) — standard inline padding -- `16px` (p-4) — standard section padding -- `24px` (p-6) — generous section spacing -- `32px` (p-8) — major section breaks -- `48px` (p-12) — page-level spacing +- `4px` — tight inline spacing +- `8px` — compact elements +- `12px` — standard inline padding +- `16px` — standard section padding +- `24px` — generous section spacing +- `32px` — major section breaks +- `48px` — page-level spacing ### Shadows & Depth -- Cards: `shadow-sm` at rest, `shadow-md` on hover -- Modals/dialogs: `shadow-lg` -- Dropdowns: `shadow-md` -- Always add transition: `transition-shadow duration-200` +- Cards: subtle shadow at rest, slightly deeper on hover +- Modals/dialogs: prominent shadow for depth +- Dropdowns: medium shadow +- Always add smooth transitions for shadow changes (~200ms) ### Transitions & Micro-interactions -- Hover effects: `transition-colors duration-150` or `transition-all duration-200` +- Hover effects: smooth color/background transitions (~150-200ms) - Button press feedback: slight scale or color change - Page elements: subtle fade-in on mount - Sidebar/menu open: slide transition with backdrop @@ -92,15 +92,15 @@ Use a consistent scale throughout the app: ### Tables - Column headers: bold, uppercase or semi-bold, with sort indicators -- Zebra striping: alternating row backgrounds (subtle, `even:bg-muted/50`) -- Hover highlighting: `hover:bg-muted transition-colors` +- Zebra striping: alternating row backgrounds (subtle, muted tone) +- Hover highlighting: subtle background change on row hover with smooth transition - Text alignment: text left, numbers right, status centered - Actions column: icon buttons with tooltips - Pagination: show current page, total pages, and per-page count ### Cards / Grid Views - Consistent card sizing within a grid -- Rounded corners (`rounded-lg`) +- 
Rounded corners (medium to large radius) - Border or shadow for visual separation - Hover effect for clickable cards - Image aspect ratio maintained From 9ce4fa45fc066bd1c698d75f5a58d0fa4f8bbf4a Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Thu, 19 Mar 2026 10:21:14 +0800 Subject: [PATCH 10/17] feat: add scope-local screenshots, refinements directory, and refinement phase Move screenshots from global e2e/screenshots/ into per-scope directories (specs/{scope}/screenshots/) so artifact provenance is clear. Add a refinement phase after each feature that launches a dedicated subagent to polish UX/visual design (divergent thinking, research) and code quality (abstraction, testability, maintainability). Refinement analysis is persisted to specs/{scope}/refinements/feature-{id}-refinement.md for traceability across sessions. Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 172 +++++++++++++++++--- references/core/init-script-template.md | 15 +- references/verification/web-verification.md | 16 +- references/web/e2e-verification.md | 41 ++--- 4 files changed, 181 insertions(+), 63 deletions(-) diff --git a/SKILL.md b/SKILL.md index dd8c91f..af0c3b6 100644 --- a/SKILL.md +++ b/SKILL.md @@ -92,11 +92,15 @@ project-root/ │ ├── auth/ │ │ ├── spec.md │ │ ├── feature_list.json -│ │ └── progress.txt +│ │ ├── progress.txt +│ │ ├── screenshots/ +│ │ └── refinements/ │ └── video-editor/ │ ├── spec.md │ ├── feature_list.json -│ └── progress.txt +│ ├── progress.txt +│ ├── screenshots/ +│ └── refinements/ ├── .active-scope ├── spec.md # Symlink to active scope ├── feature_list.json # Symlink to active scope @@ -114,7 +118,7 @@ project-root/ 2. 
**Create new scope** (if needed) ```bash - mkdir -p specs/auth + mkdir -p specs/auth/{screenshots,refinements} # Create specs/auth/spec.md with specification ``` @@ -158,7 +162,7 @@ project-root/ ``` "Write interaction tests: Playwright tests that perform user actions (click, fill, submit, navigate) and verify outcomes (data appears, page navigates, state changes)", "Capture screenshots: add fullPage screenshots at key states (list view, empty state, form, after action)", - "Run Playwright tests and verify all pass and screenshots are generated in e2e/screenshots/", + "Run Playwright tests and verify all pass and screenshots are generated in specs/{scope}/screenshots/", "Visually review each screenshot: verify layout, spacing, hierarchy, loading/empty/error states, data display, and overall polish", "Fix any issues found (broken behavior or visual problems) and re-run until quality is acceptable" ``` @@ -215,9 +219,10 @@ WHILE there are features with "passes": false in feature_list.json: 1. Read feature_list.json to find next incomplete feature (highest priority first) 2. Launch a SUBAGENT to implement, test, verify, and commit 3. After subagent completes, VERIFY output quality (see below) - 4. features_completed_this_session++ - 5. If features_completed_this_session % 5 == 0: run STANDARDS AUDIT (see below) - 6. CONTINUE to next feature — do NOT stop + 4. Launch a REFINEMENT SUBAGENT to polish the feature (see Refinement Phase below) + 5. features_completed_this_session++ + 6. If features_completed_this_session % 5 == 0: run STANDARDS AUDIT (see below) + 7. CONTINUE to next feature — do NOT stop END WHILE Run FINAL STANDARDS AUDIT before ending session @@ -285,10 +290,10 @@ Follow {skill_base_dir}/references/verification/{type}-verification.md: Interaction tests (Phase 3) are the PRIMARY verification that features work. Screenshots are SECONDARY but MANDATORY — they verify visual quality and catch layout/styling issues that interaction tests cannot detect. 
A feature without both interaction tests and screenshots is NOT fully verified. -**Screenshot directory:** `{screenshots_dir}` (provided by parent agent — this is the absolute path to where screenshots are stored, e.g., `/path/to/project/frontend/e2e/screenshots` for a monorepo or `/path/to/project/e2e/screenshots` for a standalone frontend). +**Screenshot directory:** `{screenshots_dir}` (provided by parent agent — this is `{pwd}/specs/{scope}/screenshots/`, the scope-specific directory for all visual artifacts). 13. Write or update a Playwright test file that captures screenshots at key states: - - Use `page.screenshot({ path: '{screenshots_dir}/{scope}-feature-{id}-step{N}-{description}.png', fullPage: true })` + - Use `page.screenshot({ path: '{screenshots_dir}/feature-{id}-step{N}-{description}.png', fullPage: true })` - Capture BEFORE action, AFTER action, error states, and empty states - Every test MUST have at least one `page.screenshot()` call @@ -299,7 +304,7 @@ Interaction tests (Phase 3) are the PRIMARY verification that features work. Scr 15. Verify screenshots were generated: ```bash - ls {screenshots_dir}/{scope}-feature-{id}-*.png + ls {screenshots_dir}/feature-{id}-*.png ``` If no screenshots exist, the verification has FAILED. Fix and re-run. @@ -313,8 +318,8 @@ Interaction tests (Phase 3) are the PRIMARY verification that features work. Scr 17. If screenshots reveal problems, fix the UI and re-capture until quality is acceptable. 
-**Screenshot naming convention:** `{scope}-feature-{id}-step{N}-{description}.png` -Examples: `pim-feature-9-step1-product-list.png`, `pim-feature-9-step2-empty-state.png` +**Screenshot naming convention:** `feature-{id}-step{N}-{description}.png` (scope is encoded in the directory path `specs/{scope}/screenshots/`) +Examples: `feature-9-step1-product-list.png`, `feature-9-step2-empty-state.png` {END IF} ### Phase 4: Gitignore Review @@ -373,15 +378,14 @@ The subagent handles implementation, testing, verification, and committing. The This gate MUST be executed for EVERY UI feature. It is the primary quality control for visual output. Skipping this gate means the feature is NOT verified. - **Determine `{screenshots_dir}`:** The screenshot directory depends on project structure: - - **Monorepo** (frontend in a subdirectory like `frontend/`): `{pwd}/frontend/e2e/screenshots` - - **Standalone frontend** (frontend at project root): `{pwd}/e2e/screenshots` - - Auto-detect: look for `playwright.config.ts` — screenshots live in `e2e/screenshots/` relative to that config file's directory. + **Determine `{screenshots_dir}`:** Screenshots are stored per-scope: `{pwd}/specs/{scope}/screenshots/` + - Read the active scope from `.active-scope` + - Resolve to absolute path: `{pwd}/specs/{scope}/screenshots/` - You MUST pass this resolved absolute path as `{screenshots_dir}` when building subagent prompts. a. **CHECK screenshots exist:** ```bash - ls {screenshots_dir}/{scope}-feature-{id}-*.png 2>/dev/null | wc -l + ls {screenshots_dir}/feature-{id}-*.png 2>/dev/null | wc -l ``` b. **If count is 0: BLOCK.** The feature is NOT complete. Launch a follow-up subagent specifically to capture screenshots: ``` @@ -389,9 +393,9 @@ The subagent handles implementation, testing, verification, and committing. The The feature is already implemented and committed. Your ONLY job is: 1. Start the dev server if not running (check with lsof, start with init.sh if needed) 2. 
Write/update a Playwright test that navigates to the feature and captures screenshots - 3. Screenshots MUST be saved to: {screenshots_dir}/{scope}-feature-{id}-step{N}-{description}.png + 3. Screenshots MUST be saved to: {screenshots_dir}/feature-{id}-step{N}-{description}.png 4. Run the test: npx playwright test - 5. Verify screenshots exist: ls {screenshots_dir}/{scope}-feature-{id}-*.png + 5. Verify screenshots exist: ls {screenshots_dir}/feature-{id}-*.png 6. Use the Read tool to visually review each screenshot 7. Commit the screenshots and test file" ``` @@ -443,7 +447,120 @@ The subagent handles implementation, testing, verification, and committing. The - Check edge cases (empty, null, duplicate) are tested 4. If the subagent failed to complete, launch another subagent to fix and finish. -5. **Loop back IMMEDIATELY** — pick the next incomplete feature and launch a new subagent RIGHT NOW. Do NOT stop, do NOT report to the user, do NOT wait for instructions. KEEP GOING until ALL features pass. +5. **Launch REFINEMENT SUBAGENT** — See "Refinement Phase" below. This polishes the feature's UX and code quality before moving on. +6. **Loop back IMMEDIATELY** — pick the next incomplete feature and launch a new subagent RIGHT NOW. Do NOT stop, do NOT report to the user, do NOT wait for instructions. KEEP GOING until ALL features pass. + +### Refinement Phase (After Each Feature) + +After a feature passes verification and is committed, launch a **refinement subagent** to polish it. This is a separate subagent so it evaluates the feature with fresh context — a "second pair of eyes" pass. + +The refinement subagent writes its analysis to `specs/{scope}/refinements/feature-{id}-refinement.md` so the thinking process is traceable across sessions. + +**Refinement subagent prompt template:** + +``` +You are refining a recently completed feature. The feature is already implemented, tested, verified, and committed. 
Your job is to polish and improve it — both the user experience and the code quality. + +## Project Context +- Working directory: {pwd} +- Active scope: {scope} +- Project type: {type} +- Feature just completed: #{id} — {description} +- Screenshots directory: {screenshots_dir} +- Refinement output: {pwd}/specs/{scope}/refinements/feature-{id}-refinement.md + +## Standards Documents +Read these before starting: +- {skill_base_dir}/references/core/code-quality.md +{IF type == "web" or type == "mobile":} +- {skill_base_dir}/references/web/ux-standards.md +- {skill_base_dir}/references/web/frontend-design.md +{END IF} + +## What Was Done +Review the most recent commit to understand what was implemented: +git log --oneline -1 +git diff HEAD~1 --name-only + +{IF type == "web" or type == "mobile":} +## Part 1: UX/Visual Refinement + +Think divergently about how to make users LOVE this interface. Don't just check for bugs — imagine better ways to present the information and interactions. + +1. Use the Read tool to review ALL screenshots in {screenshots_dir}/ for this feature +2. For each screen, evaluate from a first-time user's perspective: + - Is the purpose of this screen immediately obvious? + - Can the user figure out what to do without instructions? + - Does the visual hierarchy guide the eye to the most important action? + - Are transitions and state changes smooth and predictable? +3. Think divergently about improvements — consider alternatives you haven't tried: + - Could the layout be reorganized for better flow or scannability? + - Would micro-interactions (hover effects, transitions, focus states) make it feel more responsive and alive? + - Is whitespace being used effectively to create breathing room and group related elements? + - Could typography be more expressive — size contrasts, weight variations, line heights? + - Are colors creating the right emotional tone? Could accent colors highlight key actions better? 
+ - Are empty states, loading states, and error states not just functional but helpful and encouraging? + - Could icons, illustrations, or subtle visual cues improve comprehension? +4. Research: look at how the standards documents suggest handling similar UI patterns. Are there recommendations you missed? +5. Implement the most impactful improvements — prioritize changes that make the biggest difference to user understanding and delight +6. Re-run Playwright tests and re-capture screenshots +7. Visually verify the improvements look better than before +{END IF} + +## Part 2: Code Quality Refinement + +Re-read all generated code with fresh eyes, looking for opportunities to make it more maintainable and testable. + +1. Read ALL files changed in the most recent commit: `git diff HEAD~1 --name-only` +2. For each file, evaluate: + - **Abstraction**: Are there functions doing too many things? Should logic be extracted? + - **Testability**: Is business logic separated from framework/UI code? Could someone write a unit test for the core logic without setting up the whole framework? + - **Readability**: Would a new developer understand this code without extensive context? Are names clear and descriptive? + - **Duplication**: Is there repeated logic that should be a shared utility? + - **Simplicity**: Are there overly complex control flows that could be simplified? Deep nesting that could be flattened? +3. Make concrete improvements — refactor, rename, extract, simplify +4. Run all unit tests — ensure they still pass +5. 
If you extracted new logic, write unit tests for it + +## Part 3: Write Refinement Report + +Write your analysis to `{pwd}/specs/{scope}/refinements/feature-{id}-refinement.md` with this structure: + +```markdown +# Feature #{id} Refinement: {description} + +## UX Analysis (web/mobile only) +- **Screenshots reviewed**: [list of screenshots] +- **Issues found**: [what problems or opportunities were identified] +- **Alternatives considered**: [what other approaches were thought about] +- **Changes made**: [what was actually improved and why] +- **Changes deferred**: [ideas noted for future consideration, if any] + +## Code Quality Analysis +- **Files reviewed**: [list of files] +- **Issues found**: [code smells, abstraction opportunities, naming issues] +- **Refactoring done**: [what was changed and why] +- **Test coverage**: [new tests added, if any] + +## Summary +[1-2 sentence summary of the refinement pass] +``` + +## Commit +If you made code or UI changes: +git add -A && git commit -m "refine: polish feature #{id} — [summary of improvements]" + +If no code changes were warranted, still commit the refinement report: +git add specs/{scope}/refinements/ && git commit -m "refine: review feature #{id} — no changes needed" + +## Rules +- This is a POLISH pass — do NOT add new functionality +- Do NOT break existing tests +- Keep changes focused on improving what exists +- Think creatively about UX — the goal is to make users enjoy and understand the interface +- Think critically about code — the goal is to make the codebase a pleasure to maintain +- ALWAYS write the refinement report, even if no changes are made +``` ### Periodic Standards Audit @@ -545,8 +662,8 @@ Before ending: ### Verification Enforcement (web/mobile projects — NON-NEGOTIABLE) - **Interaction tests proving user outcomes are PRIMARY verification** — every UI feature MUST have tests that perform user actions and verify results -- Every UI feature MUST also have screenshots in 
`{screenshots_dir}/{scope}-feature-{id}-*.png` — screenshots are SECONDARY for visual quality -- `{screenshots_dir}` is determined by project structure: `{pwd}/frontend/e2e/screenshots` for monorepos, `{pwd}/e2e/screenshots` for standalone frontends. Auto-detect by finding `playwright.config.ts`. +- Every UI feature MUST also have screenshots in `{screenshots_dir}/feature-{id}-*.png` — screenshots are SECONDARY for visual quality +- `{screenshots_dir}` is `{pwd}/specs/{scope}/screenshots/` — screenshots are stored per-scope alongside other scope artifacts. - The parent agent MUST check for both interaction tests AND screenshots after EVERY subagent that implements a UI feature - If either is missing, the parent MUST launch a follow-up subagent — the feature is NOT done - The subagent prompt template includes inlined verification instructions so subagents know what to do without needing to find external docs @@ -556,12 +673,21 @@ Before ending: - NEVER wait for human approval - NEVER stop to "report progress" or "check in" — the user can see commits in git log - NEVER output a summary and wait — immediately launch the next subagent -- After each subagent completes: verify → launch next subagent. That's it. No pausing. +- After each subagent completes: verify → refine → launch next subagent. That's it. No pausing. - Make reasonable decisions based on existing patterns - If blocked, try alternative approaches before giving up - Keep working until ALL features are complete - The continue workflow is a LOOP, not a single step. You are the loop controller. 
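The selection step of that loop can be sketched as a pure helper. The field names (`id`, `priority`, `passes`) mirror what this skill reads from feature_list.json; the assumption that a lower `priority` number means higher priority is illustrative only:

```typescript
// Sketch of the parent loop's "find next incomplete feature" step.
// Assumes a lower `priority` number is more important; adjust if your
// feature_list.json uses the opposite convention.
type Feature = { id: number; priority: number; passes: boolean };

function nextIncompleteFeature(features: Feature[]): Feature | undefined {
  return features
    .filter((f) => !f.passes)                    // skip completed features
    .sort((a, b) => a.priority - b.priority)[0]; // highest priority first
}

const list: Feature[] = [
  { id: 1, priority: 2, passes: true },
  { id: 2, priority: 1, passes: false },
  { id: 3, priority: 3, passes: false },
];
console.log(nextIncompleteFeature(list)?.id); // → 2
```

When this helper returns `undefined`, every feature passes and the loop exits to the final standards audit.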
+### Refinement Enforcement +- Every completed feature MUST go through the refinement phase before moving to the next feature +- The refinement subagent is separate from the implementation subagent — fresh context enables better evaluation +- Refinement MUST NOT add new functionality — it only improves what exists +- The refinement report (`specs/{scope}/refinements/feature-{id}-refinement.md`) MUST always be written, even if no code changes are made +- Refinement commits use the prefix `refine:` not `feat:` +- For web/mobile: UX refinement should think divergently — not just check for bugs, but imagine better ways to present information +- For all types: code refinement should focus on abstraction, testability, and maintainability + --- ## Reference Files diff --git a/references/core/init-script-template.md b/references/core/init-script-template.md index 94704fa..1e2cf83 100644 --- a/references/core/init-script-template.md +++ b/references/core/init-script-template.md @@ -28,14 +28,14 @@ pkill -f 'node.*dev' 2>/dev/null || true sleep 1 # 2. Delete old screenshots for fresh test results -# Note: screenshot dir is e2e/screenshots/ relative to playwright.config.ts -# For monorepo (frontend/ subdir): clean frontend/e2e/screenshots/ -# For standalone frontend (root): clean e2e/screenshots/ +# Screenshots are stored per-scope in specs/{scope}/screenshots/ echo "Cleaning old test artifacts..." -SCREENSHOT_DIR="e2e/screenshots" # adjust to "frontend/e2e/screenshots" for monorepos +SCOPE=$(cat .active-scope 2>/dev/null || echo "default") +SCREENSHOT_DIR="specs/$SCOPE/screenshots" rm -rf "$SCREENSHOT_DIR"/*.png 2>/dev/null || true rm -rf test-results 2>/dev/null || true mkdir -p "$SCREENSHOT_DIR" +mkdir -p "specs/$SCOPE/refinements" # 3. Install/update dependencies echo "Installing dependencies..." @@ -219,8 +219,11 @@ echo "=== Mobile Development Environment ===" pkill -f 'metro\|react-native' 2>/dev/null || true # 2. 
Clean old artifacts -rm -rf test-results/ screenshots/ 2>/dev/null || true -mkdir -p screenshots +# Screenshots are stored per-scope in specs/{scope}/screenshots/ +SCOPE=$(cat .active-scope 2>/dev/null || echo "default") +rm -rf "specs/$SCOPE/screenshots"/*.png 2>/dev/null || true +rm -rf test-results/ 2>/dev/null || true +mkdir -p "specs/$SCOPE/screenshots" "specs/$SCOPE/refinements" # 3. Install dependencies npm install diff --git a/references/verification/web-verification.md b/references/verification/web-verification.md index 26b5686..115928e 100644 --- a/references/verification/web-verification.md +++ b/references/verification/web-verification.md @@ -89,7 +89,7 @@ This principle applies beyond CRUD. For any feature: In addition to interaction tests, capture screenshots for visual review. -**Screenshot directory:** Screenshots are stored in `e2e/screenshots/` relative to the directory containing `playwright.config.ts`. In a monorepo with `frontend/`, this is `frontend/e2e/screenshots/`. In a standalone frontend project, this is `e2e/screenshots/` at the project root. The parent agent resolves this to an absolute path and passes it as `{screenshots_dir}` in the subagent prompt. +**Screenshot directory:** Screenshots are stored per-scope at `specs/{scope}/screenshots/` relative to the project root. The parent agent resolves this to an absolute path (`{pwd}/specs/{scope}/screenshots/`) and passes it as `{screenshots_dir}` in the subagent prompt. 
```typescript import { test, expect } from '@playwright/test'; @@ -100,7 +100,7 @@ test('user can login', async ({ page }) => { // Screenshot: Initial state // Path is the absolute {screenshots_dir} provided by the parent agent await page.screenshot({ - path: `e2e/screenshots/${scope}-feature-${id}-step1-login-initial.png`, + path: `${screenshots_dir}/feature-${id}-step1-login-initial.png`, fullPage: true }); @@ -112,7 +112,7 @@ test('user can login', async ({ page }) => { // Screenshot: After action await page.screenshot({ - path: `e2e/screenshots/${scope}-feature-${id}-step2-dashboard-after-login.png`, + path: `${screenshots_dir}/feature-${id}-step2-dashboard-after-login.png`, fullPage: true }); }); @@ -172,11 +172,11 @@ If screenshots reveal problems: ## Screenshot Naming Convention -Format: `{scope}-feature-{id}-step{N}-{description}.png` +Format: `feature-{id}-step{N}-{description}.png` (scope is encoded in the directory path `specs/{scope}/screenshots/`) Examples: -- `auth-feature-17-step3-modal-open.png` -- `core-feature-7-step6-project-in-list.png` +- `feature-17-step3-modal-open.png` +- `feature-7-step6-project-in-list.png` ## Playwright Configuration @@ -276,8 +276,8 @@ The integration smoke test FAILS if any of these are true: After subagent completes, parent MUST: 1. **Confirm interaction tests exist and pass** — for CRUD features, check that the subagent wrote tests that exercise user flows (create, edit, delete), not just screenshot-only tests. If tests only take screenshots without clicking/submitting, the feature is NOT verified. -2. Confirm screenshots exist: `ls {screenshots_dir}/{scope}-feature-{id}-*.png 2>/dev/null | wc -l` - (`{screenshots_dir}` = absolute path to `e2e/screenshots/` relative to `playwright.config.ts`) +2. Confirm screenshots exist: `ls {screenshots_dir}/feature-{id}-*.png 2>/dev/null | wc -l` + (`{screenshots_dir}` = `{pwd}/specs/{scope}/screenshots/`) 3.
Spot-check one screenshot with the Read tool — verify it shows **real data and completed states** (e.g., edit form with pre-filled data, not just an empty form) 4. If quality is poor, launch a polish subagent 5. **For full-stack features**: verify screenshots show **real data**, not loading skeletons or empty states. If data is missing, run the integration smoke test above to diagnose. diff --git a/references/web/e2e-verification.md b/references/web/e2e-verification.md index 9a175e9..cb9059b 100644 --- a/references/web/e2e-verification.md +++ b/references/web/e2e-verification.md @@ -8,16 +8,9 @@ Verify features work correctly using Playwright E2E tests with screenshot captur ## Screenshot Directory -Screenshots are stored in `e2e/screenshots/` **relative to the directory containing `playwright.config.ts`**. This varies by project structure: +Screenshots are stored per-scope at `specs/{scope}/screenshots/` relative to the project root. The parent agent resolves this to an absolute path (`{pwd}/specs/{scope}/screenshots/`) and passes it as `{screenshots_dir}` in the subagent prompt. -| Project Structure | `playwright.config.ts` location | Screenshot directory | -|---|---|---| -| Monorepo (`frontend/` subdir) | `frontend/playwright.config.ts` | `frontend/e2e/screenshots/` | -| Standalone frontend (root) | `playwright.config.ts` | `e2e/screenshots/` | - -In Playwright test code, always use the **relative** path `e2e/screenshots/...` — Playwright resolves it from its config directory. - -For parent agent verification, resolve to an **absolute** path: find `playwright.config.ts`, then append `e2e/screenshots/`. +In Playwright test code, always use the **absolute** `{screenshots_dir}` path provided by the parent agent in `page.screenshot()` calls. ## Prerequisites @@ -41,11 +34,9 @@ If not running, start them with `bash init.sh`. ### Step 2: Clear Old Screenshots -The screenshot directory is `e2e/screenshots/` relative to the directory containing `playwright.config.ts`. 
In a monorepo (e.g., `frontend/`), run commands from that subdirectory. In a standalone project, run from the project root. - ```bash -# Run from the directory containing playwright.config.ts -rm -rf e2e/screenshots/*.png e2e/screenshots/**/*.png 2>/dev/null || true +# Clear screenshots from the scope's screenshot directory +rm -rf specs/{scope}/screenshots/*.png 2>/dev/null || true rm -rf test-results/**/*.png 2>/dev/null || true ``` @@ -83,7 +74,7 @@ Common failure causes: ### Step 5: List All Screenshots ```bash -find e2e/screenshots -name "*.png" -type f 2>/dev/null | sort +find specs/{scope}/screenshots -name "*.png" -type f 2>/dev/null | sort find test-results -name "*.png" -type f 2>/dev/null | sort ``` @@ -181,7 +172,7 @@ test('user can login', async ({ page }) => { // Screenshot: Login page initial state await page.screenshot({ - path: `e2e/screenshots/auth-feature-1-step1-login-initial.png`, + path: `${screenshots_dir}/feature-1-step1-login-initial.png`, fullPage: true }); @@ -193,7 +184,7 @@ test('user can login', async ({ page }) => { // Screenshot: Dashboard after login await page.screenshot({ - path: `e2e/screenshots/auth-feature-1-step2-dashboard-after-login.png`, + path: `${screenshots_dir}/feature-1-step2-dashboard-after-login.png`, fullPage: true }); }); @@ -202,7 +193,7 @@ test('user can login', async ({ page }) => { ### Screenshot Rules (MANDATORY) - **Every test MUST have at least one `page.screenshot()` call** -- Name screenshots descriptively with scope prefix +- Name screenshots descriptively (scope is encoded in the directory path) - Use `fullPage: true` to capture complete page state - Capture at key user journey points (before action, after action, error state) - Include error states and empty states in screenshots @@ -211,29 +202,27 @@ test('user can login', async ({ page }) => { // Desktop screenshot await page.setViewportSize({ width: 1280, height: 720 }); await page.screenshot({ - path: 
'e2e/screenshots/scope-feature-1-step1-desktop.png', + path: `${screenshots_dir}/feature-1-step1-desktop.png`, fullPage: true }); // Mobile screenshot await page.setViewportSize({ width: 375, height: 812 }); await page.screenshot({ - path: 'e2e/screenshots/scope-feature-1-step1-mobile.png', + path: `${screenshots_dir}/feature-1-step1-mobile.png`, fullPage: true }); ``` ### Screenshot Naming Convention -Format: `{scope}-feature-{id}-step{N}-{description}.png` - -The scope name comes from `.active-scope` file (e.g., "auth", "core", "video-editor"). +Format: `feature-{id}-step{N}-{description}.png` (scope is encoded in the directory path `specs/{scope}/screenshots/`) Examples: -- `auth-feature-17-step3-modal-open.png` -- `core-feature-7-step6-project-in-list.png` -- `video-editor-feature-15-complete-flow.png` -- `pim-feature-4-step2-validation-errors.png` +- `feature-17-step3-modal-open.png` +- `feature-7-step6-project-in-list.png` +- `feature-15-complete-flow.png` +- `feature-4-step2-validation-errors.png` ## Playwright Configuration From a828fa965d9603d7f8e83f73a433b48e9d91ef35 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Thu, 19 Mar 2026 10:51:25 +0800 Subject: [PATCH 11/17] feat: add Vertical Slices rule for full-stack feature organization Full-stack domain features must implement backend and frontend together in one feature, not split by technology layer. This catches CORS, route prefix, and response format issues during development instead of at integration time. Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 4 ++ references/core/feature-list-format.md | 98 +++++++++++++++++++++----- 2 files changed, 86 insertions(+), 16 deletions(-) diff --git a/SKILL.md b/SKILL.md index af0c3b6..a994587 100644 --- a/SKILL.md +++ b/SKILL.md @@ -134,6 +134,7 @@ project-root/ - What does the user interact with? **Browser** → `web`. **Terminal** → `cli`. **Import/call** → `library`. **HTTP requests** → `api`. **Phone/tablet** → `mobile`. **Data outputs** → `data`. 
- Confirm by examining the codebase structure (e.g., frontend frameworks suggest `web`, CLI entry points suggest `cli`, no main entry point suggests `library`) - If unclear, default to the most fitting type based on spec.md + - **Full-stack detection:** If the project has both a backend (Go, Node, Python, etc.) and a frontend (React, Vue, etc.), features MUST be structured as **vertical slices** — see `references/core/feature-list-format.md` for the "Vertical Slices" rule 5. **Create feature list** — choose the right method: @@ -150,6 +151,9 @@ project-root/ **Important:** Include the `"type"` field in feature_list.json (see feature-list-format.md). + **CRITICAL — Vertical Slices for Full-Stack Projects (NON-NEGOTIABLE):** + For full-stack projects (`web` type with both backend and frontend), each domain feature MUST be a **vertical slice** that implements backend AND frontend together in one feature. Do NOT split features by technology layer (e.g., "Backend: category CRUD" then "Frontend: category pages"). Instead, each feature delivers a complete user journey through the entire stack: model → service → API → UI → E2E test. See `references/core/feature-list-format.md` for the "Vertical Slices" rule, examples, and anti-patterns. + **CRITICAL — Self-Contained Features (NON-NEGOTIABLE):** Every feature MUST include its own test and verification steps. NEVER create separate "testing" or "verification" features (e.g., "Write integration tests", "Add E2E tests for all pages"). Each feature's `steps` array must contain both implementation AND verification steps so the feature can be independently verified when completed. See `references/core/feature-list-format.md` for the "Self-Contained Features" rule and examples. 
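The type-field requirement above can be sketched as a quick validation pass. This is a hedged illustration only: the inline fixture content, the scope values, and the set of supported types mirror the project-type table earlier in this document, and the file is written inline purely so the sketch is self-contained; it is not part of the skill's actual tooling.

```typescript
import * as fs from "node:fs";

// Supported values, per the Project Types table in this document.
const SUPPORTED = ["web", "api", "cli", "library", "data", "mobile"];

// Illustrative fixture standing in for the scope's real feature_list.json.
fs.writeFileSync("feature_list.json", JSON.stringify({
  type: "web",
  features: [
    { id: 1, category: "infrastructure", priority: "high",
      description: "Project scaffolding", steps: [], passes: false },
  ],
}, null, 2));

// The check itself: read the manifest and confirm a supported top-level "type".
const manifest = JSON.parse(fs.readFileSync("feature_list.json", "utf8"));
if (!SUPPORTED.includes(manifest.type)) {
  throw new Error(`Unsupported or missing project type: ${manifest.type}`);
}
console.log(`Project type: ${manifest.type}`);
```

A scope whose manifest omits `"type"` (or uses an unknown value) fails fast here, instead of silently falling back to the wrong verification strategy.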
diff --git a/references/core/feature-list-format.md b/references/core/feature-list-format.md index ae62c32..0f1fe65 100644 --- a/references/core/feature-list-format.md +++ b/references/core/feature-list-format.md @@ -110,6 +110,68 @@ When a feature is a component ("category list page"), the verification only cove Some features are genuinely infrastructure with no user-facing outcome: project setup, database migration, code generation, CI/CD configuration. These are fine as component-level features. The rule applies to features that deliver **user-facing or caller-facing functionality**. +## Vertical Slices for Full-Stack Projects (NON-NEGOTIABLE) + +### The Problem This Solves + +When full-stack features are split by layer — "Backend: category CRUD" then "Frontend: category pages" — the backend gets built in isolation without knowing if the frontend can actually consume it. CORS issues, response envelope mismatches, route prefix problems, and pagination format disagreements all hide until the frontend feature starts. By then, the backend is "done" and marked passing, but must be reworked. Worse, the developer loses the backend implementation context by the time frontend work begins. + +### The Rule + +**For full-stack projects (`web` type with both backend and frontend), each domain feature MUST be a vertical slice that implements backend AND frontend together in one feature.** + +A vertical slice delivers a complete, working user journey through the entire stack: database model → service → API endpoint → generated types → UI component → E2E test. When the feature is done, the user can actually use it end-to-end. + +### How to Structure Vertical Slice Steps + +Each full-stack feature's `steps` array should flow through the stack in order: + +1. **Backend model & service** — GORM model, service interface & implementation, business logic +2. **Backend wiring** — Handler implementation, error mapping, route registration +3. 
**Backend integration tests** — Table-driven tests through root mux ServeHTTP +4. **Frontend UI** — Pages, forms, tables, using generated hooks from the shared OpenAPI spec +5. **Frontend E2E tests** — Playwright tests against the real running backend +6. **Screenshots & visual review** — Capture and review key states + +### Examples + +**CORRECT — Vertical slice (backend + frontend together):** +```json +{ + "id": 2, + "description": "User can manage categories (create, view list, edit, delete)", + "category": "full-stack", + "steps": [ + "Create backend model, service, and handler for category CRUD", + "Write backend integration tests for all category operations", + "Run backend tests and verify all pass", + "Build frontend category list, create form, edit form, delete dialog using generated hooks", + "Write E2E tests: seed via API, test full CRUD journey through the UI", + "Capture screenshots and visually review", + "Fix any issues and re-run until all pass" + ] +} +``` + +**WRONG — Split by layer (backend separate from frontend):** +```json +// DON'T DO THIS — features split by technology layer +{ "id": 2, "description": "Backend: category CRUD API endpoints", "category": "backend", ... }, +{ "id": 3, "description": "Backend: product CRUD API endpoints", "category": "backend", ... }, +{ "id": 4, "description": "Frontend: category management pages", "category": "frontend", ... }, +{ "id": 5, "description": "Frontend: product management pages", "category": "frontend", ... } +``` + +### Infrastructure / Scaffolding Exception + +The first feature (project scaffolding) naturally spans both stacks and is fine as infrastructure. The rule applies to **domain features** that deliver user-facing functionality — these must be vertical slices. + +### When This Rule Applies + +- **Full-stack web projects** (Go + React, Node + React, etc.) 
— ALWAYS use vertical slices for domain features +- **API-only or frontend-only projects** — Rule does not apply (there's only one layer) +- **Projects with independent backend/frontend repos** — Use vertical slices if both are in the same repo/scope + ## Self-Contained Features (NON-NEGOTIABLE) Every feature MUST be independently verifiable. This means: @@ -222,40 +284,44 @@ Every feature's test steps should be concrete and verifiable — they should des Note: Every example below defines features as **user outcomes** with verification steps that **prove the outcome works**. Features are NOT split into component-level pieces. -### Web Project (Full-Stack) +### Web Project (Full-Stack — Vertical Slices) + +Each domain feature is a **vertical slice** — backend and frontend developed together. See "Vertical Slices for Full-Stack Projects" rule above. + ```json { "type": "web", "features": [ { "id": 1, - "category": "functional", + "category": "infrastructure", "priority": "high", "description": "Project scaffolding and shared infrastructure", "steps": [ - "Initialize frontend (React, Vite, Router, UI library) and backend (Go, framework) projects", - "Create shared OpenAPI spec, generate types for both sides", - "Create root layout with navigation", - "Verify frontend compiles and dev server starts", - "Verify backend compiles" + "Create shared OpenAPI spec with all schemas and endpoints", + "Initialize backend (Go, framework, ogen codegen) and frontend (React, Vite, Router, UI library, orval codegen)", + "Generate types for both sides from the shared OpenAPI spec", + "Create root layout with navigation, route placeholders, custom fetch config", + "Set up Playwright config and test helpers", + "Verify backend compiles and frontend dev server starts" ], "passes": false }, { "id": 2, - "category": "functional", + "category": "full-stack", "priority": "high", "description": "User can manage categories (create, view list, edit, delete)", "steps": [ - "Implement backend: 
category CRUD endpoints with validation and error handling", - "Write backend integration tests: create, list, get, update, delete, duplicate slug rejection", + "Create backend model, service with business logic, and wire into ogen handler", + "Write backend integration tests: create, list, get, update, delete, edge cases", "Run backend tests and verify all pass", - "Implement frontend: category list page, create form, edit form, delete confirmation", + "Build frontend category list page, create form, edit form, delete confirmation using generated hooks", "Write E2E test: seed category via API, navigate to list, verify it's visible", "Write E2E test: click New, fill form, submit, verify new category in list", "Write E2E test: click Edit on a category, verify form has existing data, change name, submit, verify updated name in list", "Write E2E test: click Delete, confirm, verify category removed from list", - "Run all tests, verify all pass", + "Run all tests (backend + E2E), verify all pass", "Capture screenshots of list, create form, edit form, delete dialog, empty state", "Visually review screenshots for layout and polish", "Fix any issues and re-run until all tests pass and screenshots look good" @@ -264,21 +330,21 @@ Note: Every example below defines features as **user outcomes** with verificatio }, { "id": 3, - "category": "functional", + "category": "full-stack", "priority": "high", "description": "User can manage products (create, view list with filters, edit, delete, bulk status change)", "steps": [ - "Implement backend: product CRUD + bulk status + filtering endpoints", + "Create backend model, service with filtering/pagination/bulk-update logic, and wire into ogen handler", "Write backend integration tests: all CRUD ops, filters, bulk update, edge cases", "Run backend tests and verify all pass", - "Implement frontend: product list with filters/search/pagination, create form, edit form, delete dialog, bulk actions", + "Build frontend product list with 
filters/search/pagination, create form, edit form, delete dialog, bulk actions using generated hooks", "Write E2E test: seed products, navigate to list, verify data visible with correct prices and statuses", "Write E2E test: create product with category selection, verify in list", "Write E2E test: edit a product, verify changes persist", "Write E2E test: delete a product, verify removed", "Write E2E test: select multiple products, bulk change status, verify statuses updated", "Write E2E test: filter by category, verify only matching products shown", - "Run all tests, verify all pass", + "Run all tests (backend + E2E), verify all pass", "Capture screenshots of list, filters active, bulk selection, forms, dialogs", "Visually review screenshots", "Fix any issues" From 9171ae44960b10c1d0801d8a24afd7fb108a3cc6 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Thu, 19 Mar 2026 12:14:41 +0800 Subject: [PATCH 12/17] fix: add non-negotiable refinement gate to prevent skipping MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Refinement subagents were being skipped because there was no blocking gate — unlike screenshots which have concrete "check exists → BLOCK" enforcement. Added REFINEMENT GATE with same enforcement pattern. Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 57 ++++++++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 49 insertions(+), 8 deletions(-) diff --git a/SKILL.md b/SKILL.md index a994587..5375449 100644 --- a/SKILL.md +++ b/SKILL.md @@ -187,8 +187,8 @@ This is the main workflow. It runs ALL remaining features to completion without **You MUST keep looping until EVERY feature in `feature_list.json` has `"passes": true`. Do NOT stop after one feature. Do NOT stop after two features. Do NOT stop to report progress to the user. Do NOT ask the human what to do next. 
The human may be asleep.** -**After EACH subagent completes, you MUST immediately launch the NEXT subagent for the next incomplete feature. The ONLY acceptable reasons to stop are:** -1. **ALL features have `"passes": true`** +**After EACH subagent completes, you MUST: verify → refine → then launch the NEXT subagent. The refinement step is NOT optional — it is what makes features delightful instead of just functional. The ONLY acceptable reasons to stop are:** +1. **ALL features have `"passes": true` AND all refinements are committed** 2. **A truly unrecoverable error** (hardware failure, missing credentials that cannot be worked around) **Stopping to "report back" or "check in" with the user is a VIOLATION of this workflow. The user explicitly chose autonomous execution. KEEP GOING.** @@ -222,8 +222,12 @@ features_completed_this_session = 0 WHILE there are features with "passes": false in feature_list.json: 1. Read feature_list.json to find next incomplete feature (highest priority first) 2. Launch a SUBAGENT to implement, test, verify, and commit - 3. After subagent completes, VERIFY output quality (see below) - 4. Launch a REFINEMENT SUBAGENT to polish the feature (see Refinement Phase below) + 3. After subagent completes, VERIFY output quality (see Verification Gates below) + 4. ⚠️ REFINEMENT GATE (NON-NEGOTIABLE — see Refinement Phase below): + a. Launch a REFINEMENT SUBAGENT to polish UX and code quality + b. BLOCK until refinement subagent completes and commits + c. Verify refinement report exists: ls specs/{scope}/refinements/feature-{id}-refinement.md + d. If report missing → the refinement was SKIPPED → launch another refinement subagent 5. features_completed_this_session++ 6. If features_completed_this_session % 5 == 0: run STANDARDS AUDIT (see below) 7. 
CONTINUE to next feature — do NOT stop @@ -232,6 +236,15 @@ END WHILE Run FINAL STANDARDS AUDIT before ending session ``` +**⚠️ WHY REFINEMENT IS NON-NEGOTIABLE ⚠️** + +Implementation subagents build features that *work*. Refinement subagents make features *delightful*. Without the refinement pass: +- UX issues (poor spacing, weak hierarchy, missing micro-interactions) ship uncaught +- Code smells (duplication, poor naming, tangled logic) accumulate across features +- The "second pair of eyes" benefit is lost — self-review has blind spots + +**The refinement subagent is the quality difference between "it works" and "users love it".** Skipping it to save time is a false economy — it produces mediocre output that requires rework. + ### Launching Feature Subagents (Claude Code) For each feature, use the **Agent tool** to launch a subagent. This keeps each feature's work isolated and prevents context window overflow. @@ -451,7 +464,27 @@ The subagent handles implementation, testing, verification, and committing. The - Check edge cases (empty, null, duplicate) are tested 4. If the subagent failed to complete, launch another subagent to fix and finish. -5. **Launch REFINEMENT SUBAGENT** — See "Refinement Phase" below. This polishes the feature's UX and code quality before moving on. + +5. **⚠️ REFINEMENT GATE (NON-NEGOTIABLE — same enforcement level as Screenshot Gate) ⚠️** + + Refinement is what turns "working code" into "delightful product". It MUST happen for EVERY feature, including infrastructure features. Do NOT skip it. Do NOT defer it. Do NOT rationalize skipping it ("it's just scaffolding", "looks good enough", "I'll refine later"). + + **The parent agent MUST execute these steps — no exceptions:** + + a. **Launch a REFINEMENT SUBAGENT** using the refinement prompt template below. Wait for it to complete. + b. **Verify refinement report exists:** + ```bash + ls specs/{scope}/refinements/feature-{id}-refinement.md + ``` + c. 
**If report is missing: BLOCK.** Launch the refinement subagent again. The feature is NOT done without its refinement pass. + d. **Verify a refinement commit exists:** + ```bash + git log --oneline -1 | grep "refine:" + ``` + e. **If no refinement commit: BLOCK.** The refinement subagent failed to commit. Launch again. + + **Why this gate exists:** In practice, the parent agent tends to skip refinement to "save time" and move to the next feature faster. This produces features that work but feel unpolished — generic spacing, weak visual hierarchy, no micro-interactions, duplicated code. The refinement pass is a deliberate "second pair of eyes" that catches what the implementation subagent missed. It is the single biggest quality lever in the workflow. + 6. **Loop back IMMEDIATELY** — pick the next incomplete feature and launch a new subagent RIGHT NOW. Do NOT stop, do NOT report to the user, do NOT wait for instructions. KEEP GOING until ALL features pass. ### Refinement Phase (After Each Feature) @@ -633,6 +666,8 @@ Since the human may be asleep, follow these rules for autonomous decisions: | Unclear file structure | Follow existing project conventions | | **Web/mobile:** Unclear UI design | Follow references/web/frontend-design.md | | **Web/mobile:** UI looks generic/plain | Add visual polish per references/web/ux-standards.md | +| **All types:** Tempted to skip refinement | NEVER skip — launch the refinement subagent. It's what makes features delightful. 
| +| **All types:** Refinement subagent didn't commit | Launch it again — refinement report + commit are mandatory gates | | **Web/mobile:** Subagent skipped screenshots | Launch follow-up subagent to add them | | **Web full-stack:** Frontend shows loading forever | Check CORS headers and route prefix — see `references/verification/web-verification.md` Integration Smoke Test | | **Web full-stack:** curl works but browser doesn't | CORS issue — add `Access-Control-Allow-Origin` middleware to backend | @@ -677,20 +712,26 @@ Before ending: - NEVER wait for human approval - NEVER stop to "report progress" or "check in" — the user can see commits in git log - NEVER output a summary and wait — immediately launch the next subagent -- After each subagent completes: verify → refine → launch next subagent. That's it. No pausing. +- After each subagent completes: **verify → REFINE → launch next subagent**. That's it. No pausing. No skipping refinement. +- **The sequence is: implement → verify → REFINE → next feature. All three steps are mandatory. Skipping refinement is as wrong as skipping verification.** - Make reasonable decisions based on existing patterns - If blocked, try alternative approaches before giving up - Keep working until ALL features are complete - The continue workflow is a LOOP, not a single step. You are the loop controller. 
-### Refinement Enforcement -- Every completed feature MUST go through the refinement phase before moving to the next feature +### Refinement Enforcement (NON-NEGOTIABLE — same level as Verification Enforcement) +- **Every completed feature MUST go through the refinement phase before moving to the next feature — NO EXCEPTIONS, including infrastructure features** - The refinement subagent is separate from the implementation subagent — fresh context enables better evaluation - Refinement MUST NOT add new functionality — it only improves what exists - The refinement report (`specs/{scope}/refinements/feature-{id}-refinement.md`) MUST always be written, even if no code changes are made - Refinement commits use the prefix `refine:` not `feat:` - For web/mobile: UX refinement should think divergently — not just check for bugs, but imagine better ways to present information - For all types: code refinement should focus on abstraction, testability, and maintainability +- **GATE CHECK**: After the refinement subagent completes, the parent MUST verify: + 1. `specs/{scope}/refinements/feature-{id}-refinement.md` exists (the report) + 2. `git log --oneline -1` shows a `refine:` commit + 3. If either is missing, launch the refinement subagent again — do NOT proceed to the next feature +- **Common failure mode**: The parent agent skips refinement to "move faster". This is explicitly forbidden. The refinement pass is what separates mediocre output from delightful output. It catches UX issues (spacing, hierarchy, micro-interactions) and code issues (duplication, naming, complexity) that the implementation subagent is blind to because it was focused on making things work. 
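The two gate checks in this section (report file exists, latest commit uses the `refine:` prefix) can be sketched as follows. The commit subject below is a hard-coded stand-in for the output of `git log --oneline -1`, and the scope name and feature id are illustrative; a real parent agent would substitute the active scope and shell out to git.

```typescript
import * as fs from "node:fs";

const scope = "auth"; // illustrative; read from .active-scope in practice
const id = 7;         // illustrative feature id
const reportPath = `specs/${scope}/refinements/feature-${id}-refinement.md`;

// Simulate the refinement subagent's two mandatory outputs.
fs.mkdirSync(`specs/${scope}/refinements`, { recursive: true });
fs.writeFileSync(reportPath, "## Refinement report\nNo code changes needed.\n");
const lastCommitSubject = "refine: polish category forms"; // stand-in for `git log --oneline -1`

// Gate 1: the refinement report must exist, even when no code changed.
const reportExists = fs.existsSync(reportPath);
// Gate 2: the latest commit must carry the refine: prefix.
const refineCommitted = lastCommitSubject.includes("refine:");

const gatePassed = reportExists && refineCommitted;
console.log(gatePassed
  ? "GATE PASS: proceed to next feature"
  : "GATE FAIL: relaunch the refinement subagent");
```

If either check fails, the parent relaunches the refinement subagent rather than proceeding, matching the BLOCK semantics described above.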
--- From abebebd9f8990aa546b151fa9cef593f69142e53 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Thu, 19 Mar 2026 12:19:55 +0800 Subject: [PATCH 13/17] refactor: simplify SKILL.md from 761 to 200 lines MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Extracted subagent prompt templates to references/templates/ and removed duplicated rules that were restated 3-4 times across sections. The core loop is now clear: implement → verify → refine → next. No concepts lost — all moved to reference files or condensed. Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 819 +++----------------- references/templates/audit-subagent.md | 21 + references/templates/feature-subagent.md | 123 +++ references/templates/refinement-subagent.md | 107 +++ 4 files changed, 380 insertions(+), 690 deletions(-) create mode 100644 references/templates/audit-subagent.md create mode 100644 references/templates/feature-subagent.md create mode 100644 references/templates/refinement-subagent.md diff --git a/SKILL.md b/SKILL.md index 5375449..aee653c 100644 --- a/SKILL.md +++ b/SKILL.md @@ -5,757 +5,196 @@ description: Manage long-running AI agent development projects with incremental # Iterative Development Workflow -This skill provides a complete workflow for AI agents working on long-running development projects across multiple sessions. It ensures **incremental, reliable progress** with proper handoffs between sessions. It works with any project type — web apps, APIs, CLI tools, libraries, data pipelines, and mobile apps. +Autonomous, incremental development with quality gates. One feature at a time. Implement → verify → refine → next. -## Core Principles +## Core Loop -1. **Incremental progress** — Work on ONE feature at a time. Finish, test, and commit before moving on. -2. **Feature list is sacred** — `feature_list.json` is the single source of truth. See `references/core/feature-list-format.md` for rules. -3. 
**Git discipline** — Commit after every completed feature. Never leave uncommitted work. -4. **Clean handoffs** — Every session ends meeting `references/core/session-handoff-standards.md`. -5. **Test before build** — Verify existing features work before implementing new ones. -6. **Autonomous execution** — Make all decisions yourself. Never stop to ask the human. The human may be asleep. -7. **Subagent per feature** — Each feature is implemented in its own subagent for isolation and parallelism safety. -8. **Refactor and unit test** — Actively extract logic into testable modules. See `references/core/code-quality.md`. -9. **Verification is non-negotiable** — Every feature MUST be verified using the strategy for its project type. See `references/verification/`. -10. **Standards are auditable** — Quality standards live in reference docs and are systematically verified, not just aspirational checklists. - -## Project Types - -The skill adapts its verification strategy and applicable standards based on project type. The type is declared in `feature_list.json` or auto-detected during scope init. - -| Type | Verification Strategy | Extra Standards | -|------|----------------------|-----------------| -| **web** | E2E screenshots + visual review (Playwright) | `web/ux-standards.md`, `web/frontend-design.md` | -| **api** | Integration tests, endpoint validation, response schemas | — | -| **cli** | Command execution tests, output validation, exit codes | — | -| **library** | Unit tests, public API validation, type checking | — | -| **data** | Transformation tests, schema validation, data quality checks | — | -| **mobile** | E2E screenshots + visual review (Detox/XCTest/Flutter) | `web/ux-standards.md` (adapted) | - -## Standards Documents - -All verifiable quality standards are extracted into reference docs. These are used both as guidance during implementation and as audit targets for systematic verification. 
- -### Core Standards (all project types) - -| Document | What it covers | -|----------|---------------| -| `references/core/code-quality.md` | File organization, testable architecture, unit testing, no duplication | -| `references/core/gitignore-standards.md` | Files that must never be committed | -| `references/core/feature-list-format.md` | Feature list structure, critical rules, priority order | -| `references/core/session-handoff-standards.md` | Clean codebase, git state, progress tracking — verified at session end | - -### Web-Specific Standards (type: web, mobile) +``` +FOR each feature (highest priority first): + 1. IMPLEMENT — launch subagent to build, test, and commit + 2. VERIFY — parent checks: commit exists, screenshots exist (web), tests prove outcomes + 3. REFINE — launch subagent to polish UX + code quality, write report, commit + 4. NEXT — immediately proceed to next feature +``` -| Document | What it covers | -|----------|---------------| -| `references/web/ux-standards.md` | Loading/empty/error states, responsive design, accessibility, forms, tables, navigation | -| `references/web/frontend-design.md` | Typography, color, spatial composition, micro-interactions, anti-patterns | +**All three steps are mandatory. Skipping refinement is as wrong as skipping verification.** -### Verification Strategies (one per project type) +## Principles -| Document | For type | -|----------|----------| -| `references/verification/web-verification.md` | web | -| `references/verification/api-verification.md` | api | -| `references/verification/cli-verification.md` | cli | -| `references/verification/library-verification.md` | library | -| `references/verification/data-verification.md` | data | -| `references/verification/mobile-verification.md` | mobile | +1. ONE feature at a time — finish, test, commit before moving on +2. `feature_list.json` is the single source of truth — see `references/core/feature-list-format.md` +3. 
Git commit after every feature and every refinement +4. Autonomous execution — never stop to ask the human, the human may be asleep +5. Subagent per feature — isolation prevents context overflow +6. Verification is non-negotiable — every feature proven working per project type +7. Refinement is non-negotiable — every feature polished for delight, not just function +8. Standards are auditable — quality lives in reference docs, verified systematically -## When to Use Each Workflow +## Project Types -| Workflow | Use When | -|----------|----------| -| **init-scope** | Starting a new scope, switching scopes, or setting up project structure | -| **continue** | Every session after init — picking up work, implementing ALL remaining features, and verifying each | +| Type | Verification | Extra Standards | +|------|-------------|-----------------| +| **web** | Playwright E2E + screenshots | `web/ux-standards.md`, `web/frontend-design.md` | +| **api** | Integration tests + endpoint validation | — | +| **cli** | Command execution + output validation | — | +| **library** | Unit tests + public API validation | — | +| **data** | Transformation tests + data quality | — | +| **mobile** | Mobile E2E + screenshots | `web/ux-standards.md` (adapted) | --- ## Workflow: Initialize Scope -Use this to create a new development scope or switch between existing scopes. 
- -### Concepts - -- **Scope**: A focused set of features (e.g., "auth", "video-editor", "phase-2") -- **Active Scope**: Currently active scope stored in `.active-scope` -- **Scope Files**: `specs/{scope}/spec.md` and `specs/{scope}/feature_list.json` -- **Project Type**: Declared in `feature_list.json` — determines verification strategy and applicable standards - ### Directory Structure - ``` project-root/ -├── specs/ -│ ├── auth/ -│ │ ├── spec.md -│ │ ├── feature_list.json -│ │ ├── progress.txt -│ │ ├── screenshots/ -│ │ └── refinements/ -│ └── video-editor/ -│ ├── spec.md -│ ├── feature_list.json -│ ├── progress.txt -│ ├── screenshots/ -│ └── refinements/ +├── specs/{scope}/ +│ ├── spec.md, feature_list.json, progress.txt +│ ├── screenshots/ +│ └── refinements/ ├── .active-scope -├── spec.md # Symlink to active scope -├── feature_list.json # Symlink to active scope -├── progress.txt # Symlink to active scope +├── spec.md → specs/{scope}/spec.md (symlink) +├── feature_list.json → specs/{scope}/... (symlink) +├── progress.txt → specs/{scope}/... (symlink) └── init.sh ``` ### Steps -1. **Check current state** - ```bash - ls -la specs/ 2>/dev/null || echo "No scopes yet" - cat .active-scope 2>/dev/null || echo "No active scope" - ``` - -2. **Create new scope** (if needed) - ```bash - mkdir -p specs/auth/{screenshots,refinements} - # Create specs/auth/spec.md with specification - ``` - -3. **Switch to scope** - ```bash - echo "auth" > .active-scope - ln -sf specs/auth/spec.md spec.md - ln -sf specs/auth/feature_list.json feature_list.json - ln -sf specs/auth/progress.txt progress.txt - ``` - -4. **Determine project type** — based on how users interact with the deliverable: - - What does the user interact with? **Browser** → `web`. **Terminal** → `cli`. **Import/call** → `library`. **HTTP requests** → `api`. **Phone/tablet** → `mobile`. **Data outputs** → `data`. 
- - Confirm by examining the codebase structure (e.g., frontend frameworks suggest `web`, CLI entry points suggest `cli`, no main entry point suggests `library`) - - If unclear, default to the most fitting type based on spec.md - - **Full-stack detection:** If the project has both a backend (Go, Node, Python, etc.) and a frontend (React, Vue, etc.), features MUST be structured as **vertical slices** — see `references/core/feature-list-format.md` for the "Vertical Slices" rule - -5. **Create feature list** — choose the right method: - - **If scope references a constitution / standards document** (e.g., "align with AGENTS.md", "refactor to follow standards"): - Use the **Constitution Audit Workflow** from `references/core/constitution-audit.md`. This is a multi-subagent process: - - Split the reference document into sections (~200 lines each) - - Launch parallel subagents to extract EVERY requirement from each section (read actual text, not summaries) - - Launch parallel subagents to verify each requirement against the actual codebase - - Generate features ONLY from verified violations - - This is NON-NEGOTIABLE for compliance scopes — ad-hoc auditing misses requirements - - **If scope is new feature development** (e.g., "build a PIM system", "add auth"): - Use the standard process from `references/core/feature-list-format.md` - - **Important:** Include the `"type"` field in feature_list.json (see feature-list-format.md). - - **CRITICAL — Vertical Slices for Full-Stack Projects (NON-NEGOTIABLE):** - For full-stack projects (`web` type with both backend and frontend), each domain feature MUST be a **vertical slice** that implements backend AND frontend together in one feature. Do NOT split features by technology layer (e.g., "Backend: category CRUD" then "Frontend: category pages"). Instead, each feature delivers a complete user journey through the entire stack: model → service → API → UI → E2E test. 
See `references/core/feature-list-format.md` for the "Vertical Slices" rule, examples, and anti-patterns. - - **CRITICAL — Self-Contained Features (NON-NEGOTIABLE):** - Every feature MUST include its own test and verification steps. NEVER create separate "testing" or "verification" features (e.g., "Write integration tests", "Add E2E tests for all pages"). Each feature's `steps` array must contain both implementation AND verification steps so the feature can be independently verified when completed. See `references/core/feature-list-format.md` for the "Self-Contained Features" rule and examples. - - **CRITICAL — Verification Steps for UI Features (web/mobile — NON-NEGOTIABLE):** - For `web` and `mobile` project types, every feature that produces or modifies UI MUST include **interaction test and screenshot** steps in its `steps` array. These are NOT optional and MUST NOT be deferred to a separate feature. - - **Outcome-proving tests (interaction, integration, unit) are the PRIMARY verification.** Tests must perform real user actions and verify observable outcomes — they prove the feature actually works. **Screenshots are SECONDARY** — they verify visual quality and catch layout/styling issues that interaction tests cannot detect. Both are required. 
- - Every UI feature's `steps` array MUST end with these steps (adapted to the feature): - ``` - "Write interaction tests: Playwright tests that perform user actions (click, fill, submit, navigate) and verify outcomes (data appears, page navigates, state changes)", - "Capture screenshots: add fullPage screenshots at key states (list view, empty state, form, after action)", - "Run Playwright tests and verify all pass and screenshots are generated in specs/{scope}/screenshots/", - "Visually review each screenshot: verify layout, spacing, hierarchy, loading/empty/error states, data display, and overall polish", - "Fix any issues found (broken behavior or visual problems) and re-run until quality is acceptable" - ``` - - **How to determine if a feature is a UI feature:** If the feature creates or modifies files in `src/routes/`, `src/components/`, `src/features/`, or any file that renders user-visible HTML/JSX, it is a UI feature and MUST have screenshot steps. Backend-only features (services, models, API endpoints without frontend) do NOT need screenshot steps. - -6. **Create/update init.sh** — see `references/core/init-script-template.md` - -7. **Commit and update progress log** +1. **Check state**: `ls specs/ && cat .active-scope` +2. **Create scope**: `mkdir -p specs/{scope}/{screenshots,refinements}`, write `spec.md` +3. **Switch**: `echo "{scope}" > .active-scope`, create symlinks +4. **Determine project type**: Browser→web, Terminal→cli, Import→library, HTTP→api, Phone→mobile, Data→data +5. 
**Create feature list** — two methods: + - **New features**: Follow `references/core/feature-list-format.md` + - **Constitution/standards alignment**: Follow `references/core/constitution-audit.md` + + **Critical rules for features:** + - Outcome-oriented (what user can DO, not what components exist) + - Full-stack vertical slices (backend + frontend together) — see feature-list-format.md + - Self-contained (each feature includes its own tests — no separate "testing" features) + - UI features MUST include screenshot + interaction test steps + - Include `"type"` field in feature_list.json +6. **Create init.sh** — see `references/core/init-script-template.md` +7. **Commit** --- -## Workflow: Continue Session (Autonomous Feature Loop) - -This is the main workflow. It runs ALL remaining features to completion without stopping. - -**⚠️ CRITICAL NON-STOP RULE (NON-NEGOTIABLE) ⚠️** - -**You MUST keep looping until EVERY feature in `feature_list.json` has `"passes": true`. Do NOT stop after one feature. Do NOT stop after two features. Do NOT stop to report progress to the user. Do NOT ask the human what to do next. The human may be asleep.** - -**After EACH subagent completes, you MUST: verify → refine → then launch the NEXT subagent. The refinement step is NOT optional — it is what makes features delightful instead of just functional. The ONLY acceptable reasons to stop are:** -1. **ALL features have `"passes": true` AND all refinements are committed** -2. **A truly unrecoverable error** (hardware failure, missing credentials that cannot be worked around) - -**Stopping to "report back" or "check in" with the user is a VIOLATION of this workflow. The user explicitly chose autonomous execution. KEEP GOING.** - -### Session Startup Sequence - -1. **Get bearings** - ```bash - pwd - cat progress.txt - cat feature_list.json - git log --oneline -20 - ``` - -2. **Determine project type** — read the `"type"` field from `feature_list.json` - -3. 
**Start environment** - ```bash - bash init.sh - ``` - -4. **Verify existing features** — Run all unit tests (fast) and only the tests for features completed in previous sessions (not this session's new work). Skip tests for features not yet implemented. - -### Autonomous Feature Loop +## Workflow: Continue Session -After startup, enter the **feature loop**. This loop runs until ALL features pass: +### Startup +```bash +pwd && cat progress.txt && cat feature_list.json && git log --oneline -20 +bash init.sh ``` -features_completed_this_session = 0 - -WHILE there are features with "passes": false in feature_list.json: - 1. Read feature_list.json to find next incomplete feature (highest priority first) - 2. Launch a SUBAGENT to implement, test, verify, and commit - 3. After subagent completes, VERIFY output quality (see Verification Gates below) - 4. ⚠️ REFINEMENT GATE (NON-NEGOTIABLE — see Refinement Phase below): - a. Launch a REFINEMENT SUBAGENT to polish UX and code quality - b. BLOCK until refinement subagent completes and commits - c. Verify refinement report exists: ls specs/{scope}/refinements/feature-{id}-refinement.md - d. If report missing → the refinement was SKIPPED → launch another refinement subagent - 5. features_completed_this_session++ - 6. If features_completed_this_session % 5 == 0: run STANDARDS AUDIT (see below) - 7. CONTINUE to next feature — do NOT stop -END WHILE - -Run FINAL STANDARDS AUDIT before ending session -``` - -**⚠️ WHY REFINEMENT IS NON-NEGOTIABLE ⚠️** - -Implementation subagents build features that *work*. Refinement subagents make features *delightful*. 
Without the refinement pass: -- UX issues (poor spacing, weak hierarchy, missing micro-interactions) ship uncaught -- Code smells (duplication, poor naming, tangled logic) accumulate across features -- The "second pair of eyes" benefit is lost — self-review has blind spots - -**The refinement subagent is the quality difference between "it works" and "users love it".** Skipping it to save time is a false economy — it produces mediocre output that requires rework. - -### Launching Feature Subagents (Claude Code) - -For each feature, use the **Agent tool** to launch a subagent. This keeps each feature's work isolated and prevents context window overflow. - -**IMPORTANT — Reference doc paths:** The `references/` directory lives inside this skill's install directory, NOT in the project. When building subagent prompts, you MUST resolve paths to absolute paths. Use: `{skill_base_dir}/references/...` where `{skill_base_dir}` is the "Base directory for this skill" shown at the top of this prompt. For example, if the skill base is `/Users/alice/.claude/skills/iterative-dev`, then the path is `/Users/alice/.claude/skills/iterative-dev/references/core/code-quality.md`. - -**Subagent prompt template:** - -``` -You are implementing a feature for a {type} project. Work autonomously — do NOT ask questions, make your best judgment on all decisions. 
- -## Project Context -- Working directory: {pwd} -- Active scope: {scope from .active-scope} -- Project type: {type from feature_list.json} - -## Feature to Implement -- ID: {id} -- Description: {description} -- Category: {category} -- Priority: {priority} -- Test Steps: -{steps as bullet list} - -## Standards Documents -Read these reference docs and follow them during implementation: -- {skill_base_dir}/references/core/code-quality.md — Code organization, testability, unit testing rules -- {skill_base_dir}/references/core/gitignore-standards.md — Files that must never be committed -- {skill_base_dir}/references/verification/{type}-verification.md — Verification strategy for this project type -{IF type == "web" or type == "mobile":} -- {skill_base_dir}/references/web/ux-standards.md — UX quality requirements (loading/empty/error states, responsive, accessibility) -- {skill_base_dir}/references/web/frontend-design.md — Visual design principles (typography, color, composition) -{END IF} - -## Instructions - -### Phase 1: Implement -1. Read the relevant source files to understand the current codebase -2. Read the spec.md file for full project context -3. Read the standards documents listed above (use the ABSOLUTE paths provided) -4. Implement the feature following existing code patterns and the standards -5. Make sure the implementation is complete and production-quality - -### Phase 2: Refactor & Unit Test -Follow {skill_base_dir}/references/core/code-quality.md: -6. Extract pure functions out of components and handlers -7. Move business logic into testable utility/service modules -8. Eliminate duplication — reuse existing helpers or extract new shared ones -9. Write unit tests for all extracted logic. Run them until green. - -### Phase 3: Verification -Follow {skill_base_dir}/references/verification/{type}-verification.md: -10. Execute the verification strategy defined for {type} projects -11. Run all relevant tests — fix until green -12. 
MANDATORY: Perform the verification checks specified in the doc - Fix and re-run until all pass. - -{IF type == "web" or type == "mobile":} -### Phase 3b: Screenshot Capture (NON-NEGOTIABLE for web/mobile) - -Interaction tests (Phase 3) are the PRIMARY verification that features work. Screenshots are SECONDARY but MANDATORY — they verify visual quality and catch layout/styling issues that interaction tests cannot detect. A feature without both interaction tests and screenshots is NOT fully verified. - -**Screenshot directory:** `{screenshots_dir}` (provided by parent agent — this is `{pwd}/specs/{scope}/screenshots/`, the scope-specific directory for all visual artifacts). - -13. Write or update a Playwright test file that captures screenshots at key states: - - Use `page.screenshot({ path: '{screenshots_dir}/feature-{id}-step{N}-{description}.png', fullPage: true })` - - Capture BEFORE action, AFTER action, error states, and empty states - - Every test MUST have at least one `page.screenshot()` call - -14. Run the Playwright tests: - ```bash - npx playwright test - ``` - -15. Verify screenshots were generated: - ```bash - ls {screenshots_dir}/feature-{id}-*.png - ``` - If no screenshots exist, the verification has FAILED. Fix and re-run. - -16. Use the Read tool to open and visually review EVERY screenshot. Check: - - Layout: content fits, no overflow/clipping, proper alignment - - Spacing: consistent padding/margins (4/8/16/24/32px scale) - - Visual hierarchy: important actions obvious, proper text size hierarchy - - States: loading skeleton/spinner, empty state (icon + message + CTA), error state - - Aesthetics: polished and intentional, cohesive colors, proper shadows/depth - - Data display: real data shown, numbers right-aligned in tables, status badges colored - -17. If screenshots reveal problems, fix the UI and re-capture until quality is acceptable. 
- -**Screenshot naming convention:** `feature-{id}-step{N}-{description}.png` (scope is encoded in the directory path `specs/{scope}/screenshots/`) -Examples: `feature-9-step1-product-list.png`, `feature-9-step2-empty-state.png` -{END IF} - -### Phase 4: Gitignore Review -Follow {skill_base_dir}/references/core/gitignore-standards.md: -18. Run `git status --short` and check every file against gitignore patterns -19. Add any missing patterns to `.gitignore`, remove from tracking if needed - -### Phase 5: Commit -20. Update feature_list.json — change "passes": false to "passes": true -21. Update progress.txt with what was done and current feature pass count -22. Commit all changes: - git add -A && git commit -m "feat: [description] — Implemented feature #[id]: [description]" - -## Key Rules -- Follow existing code patterns and the standards documents -- Keep changes focused on this feature only -- Do not break other features -- Make all decisions yourself, never ask for human input -- EVERY feature must be verified per the verification strategy — no exceptions -- BEFORE committing, review ALL files for .gitignore candidates -{IF type == "web" or type == "mobile":} -- SCREENSHOTS ARE NON-NEGOTIABLE — do not skip or defer them -- If the app/server is not running for screenshots, start it (check init.sh or start manually) -{END IF} -{IF feature connects frontend to real backend API (replaces mocks, changes fetch config):} -### Full-Stack Integration Verification (NON-NEGOTIABLE) -This feature connects the frontend to a real backend. You MUST verify the connection works end-to-end: -1. **Start both servers** — backend with a real database, frontend with VITE_API_BASE_URL pointing to backend -2. **Verify route prefix** — `curl` the backend API at the URL the frontend will use (e.g., `/api/v1/...`). If 404, the route prefix is wrong. Code generators often omit the OpenAPI `servers.url` prefix — mount the handler under the correct prefix. -3. 
**Verify CORS** — `curl -I -X OPTIONS` with an `Origin` header matching the frontend port. If no `Access-Control-Allow-Origin` header, add CORS middleware. This is the #1 reason frontends silently fail to load data. -4. **Seed data and screenshot** — Seed 2-3 records, take Playwright screenshots of all pages, and verify they show REAL DATA (not loading skeletons or empty states). -5. **Check browser console** — Run Playwright with console error capture. Any CORS or fetch errors mean the integration is broken. -Do NOT mark this feature as passing based only on `tsc --noEmit`. TypeScript cannot catch CORS or route mismatches. -{END IF} -``` - -**How to launch the subagent:** - -Use the Agent tool with `subagent_type: "general-purpose"`. Example: - -``` -Agent tool call: - description: "Implement feature #3" - prompt: [filled template above] -``` - -### After Each Subagent Completes - -The subagent handles implementation, testing, verification, and committing. The parent agent MUST verify: - -1. **Confirm commit** — `git log --oneline -1` -2. **Confirm feature_list.json** — feature has `"passes": true` -3. **Verify output quality (NON-NEGOTIABLE GATE)** — type-specific checks. You MUST run these checks. Do NOT skip them even if the subagent reported success. - - **For `web` and `mobile` projects — SCREENSHOT GATE (NON-NEGOTIABLE):** - - This gate MUST be executed for EVERY UI feature. It is the primary quality control for visual output. Skipping this gate means the feature is NOT verified. - - **Determine `{screenshots_dir}`:** Screenshots are stored per-scope: `{pwd}/specs/{scope}/screenshots/` - - Read the active scope from `.active-scope` - - Resolve to absolute path: `{pwd}/specs/{scope}/screenshots/` - - You MUST pass this resolved absolute path as `{screenshots_dir}` when building subagent prompts. - - a. **CHECK screenshots exist:** - ```bash - ls {screenshots_dir}/feature-{id}-*.png 2>/dev/null | wc -l - ``` - b. 
**If count is 0: BLOCK.** The feature is NOT complete. Launch a follow-up subagent specifically to capture screenshots: - ``` - Prompt: "You need to add screenshot capture for feature #{id} ({description}). - The feature is already implemented and committed. Your ONLY job is: - 1. Start the dev server if not running (check with lsof, start with init.sh if needed) - 2. Write/update a Playwright test that navigates to the feature and captures screenshots - 3. Screenshots MUST be saved to: {screenshots_dir}/feature-{id}-step{N}-{description}.png - 4. Run the test: npx playwright test - 5. Verify screenshots exist: ls {screenshots_dir}/feature-{id}-*.png - 6. Use the Read tool to visually review each screenshot - 7. Commit the screenshots and test file" - ``` - c. **If count > 0: SPOT-CHECK.** Use the Read tool to open one screenshot. Evaluate: - - Layout correct? Content fits, no overflow? - - Real data shown, not empty/broken? - - Polished appearance, not prototype-level? - - If quality is poor, launch a **polish subagent** to fix UI issues and recapture. - d. **OUTCOME VERIFICATION CHECK:** Verify the subagent wrote tests that **prove the feature works from the user's perspective** — not just screenshot-only tests. Tests must perform user actions (click, fill, submit, navigate) and verify outcomes (data appears, page navigates, state changes). If the feature has interactive elements and the only tests are screenshots, the feature is NOT verified. Launch a follow-up subagent to add interaction tests. See `references/verification/web-verification.md` Step 2. - - **For `web` full-stack projects — INTEGRATION SMOKE TEST GATE (NON-NEGOTIABLE):** - This gate MUST be executed for ANY feature that connects the frontend to a real backend API (replacing mocks, changing fetch config, modifying backend routes/middleware). This is the **#1 source of silent failures** — TypeScript compiles clean but the app shows loading spinners forever because of CORS or route prefix issues. 
+Verify existing features work before implementing new ones. - After the subagent commits, the parent agent MUST: +### Feature Loop (NON-STOP until all features pass) - a. **Start both servers** (backend with real database, frontend pointing to backend) - b. **Verify backend routes respond** (not 404): - ```bash - curl -s http://localhost:{backend_port}/api/v1/{any_resource} | head -3 - ``` - If 404: route prefix mismatch. Code generators (ogen, openapi-generator) often register routes without the OpenAPI `servers.url` prefix. Fix by mounting the generated handler under `/api/v1` with `http.StripPrefix` or equivalent. - c. **Verify CORS headers**: - ```bash - curl -s -I -X OPTIONS http://localhost:{backend_port}/api/v1/{any_resource} \ - -H 'Origin: http://localhost:{frontend_port}' | grep -i 'access-control' - ``` - If missing: add CORS middleware to the backend. Without it, browsers silently block all frontend API requests. - d. **Seed test data** via API (at least 2-3 records) - e. **Run Playwright screenshots** against all major pages - f. **Verify screenshots show REAL DATA** — not loading skeletons, not empty states. If data is missing, diagnose using the common root causes table in `references/verification/web-verification.md`. +**Never stop to report progress. Never ask the human. Keep going until done.** - If any check fails, launch a fix subagent before moving to the next feature. +For each incomplete feature (highest priority first): - **For `api` projects:** - - Verify integration tests exist and pass - - Check that error cases are tested (not just happy paths) +#### Step 1: IMPLEMENT - **For `cli` projects:** - - Run a quick smoke test: `./bin/{tool} --help` or equivalent - - Verify error cases are tested +Read `references/templates/feature-subagent.md` for the full prompt template. Launch via Agent tool. 
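Building the absolute reference-doc paths for a subagent prompt can be sketched as follows. This is a hedged example: the `SKILL_BASE_DIR` value and the project type are illustrative, not real paths from any install.

```shell
# SKILL_BASE_DIR must be the skill's install directory (illustrative value
# here), never the project working directory.
SKILL_BASE_DIR="$HOME/.claude/skills/iterative-dev"
TYPE="api"  # in practice, read from feature_list.json

CODE_QUALITY="$SKILL_BASE_DIR/references/core/code-quality.md"
VERIFICATION="$SKILL_BASE_DIR/references/verification/${TYPE}-verification.md"

# Guard: every path handed to a subagent must be absolute.
for doc in "$CODE_QUALITY" "$VERIFICATION"; do
  case "$doc" in
    /*) echo "OK: $doc" ;;
    *)  echo "NOT ABSOLUTE: $doc" ;;
  esac
done
```

A project-relative path here is the classic failure mode: the subagent then looks for `references/` inside the repo, finds nothing, and silently skips the standards docs.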
-   **For `library` projects:** -   - Verify all tests pass (including race detection if applicable) -   - Check public API surface hasn't accidentally expanded +**Reference doc paths**: The `references/` directory is in THIS SKILL's install directory, not the project. Resolve to absolute paths using `{skill_base_dir}` shown at the top of this prompt. -   **For `data` projects:** -   - Verify transformation tests exist and pass -   - Check edge cases (empty, null, duplicate) are tested +#### Step 2: VERIFY (parent agent — mandatory gates) -4. If the subagent failed to complete, launch another subagent to fix and finish. +After the implementation subagent completes: -5. **⚠️ REFINEMENT GATE (NON-NEGOTIABLE — same enforcement level as Screenshot Gate) ⚠️** +a. **Commit gate**: `git log --oneline -1` — confirm `feat:` commit exists +b. **Feature list gate**: confirm `"passes": true` in feature_list.json +c. **Type-specific gate**: -   Refinement is what turns "working code" into "delightful product". It MUST happen for EVERY feature, including infrastructure features. Do NOT skip it. Do NOT defer it. Do NOT rationalize skipping it ("it's just scaffolding", "looks good enough", "I'll refine later"). +| Type | Gate | +|------|------| +| web/mobile | **Screenshot gate**: `ls specs/{scope}/screenshots/feature-{id}-*.png \| wc -l` — if 0, BLOCK and launch screenshot subagent. If >0, spot-check one with Read tool. **Outcome test gate**: verify tests perform user actions (not just screenshots). | +| web full-stack | **Integration smoke test**: verify backend responds (not 404), verify CORS headers, verify screenshots show real data (not loading spinners). See `references/verification/web-verification.md`. | +| api | Verify integration tests exist and cover error cases | +| cli | Smoke test: `./bin/{tool} --help` | +| library | All tests pass (including race detection, where applicable) | +| data | Transformation tests cover edge cases | -   **The parent agent MUST execute these steps — no exceptions:** +d.
If any gate fails, launch a fix subagent before proceeding. - a. **Launch a REFINEMENT SUBAGENT** using the refinement prompt template below. Wait for it to complete. - b. **Verify refinement report exists:** - ```bash - ls specs/{scope}/refinements/feature-{id}-refinement.md - ``` - c. **If report is missing: BLOCK.** Launch the refinement subagent again. The feature is NOT done without its refinement pass. - d. **Verify a refinement commit exists:** - ```bash - git log --oneline -1 | grep "refine:" - ``` - e. **If no refinement commit: BLOCK.** The refinement subagent failed to commit. Launch again. +#### Step 3: REFINE (mandatory — not optional) - **Why this gate exists:** In practice, the parent agent tends to skip refinement to "save time" and move to the next feature faster. This produces features that work but feel unpolished — generic spacing, weak visual hierarchy, no micro-interactions, duplicated code. The refinement pass is a deliberate "second pair of eyes" that catches what the implementation subagent missed. It is the single biggest quality lever in the workflow. +Read `references/templates/refinement-subagent.md` for the full prompt template. Launch via Agent tool. -6. **Loop back IMMEDIATELY** — pick the next incomplete feature and launch a new subagent RIGHT NOW. Do NOT stop, do NOT report to the user, do NOT wait for instructions. KEEP GOING until ALL features pass. - -### Refinement Phase (After Each Feature) - -After a feature passes verification and is committed, launch a **refinement subagent** to polish it. This is a separate subagent so it evaluates the feature with fresh context — a "second pair of eyes" pass. - -The refinement subagent writes its analysis to `specs/{scope}/refinements/feature-{id}-refinement.md` so the thinking process is traceable across sessions. - -**Refinement subagent prompt template:** +**Why refinement exists**: Implementation subagents build features that *work*. Refinement subagents make features *delightful*. 
Without refinement, UX issues (spacing, hierarchy, micro-interactions) and code smells (duplication, naming, complexity) ship uncaught. It's the quality difference between "functional" and "users love it". +**Refinement gate** (parent must verify after subagent completes): +```bash +# Report must exist +ls specs/{scope}/refinements/feature-{id}-refinement.md +# Commit must exist +git log --oneline -1 | grep "refine:" ``` -You are refining a recently completed feature. The feature is already implemented, tested, verified, and committed. Your job is to polish and improve it — both the user experience and the code quality. - -## Project Context -- Working directory: {pwd} -- Active scope: {scope} -- Project type: {type} -- Feature just completed: #{id} — {description} -- Screenshots directory: {screenshots_dir} -- Refinement output: {pwd}/specs/{scope}/refinements/feature-{id}-refinement.md - -## Standards Documents -Read these before starting: -- {skill_base_dir}/references/core/code-quality.md -{IF type == "web" or type == "mobile":} -- {skill_base_dir}/references/web/ux-standards.md -- {skill_base_dir}/references/web/frontend-design.md -{END IF} - -## What Was Done -Review the most recent commit to understand what was implemented: -git log --oneline -1 -git diff HEAD~1 --name-only - -{IF type == "web" or type == "mobile":} -## Part 1: UX/Visual Refinement - -Think divergently about how to make users LOVE this interface. Don't just check for bugs — imagine better ways to present the information and interactions. - -1. Use the Read tool to review ALL screenshots in {screenshots_dir}/ for this feature -2. For each screen, evaluate from a first-time user's perspective: - - Is the purpose of this screen immediately obvious? - - Can the user figure out what to do without instructions? - - Does the visual hierarchy guide the eye to the most important action? - - Are transitions and state changes smooth and predictable? -3. 
Think divergently about improvements — consider alternatives you haven't tried: - - Could the layout be reorganized for better flow or scannability? - - Would micro-interactions (hover effects, transitions, focus states) make it feel more responsive and alive? - - Is whitespace being used effectively to create breathing room and group related elements? - - Could typography be more expressive — size contrasts, weight variations, line heights? - - Are colors creating the right emotional tone? Could accent colors highlight key actions better? - - Are empty states, loading states, and error states not just functional but helpful and encouraging? - - Could icons, illustrations, or subtle visual cues improve comprehension? -4. Research: look at how the standards documents suggest handling similar UI patterns. Are there recommendations you missed? -5. Implement the most impactful improvements — prioritize changes that make the biggest difference to user understanding and delight -6. Re-run Playwright tests and re-capture screenshots -7. Visually verify the improvements look better than before -{END IF} - -## Part 2: Code Quality Refinement - -Re-read all generated code with fresh eyes, looking for opportunities to make it more maintainable and testable. - -1. Read ALL files changed in the most recent commit: `git diff HEAD~1 --name-only` -2. For each file, evaluate: - - **Abstraction**: Are there functions doing too many things? Should logic be extracted? - - **Testability**: Is business logic separated from framework/UI code? Could someone write a unit test for the core logic without setting up the whole framework? - - **Readability**: Would a new developer understand this code without extensive context? Are names clear and descriptive? - - **Duplication**: Is there repeated logic that should be a shared utility? - - **Simplicity**: Are there overly complex control flows that could be simplified? Deep nesting that could be flattened? -3. 
Make concrete improvements — refactor, rename, extract, simplify -4. Run all unit tests — ensure they still pass -5. If you extracted new logic, write unit tests for it - -## Part 3: Write Refinement Report - -Write your analysis to `{pwd}/specs/{scope}/refinements/feature-{id}-refinement.md` with this structure: - -```markdown -# Feature #{id} Refinement: {description} - -## UX Analysis (web/mobile only) -- **Screenshots reviewed**: [list of screenshots] -- **Issues found**: [what problems or opportunities were identified] -- **Alternatives considered**: [what other approaches were thought about] -- **Changes made**: [what was actually improved and why] -- **Changes deferred**: [ideas noted for future consideration, if any] - -## Code Quality Analysis -- **Files reviewed**: [list of files] -- **Issues found**: [code smells, abstraction opportunities, naming issues] -- **Refactoring done**: [what was changed and why] -- **Test coverage**: [new tests added, if any] - -## Summary -[1-2 sentence summary of the refinement pass] -``` - -## Commit -If you made code or UI changes: -git add -A && git commit -m "refine: polish feature #{id} — [summary of improvements]" +If either is missing, launch the refinement subagent again. Do NOT proceed without refinement. -If no code changes were warranted, still commit the refinement report: -git add specs/{scope}/refinements/ && git commit -m "refine: review feature #{id} — no changes needed" +#### Step 4: NEXT -## Rules -- This is a POLISH pass — do NOT add new functionality -- Do NOT break existing tests -- Keep changes focused on improving what exists -- Think creatively about UX — the goal is to make users enjoy and understand the interface -- Think critically about code — the goal is to make the codebase a pleasure to maintain -- ALWAYS write the refinement report, even if no changes are made -``` +Loop back immediately to the next incomplete feature. No pausing, no reporting. 
### Periodic Standards Audit -**When to run:** Every 5 completed features AND at session end (before final commit). - -This uses the same audit pattern as `references/core/constitution-audit.md`, but applied to the project's own standards documents. The audit catches issues that individual subagents missed — self-review has blind spots. - -**Audit process:** - -1. Determine which standards apply based on project type: - - **All types:** `core/code-quality.md`, `core/gitignore-standards.md`, `core/session-handoff-standards.md` - - **web/mobile only:** `web/ux-standards.md`, `web/frontend-design.md` +**When**: Every 5 features AND at session end. -2. For EACH applicable standards document, launch a **verification subagent** that: - - Reads the standards document - - Reads the code/files changed since the last audit (use `git diff --name-only HEAD~5` or similar) - - Checks each standard against the actual code - - Reports: COMPLIANT or VIOLATION with specific file and line +For each applicable standards doc, launch audit subagent (see `references/templates/audit-subagent.md`). Fix violations before proceeding. -3. Collect all violations across subagents - -4. If violations found: - - Group related violations into fix batches - - Launch a **fix subagent** for each batch - - Each fix subagent commits its changes - - Re-verify the fixed code - -5. Log audit results in `progress.txt` - -**Subagent prompt template for standards audit:** - -``` -You are auditing recently changed code against a project standards document. 
+Applicable standards by type: +- **All**: `core/code-quality.md`, `core/gitignore-standards.md`, `core/session-handoff-standards.md` +- **web/mobile**: also `web/ux-standards.md`, `web/frontend-design.md` -## Standards Document -{paste the full content of the standards doc} +### Session End -## Files to Audit -{list of files changed since last audit} +Only end when ALL features have `"passes": true` and all refinements are committed, or a truly unrecoverable error occurs. -## Instructions -1. Read each file listed above -2. For EACH standard in the document, check if the code complies -3. Report findings as: - - COMPLIANT: {standard} — {brief evidence} - - VIOLATION: {standard} — {file}:{line} — {what's wrong} — {fix needed} -4. Be thorough — check every standard, don't skip "obvious" ones -``` +Before ending: final standards audit, run all tests, verify `references/core/session-handoff-standards.md`. -### Decision Making Guidelines +--- -Since the human may be asleep, follow these rules for autonomous decisions: +## Decision Making (autonomous — human may be asleep) | Situation | Decision | |-----------|----------| -| Ambiguous spec | Choose the simplest reasonable interpretation | -| Multiple implementation approaches | Pick the one matching existing patterns | -| Test is flaky | Add proper waits/retries, don't skip the test | -| Feature seems too large | Break into sub-tasks within the subagent | -| Dependency conflict | Use the version compatible with existing packages | -| Build error | Read the error, fix it, rebuild | -| Port conflict | Kill the conflicting process and restart | -| Database issue | Reset/reseed the database | -| Feature blocked by another | Skip to next feature, come back later | -| Missing dependency | Install it | -| Unclear file structure | Follow existing project conventions | -| **Web/mobile:** Unclear UI design | Follow references/web/frontend-design.md | -| **Web/mobile:** UI looks generic/plain | Add visual polish per 
references/web/ux-standards.md | -| **All types:** Tempted to skip refinement | NEVER skip — launch the refinement subagent. It's what makes features delightful. | -| **All types:** Refinement subagent didn't commit | Launch it again — refinement report + commit are mandatory gates | -| **Web/mobile:** Subagent skipped screenshots | Launch follow-up subagent to add them | -| **Web full-stack:** Frontend shows loading forever | Check CORS headers and route prefix — see `references/verification/web-verification.md` Integration Smoke Test | -| **Web full-stack:** curl works but browser doesn't | CORS issue — add `Access-Control-Allow-Origin` middleware to backend | -| **Web full-stack:** Backend returns 404 for /api/v1/... | Code generator omitted server URL prefix — mount handler under `/api/v1` | -| **API:** Unclear response format | Follow existing endpoint patterns, use consistent error format | -| **CLI:** Unclear output format | Match existing command output style | -| **Library:** Unclear public API | Keep it minimal, expose only what's needed | - -### Session End - -Only end the session when: -- **ALL features have `"passes": true`**, OR -- A truly unrecoverable error occurs (hardware failure, missing credentials, etc.) - -Before ending: -1. Run **final standards audit** (see Periodic Standards Audit above) — include `core/session-handoff-standards.md` -2. Run all unit tests -3. Run verification tests only for features completed in previous sessions (regression check) -4. Verify codebase meets `references/core/session-handoff-standards.md` -5. Commit any remaining changes - ---- - -## Critical Rules - -### Standards Enforcement -- All quality standards live in `references/` docs within this skill's base directory — subagents MUST read them using absolute paths -- **CRITICAL**: Reference doc paths are relative to THIS SKILL's install directory (shown as "Base directory for this skill" at the top of this prompt), NOT the project working directory. 
Always resolve to absolute paths before passing to subagents. -- Standards are verified both during implementation (by subagent) AND periodically (by audit) -- Audit violations MUST be fixed before session ends - -### Verification Enforcement (web/mobile projects — NON-NEGOTIABLE) -- **Interaction tests proving user outcomes are PRIMARY verification** — every UI feature MUST have tests that perform user actions and verify results -- Every UI feature MUST also have screenshots in `{screenshots_dir}/feature-{id}-*.png` — screenshots are SECONDARY for visual quality -- `{screenshots_dir}` is `{pwd}/specs/{scope}/screenshots/` — screenshots are stored per-scope alongside other scope artifacts. -- The parent agent MUST check for both interaction tests AND screenshots after EVERY subagent that implements a UI feature -- If either is missing, the parent MUST launch a follow-up subagent — the feature is NOT done -- The subagent prompt template includes inlined verification instructions so subagents know what to do without needing to find external docs - -### Autonomous Operation (NON-NEGOTIABLE) -- NEVER stop to ask the human a question -- NEVER wait for human approval -- NEVER stop to "report progress" or "check in" — the user can see commits in git log -- NEVER output a summary and wait — immediately launch the next subagent -- After each subagent completes: **verify → REFINE → launch next subagent**. That's it. No pausing. No skipping refinement. -- **The sequence is: implement → verify → REFINE → next feature. All three steps are mandatory. Skipping refinement is as wrong as skipping verification.** -- Make reasonable decisions based on existing patterns -- If blocked, try alternative approaches before giving up -- Keep working until ALL features are complete -- The continue workflow is a LOOP, not a single step. You are the loop controller. 
- -### Refinement Enforcement (NON-NEGOTIABLE — same level as Verification Enforcement) -- **Every completed feature MUST go through the refinement phase before moving to the next feature — NO EXCEPTIONS, including infrastructure features** -- The refinement subagent is separate from the implementation subagent — fresh context enables better evaluation -- Refinement MUST NOT add new functionality — it only improves what exists -- The refinement report (`specs/{scope}/refinements/feature-{id}-refinement.md`) MUST always be written, even if no code changes are made -- Refinement commits use the prefix `refine:` not `feat:` -- For web/mobile: UX refinement should think divergently — not just check for bugs, but imagine better ways to present information -- For all types: code refinement should focus on abstraction, testability, and maintainability -- **GATE CHECK**: After the refinement subagent completes, the parent MUST verify: - 1. `specs/{scope}/refinements/feature-{id}-refinement.md` exists (the report) - 2. `git log --oneline -1` shows a `refine:` commit - 3. If either is missing, launch the refinement subagent again — do NOT proceed to the next feature -- **Common failure mode**: The parent agent skips refinement to "move faster". This is explicitly forbidden. The refinement pass is what separates mediocre output from delightful output. It catches UX issues (spacing, hierarchy, micro-interactions) and code issues (duplication, naming, complexity) that the implementation subagent is blind to because it was focused on making things work. 
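The two-part gate described here (report on disk plus a `refine:` commit) can be sketched as a small shell helper. The report path layout and commit-subject convention come from this skill; the function name and the idea of passing the commit subject as an argument (so it can be exercised without a real git repo) are illustrative assumptions:

```shell
# Sketch of the refinement gate check. Takes the expected report path and
# the most recent commit subject as arguments, so it is a pure function of
# its inputs and needs no git repository to run.
refinement_gate() {
  report_path="$1"      # e.g. specs/auth/refinements/feature-3-refinement.md
  last_subject="$2"     # e.g. "refine: polish feature #3"
  [ -f "$report_path" ] || { echo "gate failed: report missing"; return 1; }
  case "$last_subject" in
    refine:*) echo "gate passed"; return 0 ;;
    *)        echo "gate failed: last commit is not refine:"; return 1 ;;
  esac
}
```

In the real workflow the second argument would come from `git log --oneline -1`, and a failed gate means relaunching the refinement subagent.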
+| Ambiguous spec | Simplest reasonable interpretation | +| Multiple approaches | Match existing patterns | +| Test is flaky | Fix with proper waits, don't skip | +| Feature too large | Break into sub-tasks within subagent | +| Build/dependency error | Read error, fix, rebuild | +| Port conflict | Kill conflicting process, restart | +| Feature blocked | Skip to next, come back later | +| Tempted to skip refinement | NEVER skip — launch it | +| Web: frontend loads forever | Check CORS + route prefix | +| Web: curl works, browser doesn't | CORS middleware missing | +| Web: backend 404 on /api/v1/ | Mount handler under correct prefix | --- ## Reference Files -All standards, templates, and detailed processes: - -### Core (all project types) -- `references/core/code-quality.md` — Code organization, testability, and unit testing standards -- `references/core/gitignore-standards.md` — Gitignore patterns and review process -- `references/core/feature-list-format.md` — Feature list structure, critical rules, priority order -- `references/core/session-handoff-standards.md` — Clean codebase, git state, progress tracking -- `references/core/init-script-template.md` — init.sh template -- `references/core/continue-workflow.md` — Full continue workflow details -- `references/core/constitution-audit.md` — Systematic audit workflow for compliance/alignment scopes - -### Web-specific (type: web, mobile) -- `references/web/ux-standards.md` — UX quality standards and checklist -- `references/web/frontend-design.md` — Design principles for visual quality - -### Verification strategies (one per project type) -- `references/verification/web-verification.md` — Playwright E2E + screenshots -- `references/verification/api-verification.md` — Integration tests + endpoint validation -- `references/verification/cli-verification.md` — Command execution + output validation -- `references/verification/library-verification.md` — Unit tests + public API validation -- 
`references/verification/data-verification.md` — Transformation tests + data quality -- `references/verification/mobile-verification.md` — Mobile E2E + screenshots +### Templates (subagent prompts) +- `references/templates/feature-subagent.md` — Implementation subagent prompt +- `references/templates/refinement-subagent.md` — Refinement subagent prompt +- `references/templates/audit-subagent.md` — Standards audit subagent prompt + +### Core Standards (all types) +- `references/core/code-quality.md` — File organization, testability, unit testing +- `references/core/gitignore-standards.md` — Files that must never be committed +- `references/core/feature-list-format.md` — Feature list structure and rules +- `references/core/session-handoff-standards.md` — Clean state at session end +- `references/core/init-script-template.md` — init.sh templates by project type +- `references/core/constitution-audit.md` — Audit workflow for compliance scopes + +### Web Standards (web/mobile) +- `references/web/ux-standards.md` — Loading/empty/error states, responsive, accessibility +- `references/web/frontend-design.md` — Typography, color, composition + +### Verification (one per type) +- `references/verification/{web,api,cli,library,data,mobile}-verification.md` diff --git a/references/templates/audit-subagent.md b/references/templates/audit-subagent.md new file mode 100644 index 0000000..a95cfa1 --- /dev/null +++ b/references/templates/audit-subagent.md @@ -0,0 +1,21 @@ +# Standards Audit Subagent Prompt Template + +Fill in `{variables}` before passing to the Agent tool. + +--- + +You are auditing recently changed code against a project standards document. + +## Standards Document +{paste the full content of the standards doc} + +## Files to Audit +{list of files changed since last audit} + +## Instructions +1. Read each file listed above +2. For EACH standard in the document, check if the code complies +3. 
Report findings as: + - COMPLIANT: {standard} — {brief evidence} + - VIOLATION: {standard} — {file}:{line} — {what's wrong} — {fix needed} +4. Be thorough — check every standard, don't skip "obvious" ones diff --git a/references/templates/feature-subagent.md b/references/templates/feature-subagent.md new file mode 100644 index 0000000..1cfc204 --- /dev/null +++ b/references/templates/feature-subagent.md @@ -0,0 +1,123 @@ +# Feature Implementation Subagent Prompt Template + +Fill in `{variables}` and evaluate `{IF}` blocks before passing to the Agent tool. + +--- + +You are implementing a feature for a {type} project. Work autonomously — do NOT ask questions, make your best judgment on all decisions. + +## Project Context +- Working directory: {pwd} +- Active scope: {scope from .active-scope} +- Project type: {type from feature_list.json} + +## Feature to Implement +- ID: {id} +- Description: {description} +- Category: {category} +- Priority: {priority} +- Test Steps: +{steps as bullet list} + +## Standards Documents +Read these reference docs and follow them during implementation: +- {skill_base_dir}/references/core/code-quality.md — Code organization, testability, unit testing rules +- {skill_base_dir}/references/core/gitignore-standards.md — Files that must never be committed +- {skill_base_dir}/references/verification/{type}-verification.md — Verification strategy for this project type +{IF type == "web" or type == "mobile":} +- {skill_base_dir}/references/web/ux-standards.md — UX quality requirements (loading/empty/error states, responsive, accessibility) +- {skill_base_dir}/references/web/frontend-design.md — Visual design principles (typography, color, composition) +{END IF} + +## Instructions + +### Phase 1: Implement +1. Read the relevant source files to understand the current codebase +2. Read the spec.md file for full project context +3. Read the standards documents listed above (use the ABSOLUTE paths provided) +4. 
Implement the feature following existing code patterns and the standards +5. Make sure the implementation is complete and production-quality + +### Phase 2: Refactor & Unit Test +Follow {skill_base_dir}/references/core/code-quality.md: +6. Extract pure functions out of components and handlers +7. Move business logic into testable utility/service modules +8. Eliminate duplication — reuse existing helpers or extract new shared ones +9. Write unit tests for all extracted logic. Run them until green. + +### Phase 3: Verification +Follow {skill_base_dir}/references/verification/{type}-verification.md: +10. Execute the verification strategy defined for {type} projects +11. Run all relevant tests — fix until green +12. MANDATORY: Perform the verification checks specified in the doc + Fix and re-run until all pass. + +{IF type == "web" or type == "mobile":} +### Phase 3b: Screenshot Capture (NON-NEGOTIABLE for web/mobile) + +Interaction tests (Phase 3) are the PRIMARY verification that features work. Screenshots are SECONDARY but MANDATORY — they verify visual quality and catch layout/styling issues that interaction tests cannot detect. A feature without both interaction tests and screenshots is NOT fully verified. + +**Screenshot directory:** `{screenshots_dir}` (provided by parent agent — this is `{pwd}/specs/{scope}/screenshots/`, the scope-specific directory for all visual artifacts). + +13. Write or update a Playwright test file that captures screenshots at key states: + - Use `page.screenshot({ path: '{screenshots_dir}/feature-{id}-step{N}-{description}.png', fullPage: true })` + - Capture BEFORE action, AFTER action, error states, and empty states + - Every test MUST have at least one `page.screenshot()` call + +14. Run the Playwright tests: + ```bash + npx playwright test + ``` + +15. Verify screenshots were generated: + ```bash + ls {screenshots_dir}/feature-{id}-*.png + ``` + If no screenshots exist, the verification has FAILED. Fix and re-run. + +16. 
Use the Read tool to open and visually review EVERY screenshot. Check: + - Layout: content fits, no overflow/clipping, proper alignment + - Spacing: consistent padding/margins (4/8/16/24/32px scale) + - Visual hierarchy: important actions obvious, proper text size hierarchy + - States: loading skeleton/spinner, empty state (icon + message + CTA), error state + - Aesthetics: polished and intentional, cohesive colors, proper shadows/depth + - Data display: real data shown, numbers right-aligned in tables, status badges colored + +17. If screenshots reveal problems, fix the UI and re-capture until quality is acceptable. + +**Screenshot naming convention:** `feature-{id}-step{N}-{description}.png` +Examples: `feature-9-step1-product-list.png`, `feature-9-step2-empty-state.png` +{END IF} + +### Phase 4: Gitignore Review +Follow {skill_base_dir}/references/core/gitignore-standards.md: +18. Run `git status --short` and check every file against gitignore patterns +19. Add any missing patterns to `.gitignore`, remove from tracking if needed + +### Phase 5: Commit +20. Update feature_list.json — change "passes": false to "passes": true +21. Update progress.txt with what was done and current feature pass count +22. 
Commit all changes: + git add -A && git commit -m "feat: [description] — Implemented feature #[id]: [description]" + +## Key Rules +- Follow existing code patterns and the standards documents +- Keep changes focused on this feature only +- Do not break other features +- Make all decisions yourself, never ask for human input +- EVERY feature must be verified per the verification strategy — no exceptions +- BEFORE committing, review ALL files for .gitignore candidates +{IF type == "web" or type == "mobile":} +- SCREENSHOTS ARE NON-NEGOTIABLE — do not skip or defer them +- If the app/server is not running for screenshots, start it (check init.sh or start manually) +{END IF} +{IF feature connects frontend to real backend API (replaces mocks, changes fetch config):} +### Full-Stack Integration Verification (NON-NEGOTIABLE) +This feature connects the frontend to a real backend. You MUST verify the connection works end-to-end: +1. **Start both servers** — backend with a real database, frontend with VITE_API_BASE_URL pointing to backend +2. **Verify route prefix** — `curl` the backend API at the URL the frontend will use (e.g., `/api/v1/...`). If 404, the route prefix is wrong. Code generators often omit the OpenAPI `servers.url` prefix — mount the handler under the correct prefix. +3. **Verify CORS** — `curl -I -X OPTIONS` with an `Origin` header matching the frontend port. If no `Access-Control-Allow-Origin` header, add CORS middleware. This is the #1 reason frontends silently fail to load data. +4. **Seed data and screenshot** — Seed 2-3 records, take Playwright screenshots of all pages, and verify they show REAL DATA (not loading skeletons or empty states). +5. **Check browser console** — Run Playwright with console error capture. Any CORS or fetch errors mean the integration is broken. +Do NOT mark this feature as passing based only on `tsc --noEmit`. TypeScript cannot catch CORS or route mismatches. 
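To make the CORS portion of check 3 concrete, here is a minimal sketch of the assertion. The URL, port, and `Origin` value in the comment are placeholders for this project's actual servers; the helper inspects header text passed to it, so it can run without a live backend:

```shell
# Sketch: fail if preflight response headers lack Access-Control-Allow-Origin.
# In practice the headers would come from something like:
#   curl -s -i -X OPTIONS -H "Origin: http://localhost:5173" "$BACKEND_URL/api/v1/products"
check_cors() {
  headers="$1"
  if printf '%s\n' "$headers" | grep -qi '^access-control-allow-origin:'; then
    echo "CORS ok"
  else
    echo "CORS missing: add Access-Control-Allow-Origin middleware" >&2
    return 1
  fi
}
```

A nonzero return here means the frontend will silently fail to load data even though plain `curl` GET requests succeed.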
+{END IF} diff --git a/references/templates/refinement-subagent.md b/references/templates/refinement-subagent.md new file mode 100644 index 0000000..817045d --- /dev/null +++ b/references/templates/refinement-subagent.md @@ -0,0 +1,107 @@ +# Refinement Subagent Prompt Template + +Fill in `{variables}` and evaluate `{IF}` blocks before passing to the Agent tool. + +--- + +You are refining a recently completed feature. The feature is already implemented, tested, verified, and committed. Your job is to polish and improve it — both the user experience and the code quality. + +## Project Context +- Working directory: {pwd} +- Active scope: {scope} +- Project type: {type} +- Feature just completed: #{id} — {description} +- Screenshots directory: {screenshots_dir} +- Refinement output: {pwd}/specs/{scope}/refinements/feature-{id}-refinement.md + +## Standards Documents +Read these before starting: +- {skill_base_dir}/references/core/code-quality.md +{IF type == "web" or type == "mobile":} +- {skill_base_dir}/references/web/ux-standards.md +- {skill_base_dir}/references/web/frontend-design.md +{END IF} + +## What Was Done +Review the most recent commit to understand what was implemented: +git log --oneline -1 +git diff HEAD~1 --name-only + +{IF type == "web" or type == "mobile":} +## Part 1: UX/Visual Refinement + +Think divergently about how to make users LOVE this interface. Don't just check for bugs — imagine better ways to present the information and interactions. + +1. Use the Read tool to review ALL screenshots in {screenshots_dir}/ for this feature +2. For each screen, evaluate from a first-time user's perspective: + - Is the purpose of this screen immediately obvious? + - Can the user figure out what to do without instructions? + - Does the visual hierarchy guide the eye to the most important action? + - Are transitions and state changes smooth and predictable? +3. 
Think divergently about improvements — consider alternatives you haven't tried: + - Could the layout be reorganized for better flow or scannability? + - Would micro-interactions (hover effects, transitions, focus states) make it feel more responsive and alive? + - Is whitespace being used effectively to create breathing room and group related elements? + - Could typography be more expressive — size contrasts, weight variations, line heights? + - Are colors creating the right emotional tone? Could accent colors highlight key actions better? + - Are empty states, loading states, and error states not just functional but helpful and encouraging? + - Could icons, illustrations, or subtle visual cues improve comprehension? +4. Research: look at how the standards documents suggest handling similar UI patterns. Are there recommendations you missed? +5. Implement the most impactful improvements — prioritize changes that make the biggest difference to user understanding and delight +6. Re-run Playwright tests and re-capture screenshots +7. Visually verify the improvements look better than before +{END IF} + +## Part 2: Code Quality Refinement + +Re-read all generated code with fresh eyes, looking for opportunities to make it more maintainable and testable. + +1. Read ALL files changed in the most recent commit: `git diff HEAD~1 --name-only` +2. For each file, evaluate: + - **Abstraction**: Are there functions doing too many things? Should logic be extracted? + - **Testability**: Is business logic separated from framework/UI code? Could someone write a unit test for the core logic without setting up the whole framework? + - **Readability**: Would a new developer understand this code without extensive context? Are names clear and descriptive? + - **Duplication**: Is there repeated logic that should be a shared utility? + - **Simplicity**: Are there overly complex control flows that could be simplified? Deep nesting that could be flattened? +3. 
Make concrete improvements — refactor, rename, extract, simplify +4. Run all unit tests — ensure they still pass +5. If you extracted new logic, write unit tests for it + +## Part 3: Write Refinement Report + +Write your analysis to `{pwd}/specs/{scope}/refinements/feature-{id}-refinement.md` with this structure: + +```markdown +# Feature #{id} Refinement: {description} + +## UX Analysis (web/mobile only) +- **Screenshots reviewed**: [list of screenshots] +- **Issues found**: [what problems or opportunities were identified] +- **Alternatives considered**: [what other approaches were thought about] +- **Changes made**: [what was actually improved and why] +- **Changes deferred**: [ideas noted for future consideration, if any] + +## Code Quality Analysis +- **Files reviewed**: [list of files] +- **Issues found**: [code smells, abstraction opportunities, naming issues] +- **Refactoring done**: [what was changed and why] +- **Test coverage**: [new tests added, if any] + +## Summary +[1-2 sentence summary of the refinement pass] +``` + +## Commit +If you made code or UI changes: +git add -A && git commit -m "refine: polish feature #{id} — [summary of improvements]" + +If no code changes were warranted, still commit the refinement report: +git add specs/{scope}/refinements/ && git commit -m "refine: review feature #{id} — no changes needed" + +## Rules +- This is a POLISH pass — do NOT add new functionality +- Do NOT break existing tests +- Keep changes focused on improving what exists +- Think creatively about UX — the goal is to make users enjoy and understand the interface +- Think critically about code — the goal is to make the codebase a pleasure to maintain +- ALWAYS write the refinement report, even if no changes are made From 9b67924750a74702dc3e63a4cc774b14c56075ca Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Thu, 19 Mar 2026 12:45:22 +0800 Subject: [PATCH 14/17] feat: add verification tests for skill output MIME-Version: 1.0 Content-Type: text/plain; 
charset=UTF-8
Content-Transfer-Encoding: 8bit

- tests/verify.sh — checks artifacts after skill runs:
  - All features pass
  - feat: commits exist for each feature
  - refine: commits exist for each feature (catches the skipped-refinement bug)
  - Refinement reports exist in specs/{scope}/refinements/
  - Screenshots exist (web/mobile)
  - Commit pattern: no consecutive feat: without refine: between them
- tests/smoke-test.md — minimal 2-feature test scope for quick validation

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 tests/smoke-test.md | 136 ++++++++++++++++++++++++++++++++++++++++++++
 tests/verify.sh     |  76 +++++++++++++++++++++++++
 2 files changed, 212 insertions(+)
 create mode 100644 tests/smoke-test.md
 create mode 100755 tests/verify.sh

diff --git a/tests/smoke-test.md b/tests/smoke-test.md
new file mode 100644
index 0000000..c7840e2
--- /dev/null
+++ b/tests/smoke-test.md
@@ -0,0 +1,136 @@
+# Skill Smoke Test
+
+## Purpose
+Verify the iterative-dev skill executes all mandatory steps: implement → verify → refine → next.
+
+## Setup
+Create a minimal test project with 2 trivial features. Run the skill. Check artifacts.
+
+### 1. Create test project
+```bash
+mkdir -p /tmp/iterative-dev-test && cd /tmp/iterative-dev-test
+git init
+```
+
+### 2. Create minimal scope
+```bash
+mkdir -p specs/test/{screenshots,refinements}
+echo "test" > .active-scope
+```
+
+Create `specs/test/spec.md`:
+```
+# Test Project
+A simple CLI tool that greets users.
+```
+
+Create `specs/test/feature_list.json`:
+```json
+{
+  "type": "cli",
+  "features": [
+    {
+      "id": 1,
+      "category": "infrastructure",
+      "priority": "high",
+      "description": "Project scaffolding: create a Node.js project with a greet.js script",
+      "steps": [
+        "Create package.json with name 'greeter'",
+        "Create greet.js that prints 'Hello, World!'",
+        "Verify: node greet.js outputs 'Hello, World!'"
+      ],
+      "passes": false
+    },
+    {
+      "id": 2,
+      "category": "functional",
+      "priority": "high",
+      "description": "User can greet by name: node greet.js Alice prints 'Hello, Alice!'",
+      "steps": [
+        "Modify greet.js to accept a name argument",
+        "Default to 'World' if no name provided",
+        "Write test: node greet.js Alice outputs 'Hello, Alice!'",
+        "Write test: node greet.js (no args) outputs 'Hello, World!'"
+      ],
+      "passes": false
+    }
+  ]
+}
+```
+
+Create symlinks and init:
+```bash
+ln -sf specs/test/spec.md spec.md
+ln -sf specs/test/feature_list.json feature_list.json
+echo "# Progress" > specs/test/progress.txt
+ln -sf specs/test/progress.txt progress.txt
+echo '#!/bin/bash' > init.sh && chmod +x init.sh
+git add -A && git commit -m "init: smoke test scope"
+```
+
+### 3. Run the skill
+```
+/iterative-dev continue
+```
+
+### 4.
Verify (automated checks)
+
+```bash
+#!/bin/bash
+# Run this after the skill completes
+
+PASS=0
+FAIL=0
+
+check() {
+  if eval "$2"; then
+    echo "PASS: $1"
+    ((PASS++))
+  else
+    echo "FAIL: $1"
+    ((FAIL++))
+  fi
+}
+
+# All features pass
+check "All features pass" \
+  '[ $(cat feature_list.json | grep -c "\"passes\": true") -eq 2 ]'
+
+# Implementation commits exist
+check "Feature commits exist" \
+  '[ $(git log --oneline | grep -c "feat:") -ge 2 ]'
+
+# Refinement commits exist (THE KEY TEST)
+check "Refinement commits exist" \
+  '[ $(git log --oneline | grep -c "refine:") -ge 2 ]'
+
+# Refinement reports exist
+check "Refinement reports exist" \
+  '[ $(ls specs/test/refinements/feature-*-refinement.md 2>/dev/null | wc -l) -ge 2 ]'
+
+# Commit order: each feat must be followed by a refine before the next feat
+# (the awk program exits 1 on two consecutive feat: commits, or on a
+# trailing feat: with no refine: after it)
+check "Commit order: refine follows feat" \
+  'git log --oneline --reverse | grep -E "feat:|refine:" | \
+   awk "/feat:/{if(f)exit 1; f=1; next} /refine:/{f=0} END{exit f}"'
+
+# Progress file updated
+check "Progress file updated" \
+  '[ $(wc -l < specs/test/progress.txt) -gt 1 ]'
+
+echo ""
+echo "Results: $PASS passed, $FAIL failed"
+[ $FAIL -eq 0 ] && echo "ALL CHECKS PASSED" || echo "SOME CHECKS FAILED"
+```
+
+## Expected Results
+- 2 `feat:` commits (one per feature)
+- 2 `refine:` commits (one per feature)
+- 2 refinement reports in `specs/test/refinements/`
+- Alternating pattern: feat → refine → feat → refine
+- Both features `"passes": true`
+
+## What This Catches
+- Skipped refinements (the bug that prompted this test)
+- Missing refinement reports
+- Wrong commit order (refinement must follow its feature)
+- Incomplete feature list updates

diff --git a/tests/verify.sh b/tests/verify.sh
new file mode 100755
index 0000000..184af23
--- /dev/null
+++ b/tests/verify.sh
@@ -0,0 +1,76 @@
+#!/bin/bash
+# Verify iterative-dev skill produced expected artifacts
+# Run this in the project directory AFTER the skill completes
+
+SCOPE=$(cat .active-scope 2>/dev/null || echo
"unknown") +TYPE=$(cat feature_list.json | grep -o '"type": *"[^"]*"' | head -1 | grep -o '"[^"]*"$' | tr -d '"') +TOTAL_FEATURES=$(cat feature_list.json | grep -c '"id":' || echo 0) +PASSING_FEATURES=$(cat feature_list.json | grep -c '"passes": true' || echo 0) +FEAT_COMMITS=$(git log --oneline | grep -c "feat:" || true) +REFINE_COMMITS=$(git log --oneline | grep -c "refine:" || true) +REFINEMENT_REPORTS=$(ls specs/$SCOPE/refinements/feature-*-refinement.md 2>/dev/null | wc -l | tr -d ' ') +SCREENSHOTS=$(ls specs/$SCOPE/screenshots/feature-*.png 2>/dev/null | wc -l | tr -d ' ') + +PASS=0 +FAIL=0 + +check() { + if eval "$2"; then + echo " PASS: $1" + ((PASS++)) + else + echo " FAIL: $1" + ((FAIL++)) + fi +} + +echo "=== Iterative-Dev Skill Verification ===" +echo "Scope: $SCOPE | Type: $TYPE | Features: $TOTAL_FEATURES" +echo "" + +echo "--- Feature Completion ---" +check "All features pass ($PASSING_FEATURES/$TOTAL_FEATURES)" \ + "[ $PASSING_FEATURES -eq $TOTAL_FEATURES ]" + +echo "" +echo "--- Implementation ---" +check "Feature commits exist ($FEAT_COMMITS)" \ + "[ $FEAT_COMMITS -ge $TOTAL_FEATURES ]" + +echo "" +echo "--- Refinement (the critical gate) ---" +check "Refinement commits exist ($REFINE_COMMITS)" \ + "[ $REFINE_COMMITS -ge $TOTAL_FEATURES ]" +check "Refinement reports exist ($REFINEMENT_REPORTS)" \ + "[ $REFINEMENT_REPORTS -ge $TOTAL_FEATURES ]" +check "Refinements match features (commits: $REFINE_COMMITS >= features: $TOTAL_FEATURES)" \ + "[ $REFINE_COMMITS -ge $TOTAL_FEATURES ]" + +if [ "$TYPE" = "web" ] || [ "$TYPE" = "mobile" ]; then + echo "" + echo "--- Screenshots (web/mobile) ---" + # Count UI features (exclude infrastructure) + UI_FEATURES=$(cat feature_list.json | grep -c '"full-stack"\|"functional"\|"style"' || true) + check "Screenshots captured ($SCREENSHOTS for ~$UI_FEATURES UI features)" \ + "[ $SCREENSHOTS -gt 0 ]" +fi + +echo "" +echo "--- Commit Pattern ---" +# Verify feat/refine alternation +PATTERN_OK=true +LAST="" +while IFS= 
read -r line; do + TYPE_TAG=$(echo "$line" | grep -o "feat:\|refine:" || true) + if [ "$TYPE_TAG" = "feat:" ] && [ "$LAST" = "feat:" ]; then + PATTERN_OK=false # Two feats in a row = missing refinement + fi + [ -n "$TYPE_TAG" ] && LAST=$TYPE_TAG +done < <(git log --oneline --reverse) +check "No consecutive feat: commits (refinement between each)" \ + "$PATTERN_OK" + +echo "" +echo "=== Results: $PASS passed, $FAIL failed ===" +[ $FAIL -eq 0 ] && echo "ALL CHECKS PASSED" || echo "SOME CHECKS FAILED" +exit $FAIL From 839fcca1ecdefda862d131631cf6097be6448a9f Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Thu, 19 Mar 2026 21:08:05 +0800 Subject: [PATCH 15/17] =?UTF-8?q?fix:=20never=20delete=20screenshots=20?= =?UTF-8?q?=E2=80=94=20they=20are=20committed=20results,=20not=20ephemeral?= =?UTF-8?q?=20artifacts?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Screenshots are project deliverables that should be preserved in the repo as evidence of feature verification. Removed all rm -rf commands targeting screenshot directories from init.sh templates and the e2e verification workflow. Co-Authored-By: Claude Opus 4.6 (1M context) --- references/core/init-script-template.md | 15 ++++++--------- references/web/e2e-verification.md | 7 ++++--- 2 files changed, 10 insertions(+), 12 deletions(-) diff --git a/references/core/init-script-template.md b/references/core/init-script-template.md index 1e2cf83..f7de7ad 100644 --- a/references/core/init-script-template.md +++ b/references/core/init-script-template.md @@ -27,15 +27,13 @@ pkill -f 'vite' 2>/dev/null || true pkill -f 'node.*dev' 2>/dev/null || true sleep 1 -# 2. Delete old screenshots for fresh test results -# Screenshots are stored per-scope in specs/{scope}/screenshots/ -echo "Cleaning old test artifacts..." +# 2. 
Ensure screenshot and refinement directories exist +# Screenshots are committed to the repo as results — never delete them SCOPE=$(cat .active-scope 2>/dev/null || echo "default") SCREENSHOT_DIR="specs/$SCOPE/screenshots" -rm -rf "$SCREENSHOT_DIR"/*.png 2>/dev/null || true -rm -rf test-results 2>/dev/null || true mkdir -p "$SCREENSHOT_DIR" mkdir -p "specs/$SCOPE/refinements" +rm -rf test-results 2>/dev/null || true # 3. Install/update dependencies echo "Installing dependencies..." @@ -218,12 +216,11 @@ echo "=== Mobile Development Environment ===" # 1. Kill existing processes pkill -f 'metro\|react-native' 2>/dev/null || true -# 2. Clean old artifacts -# Screenshots are stored per-scope in specs/{scope}/screenshots/ +# 2. Ensure screenshot and refinement directories exist +# Screenshots are committed to the repo as results — never delete them SCOPE=$(cat .active-scope 2>/dev/null || echo "default") -rm -rf "specs/$SCOPE/screenshots"/*.png 2>/dev/null || true -rm -rf test-results/ 2>/dev/null || true mkdir -p "specs/$SCOPE/screenshots" "specs/$SCOPE/refinements" +rm -rf test-results/ 2>/dev/null || true # 3. Install dependencies npm install diff --git a/references/web/e2e-verification.md b/references/web/e2e-verification.md index cb9059b..491ab0b 100644 --- a/references/web/e2e-verification.md +++ b/references/web/e2e-verification.md @@ -32,11 +32,12 @@ lsof -i :8082 | head -2 # Backend If not running, start them with `bash init.sh`. 
-### Step 2: Clear Old Screenshots +### Step 2: Ensure Screenshot Directory Exists ```bash -# Clear screenshots from the scope's screenshot directory -rm -rf specs/{scope}/screenshots/*.png 2>/dev/null || true +# Screenshots are committed to the repo as results — never delete them +# New screenshots will overwrite same-named files; old ones are preserved as history +mkdir -p specs/{scope}/screenshots rm -rf test-results/**/*.png 2>/dev/null || true ``` From 736b0422e7eb87c4e4a5277c805c4faa267c4b89 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Fri, 20 Mar 2026 16:29:39 +0800 Subject: [PATCH 16/17] fix: refinement reports use timestamped filenames instead of overwriting Each refinement pass now creates a new file (e.g. feature-2-refinement-20260320-143052.md) instead of overwriting the previous report. This preserves the full history of reviews across multiple refinement passes. Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 4 ++-- references/templates/refinement-subagent.md | 9 +++++++-- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/SKILL.md b/SKILL.md index aee653c..9953719 100644 --- a/SKILL.md +++ b/SKILL.md @@ -130,8 +130,8 @@ Read `references/templates/refinement-subagent.md` for the full prompt template. **Refinement gate** (parent must verify after subagent completes): ```bash -# Report must exist -ls specs/{scope}/refinements/feature-{id}-refinement.md +# At least one refinement report must exist for this feature (each pass creates a new timestamped file) +ls specs/{scope}/refinements/feature-{id}-refinement-*.md | head -1 # Commit must exist git log --oneline -1 | grep "refine:" ``` diff --git a/references/templates/refinement-subagent.md b/references/templates/refinement-subagent.md index 817045d..cb04870 100644 --- a/references/templates/refinement-subagent.md +++ b/references/templates/refinement-subagent.md @@ -12,7 +12,7 @@ You are refining a recently completed feature. 
The feature is already implemente - Project type: {type} - Feature just completed: #{id} — {description} - Screenshots directory: {screenshots_dir} -- Refinement output: {pwd}/specs/{scope}/refinements/feature-{id}-refinement.md +- Refinement output: {pwd}/specs/{scope}/refinements/feature-{id}-refinement-{YYYYMMDD-HHMMSS}.md (use current timestamp) ## Standards Documents Read these before starting: @@ -69,7 +69,9 @@ Re-read all generated code with fresh eyes, looking for opportunities to make it ## Part 3: Write Refinement Report -Write your analysis to `{pwd}/specs/{scope}/refinements/feature-{id}-refinement.md` with this structure: +Each refinement pass creates a NEW file with a timestamp — never overwrite previous reports. This preserves the history of what was reviewed and changed across multiple refinement passes. + +Write your analysis to `{pwd}/specs/{scope}/refinements/feature-{id}-refinement-{YYYYMMDD-HHMMSS}.md` (replace `{YYYYMMDD-HHMMSS}` with the current date-time, e.g. `feature-2-refinement-20260320-143052.md`) with this structure: ```markdown # Feature #{id} Refinement: {description} @@ -92,6 +94,9 @@ Write your analysis to `{pwd}/specs/{scope}/refinements/feature-{id}-refinement. ``` ## Commit + +Generate the timestamp for the refinement file name using: `date +%Y%m%d-%H%M%S` + If you made code or UI changes: git add -A && git commit -m "refine: polish feature #{id} — [summary of improvements]" From 87120cbe18c555548f9fad4431b087fd346f3a26 Mon Sep 17 00:00:00 2001 From: Felix Sun Date: Sat, 21 Mar 2026 11:28:36 +0800 Subject: [PATCH 17/17] fix: address friction patterns found by skill-doctor Analyzed 2 sessions (77 files), found 329 friction events. 
Key fixes: - Add "Subagent Anti-Patterns" table: retry limits, prohibited tools, compile-before-test - Add compilation gate (tsc/go build) to Step 2: VERIFY before other gates - Expand Decision Making table with retry, Edit, and test failure strategies - Add anti-retry and compile-before-test rules to feature-subagent template - Add compilation gate phase (2b) before test phase in feature-subagent - Add autonomy and anti-retry rules to refinement-subagent template - Add 7 regression tests in evals/friction-regression.json Co-Authored-By: Claude Opus 4.6 (1M context) --- SKILL.md | 36 +++++++++++++-- evals/friction-regression.json | 51 +++++++++++++++++++++ references/templates/feature-subagent.md | 20 +++++++- references/templates/refinement-subagent.md | 4 ++ 4 files changed, 105 insertions(+), 6 deletions(-) create mode 100644 evals/friction-regression.json diff --git a/SKILL.md b/SKILL.md index 9953719..8d17e72 100644 --- a/SKILL.md +++ b/SKILL.md @@ -30,6 +30,19 @@ FOR each feature (highest priority first): 7. Refinement is non-negotiable — every feature polished for delight, not just function 8. Standards are auditable — quality lives in reference docs, verified systematically +## Subagent Anti-Patterns (MUST AVOID) + +These patterns were found in real sessions and waste significant time: + +| Anti-Pattern | Rule | +|-------------|------| +| **Retry loops** | If the same tool call fails twice with the same approach, STOP and change strategy. Read error output carefully — don't blindly retry. | +| **Edit without Read** | If the Edit tool fails (old_string not found), ALWAYS Read the file first to see current content before retrying. Never guess at file contents. | +| **AskUserQuestion** | NEVER use the `AskUserQuestion` tool. The human may be asleep. Make your best judgment and move on. | +| **EnterPlanMode / ExitPlanMode** | NEVER enter or exit plan mode during autonomous execution. Just execute directly. 
| +| **Blind test reruns** | When a test fails, read the FULL error output, identify the root cause, fix it, THEN rerun. Rerunning without changes is a waste. | +| **Compile-then-pray** | Always run compilation checks (`tsc --noEmit`, `go build ./...`) BEFORE running tests. Fix compile errors first — they cause cascading test failures. | + ## Project Types | Type | Verification | Extra Standards | @@ -107,9 +120,19 @@ Read `references/templates/feature-subagent.md` for the full prompt template. La After the implementation subagent completes: -a. **Commit gate**: `git log --oneline -1` — confirm `feat:` commit exists -b. **Feature list gate**: confirm `"passes": true` in feature_list.json -c. **Type-specific gate**: +a. **Compile gate** (run BEFORE other gates — catches most subagent mistakes): + +| Type | Command | +|------|---------| +| web (frontend) | `cd frontend && npx tsc --noEmit` | +| api / library / cli (Go) | `go build ./...` | +| api / library / cli (other) | language-appropriate compile/lint check | + +If compilation fails, launch a fix subagent immediately — do not proceed to other gates. + +b. **Commit gate**: `git log --oneline -1` — confirm `feat:` commit exists +c. **Feature list gate**: confirm `"passes": true` in feature_list.json +d. **Type-specific gate**: | Type | Gate | |------|------| @@ -120,7 +143,7 @@ c. **Type-specific gate**: | library | All tests pass including race detection | | data | Transformation tests cover edge cases | -d. If any gate fails, launch a fix subagent before proceeding. +e. If any gate fails, launch a fix subagent before proceeding. Include the FULL error output in the subagent prompt so it can fix the root cause directly. 
#### Step 3: REFINE (mandatory — not optional) @@ -168,6 +191,11 @@ Before ending: final standards audit, run all tests, verify `references/core/ses | Test is flaky | Fix with proper waits, don't skip | | Feature too large | Break into sub-tasks within subagent | | Build/dependency error | Read error, fix, rebuild | +| Same tool fails twice | STOP retrying same approach. Read error output. Try a different strategy. | +| Edit tool: old_string not found | Read the file first, get exact current content, then retry Edit | +| Test fails after re-run | Read failure output, fix root cause in code, then re-run. Never re-run without a code change. | +| TypeScript error after edit | Run `tsc --noEmit` to see all errors, fix them ALL, then re-run tests | +| Tempted to use AskUserQuestion | NEVER — make your best judgment, the human may be asleep | | Port conflict | Kill conflicting process, restart | | Feature blocked | Skip to next, come back later | | Tempted to skip refinement | NEVER skip — launch it | diff --git a/evals/friction-regression.json b/evals/friction-regression.json new file mode 100644 index 0000000..75303f7 --- /dev/null +++ b/evals/friction-regression.json @@ -0,0 +1,51 @@ +[ + { + "test_id": "no-ask-user-question", + "prompt": "Implement feature #3 for a web project. The spec is ambiguous about whether to use tabs or a dropdown for navigation.", + "expected_behavior": "Agent makes a judgment call (e.g., uses tabs) and proceeds without asking the user", + "anti_pattern": "Agent uses AskUserQuestion to ask the user which approach to take", + "assertion": "Agent must NOT call AskUserQuestion tool. Must pick an approach autonomously." + }, + { + "test_id": "no-plan-mode", + "prompt": "Continue working on the iterative-dev scope. 
3 features remain.", + "expected_behavior": "Agent reads progress, runs init.sh, and starts implementing the next feature directly", + "anti_pattern": "Agent enters plan mode (EnterPlanMode) before starting work", + "assertion": "Agent must NOT call EnterPlanMode or ExitPlanMode. Must execute directly." + }, + { + "test_id": "compile-before-test", + "prompt": "Implement a new React component for a web project with a Go backend.", + "expected_behavior": "Agent runs tsc --noEmit (frontend) and go build ./... (backend) before running Playwright or go test", + "anti_pattern": "Agent runs Playwright tests or go test without first checking compilation", + "assertion": "tsc --noEmit or go build must appear in tool calls BEFORE any test execution command" + }, + { + "test_id": "read-before-failed-edit", + "prompt": "Edit a file where the content has changed since last read (Edit fails with old_string not found).", + "expected_behavior": "Agent reads the file to get current content, then retries Edit with correct old_string", + "anti_pattern": "Agent retries Edit with slightly modified old_string without reading the file first", + "assertion": "After a failed Edit, the next relevant tool call must be Read on the same file" + }, + { + "test_id": "no-blind-test-rerun", + "prompt": "A Playwright test fails with 'element not found: [data-testid=save-btn]'. 
Fix it.", + "expected_behavior": "Agent reads test error, identifies missing testid, adds it to the component, then reruns", + "anti_pattern": "Agent reruns the same Playwright test without making any code changes", + "assertion": "Between consecutive test runs, at least one Edit or Write call must appear" + }, + { + "test_id": "max-two-retries", + "prompt": "Implement a feature where the initial approach hits a library limitation.", + "expected_behavior": "After 2 failures with the same approach, agent changes strategy entirely", + "anti_pattern": "Agent retries the same failing approach 3+ times with minor variations", + "assertion": "No tool should be called with substantially similar input more than 3 times consecutively" + }, + { + "test_id": "error-output-in-fix-subagent", + "prompt": "Parent agent detects that a feature's Playwright test failed. Launch a fix subagent.", + "expected_behavior": "Fix subagent prompt includes the full error output from the failed test", + "anti_pattern": "Fix subagent prompt says 'tests failed' without including the actual error", + "assertion": "Agent tool prompt for fix subagent must contain the test failure output text" + } +] diff --git a/references/templates/feature-subagent.md b/references/templates/feature-subagent.md index 1cfc204..ec51b3b 100644 --- a/references/templates/feature-subagent.md +++ b/references/templates/feature-subagent.md @@ -45,10 +45,20 @@ Follow {skill_base_dir}/references/core/code-quality.md: 8. Eliminate duplication — reuse existing helpers or extract new shared ones 9. Write unit tests for all extracted logic. Run them until green. 
+### Phase 2b: Compilation Gate (BEFORE tests) +Run compilation checks and fix ALL errors before proceeding to tests: +{IF type == "web" or type == "mobile":} +- `cd frontend && npx tsc --noEmit` — fix every TypeScript error (unused imports, type mismatches) +{END IF} +{IF type == "api" or type == "library" or type == "cli":} +- `go build ./...` (or equivalent) — fix every build error +{END IF} +Do NOT skip to tests — compile errors cause cascading failures that waste time debugging the wrong thing. + ### Phase 3: Verification Follow {skill_base_dir}/references/verification/{type}-verification.md: 10. Execute the verification strategy defined for {type} projects -11. Run all relevant tests — fix until green +11. Run all relevant tests — fix until green. If a test fails, READ the full error output, identify the root cause, fix the code, THEN re-run. Never re-run a test without making a change. 12. MANDATORY: Perform the verification checks specified in the doc Fix and re-run until all pass. @@ -104,9 +114,15 @@ Follow {skill_base_dir}/references/core/gitignore-standards.md: - Follow existing code patterns and the standards documents - Keep changes focused on this feature only - Do not break other features -- Make all decisions yourself, never ask for human input +- Make all decisions yourself, never ask for human input — NEVER use AskUserQuestion or EnterPlanMode - EVERY feature must be verified per the verification strategy — no exceptions - BEFORE committing, review ALL files for .gitignore candidates +- **Anti-retry discipline**: If a tool call fails twice with the same approach, STOP and change strategy. Read the error output carefully before retrying anything. +- **Read before Edit**: If the Edit tool fails (old_string not found), always Read the file first to get current content. Never guess at file contents. 
+- **Compile before test**: Run compilation checks BEFORE running tests: + - Frontend: `npx tsc --noEmit` — fix ALL type errors before running Playwright + - Go backend: `go build ./...` — fix ALL build errors before running `go test` + - Fix compile errors FIRST — they cause cascading test failures that waste time {IF type == "web" or type == "mobile":} - SCREENSHOTS ARE NON-NEGOTIABLE — do not skip or defer them - If the app/server is not running for screenshots, start it (check init.sh or start manually) diff --git a/references/templates/refinement-subagent.md b/references/templates/refinement-subagent.md index cb04870..39058d5 100644 --- a/references/templates/refinement-subagent.md +++ b/references/templates/refinement-subagent.md @@ -110,3 +110,7 @@ git add specs/{scope}/refinements/ && git commit -m "refine: review feature #{id - Think creatively about UX — the goal is to make users enjoy and understand the interface - Think critically about code — the goal is to make the codebase a pleasure to maintain - ALWAYS write the refinement report, even if no changes are made +- NEVER use AskUserQuestion or EnterPlanMode — work autonomously +- **Read before Edit**: If Edit fails (old_string not found), Read the file first. Never guess. +- **Compile before test**: After any code change, run `tsc --noEmit` (frontend) or `go build ./...` (backend) BEFORE running tests. Fix compile errors first. +- **Max 2 retries**: If the same approach fails twice, change strategy. Read errors carefully.
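
---

The timestamped-report convention from PATCH 16 and the parent-side gate it feeds can be sketched end to end in a few lines of bash. This is a minimal standalone sketch, not part of the patch series: the `demo` scope and feature id `2` are hypothetical, and paths follow the skill's `specs/{scope}/refinements` layout.

```shell
#!/usr/bin/env bash
# Sketch of the refinement-report gate: each refinement pass writes a NEW
# timestamped report, and the parent verifies at least one exists.
# "demo" scope and feature id 2 are hypothetical examples.
set -euo pipefail

scope="demo"
id=2
dir="specs/$scope/refinements"
mkdir -p "$dir"

# Subagent side: never overwrite; a new timestamped file per refinement pass
stamp=$(date +%Y%m%d-%H%M%S)
report="$dir/feature-$id-refinement-$stamp.md"
printf '# Feature #%s Refinement\n' "$id" > "$report"

# Parent gate: at least one refinement report must exist for this feature
count=$(ls "$dir"/feature-"$id"-refinement-*.md 2>/dev/null | wc -l | tr -d ' ')
if [ "$count" -ge 1 ]; then
  echo "refinement gate: PASS ($count report(s))"
else
  echo "refinement gate: FAIL (no report for feature #$id)" >&2
  exit 1
fi
```

A second pass produces a second `feature-2-refinement-*.md` file rather than overwriting the first, which is what the `ls specs/{scope}/refinements/feature-{id}-refinement-*.md | head -1` gate in SKILL.md checks for.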