diff --git a/LICENSE b/LICENSE index 276ed3b..9f91ba9 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) 2026 The Plant +Copyright (c) 2025 The Plant Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/README.md b/README.md new file mode 100644 index 0000000..edd2286 --- /dev/null +++ b/README.md @@ -0,0 +1,208 @@ +# iterative-dev + +An AI skill for iterative development with AI agents. Works with **any project type** — web apps, APIs, CLI tools, libraries, data pipelines, and mobile apps. Supports **Claude Code** (with subagents) and **Windsurf**. + +## Installation + +```bash +npx skills add https://github.com/theplant/iterative-dev +``` + +## Overview + +This skill provides a complete workflow for AI agents working on long-running development projects across multiple sessions. It ensures **incremental, reliable progress** with proper handoffs between sessions. 
+ +### Supported Project Types + +| Type | Verification Strategy | +|------|----------------------| +| **web** | Playwright E2E tests + screenshot visual review | +| **api** | Integration tests + endpoint/response validation | +| **cli** | Command execution tests + output/exit code validation | +| **library** | Unit tests + public API surface validation | +| **data** | Transformation tests + schema/data quality checks | +| **mobile** | Mobile E2E tests (Detox/XCTest/Flutter) + screenshot review | + +### Claude Code Features + +- **Subagent per feature** — Each feature is implemented in its own subagent using the Agent tool, keeping context clean and isolated +- **Autonomous loop** — The agent keeps working through ALL features without stopping, even if the human is away +- **Self-directed decisions** — Handles ambiguity, errors, and blockers autonomously using decision-making guidelines +- **Commit after each feature** — Every completed feature is committed independently for clean git history +- **Type-aware verification** — Automatically uses the right verification strategy for your project type + +## Core Principles + +1. **Incremental progress** — Work on ONE feature at a time. Finish, test, and commit before moving on. +2. **Feature list is sacred** — `feature_list.json` is the single source of truth. +3. **Git discipline** — Commit after every completed feature. +4. **Clean handoffs** — Every session ends with committed work and updated progress notes. +5. **Test before build** — Verify existing features work before implementing new ones. +6. **Autonomous execution** — Make all decisions yourself. Never stop to ask the human. +7. **Subagent isolation** — Each feature runs in its own subagent for clean context. 
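Since `feature_list.json` is the single source of truth, it helps to know roughly what it holds. A minimal sketch is shown below; the `type`, `priority`, `passes`, and `steps` fields are described in this skill, while the other field names are illustrative (see `references/core/feature-list-format.md` for the authoritative format):

```json
{
  "type": "api",
  "features": [
    {
      "id": 1,
      "description": "User can register with email and password",
      "priority": "high",
      "steps": [
        "POST /register creates a user and returns a token",
        "Integration test covers the duplicate-email error case"
      ],
      "passes": false
    }
  ]
}
```

The agent flips `"passes"` to `true` only after the feature's own verification steps succeed.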
+ +## Workflows + +| Workflow | Use When | +|----------|----------| +| **init-scope** | Starting a new scope, switching scopes, or setting up project structure | +| **continue** | Every session after init — implements ALL remaining features with verification built in | + +## Key Files + +- `spec.md` — Project specification (symlink to active scope) +- `feature_list.json` — Feature tracking with pass/fail status and project type +- `progress.txt` — Session progress log (symlink to active scope) +- `init.sh` — Development environment setup script + +## How It Works (Claude Code) + +1. Agent reads `feature_list.json` to find incomplete features and project type +2. For each feature, launches a **subagent** (via Agent tool) with full context +3. Subagent implements the feature, runs type-appropriate verification, and commits +4. Parent agent confirms completion, then **loops back** to pick the next feature +5. Only stops when ALL features have `"passes": true` + +## How to Use + +### Case 1: Write spec.md yourself, then initialize + +Best when you have a clear vision of what to build. Write the spec first, then let the agent set up the scope and generate the feature list. + +**Step 1 — Write your spec:** + +Create `specs/auth/spec.md` (or any scope name) with your project specification: + +```markdown +# Auth System + +Build a JWT-based authentication system with: +- User registration with email/password +- Login endpoint returning JWT tokens +- Password reset via email +- Role-based access control (admin, user) +- Rate limiting on auth endpoints +``` + +**Step 2 — Initialize the scope:** + +``` +> Initialize scope "auth" using the spec I wrote in specs/auth/spec.md +``` + +The agent will read your spec, detect the project type, generate `feature_list.json`, create `init.sh`, and commit. + +**Step 3 — Continue (every subsequent session):** + +``` +> Continue working +``` + +The agent picks up where it left off and implements all remaining features autonomously. 
+ +--- + +### Case 2: Describe what you want, let the agent generate spec.md + +Best for brainstorming or when you want the agent to help shape the spec. Just describe your idea in the prompt. + +``` +> Initialize a new scope called "dashboard". I want a real-time analytics dashboard +> with charts for user signups, revenue, and API usage. It should have date range +> filters, CSV export, and a dark mode toggle. Use React + Recharts. +``` + +The agent will: +1. Create `specs/dashboard/spec.md` from your description +2. Detect project type (web) +3. Generate `feature_list.json` with prioritized features +4. Create `init.sh` with the right dev environment setup +5. Commit everything + +Then continue in subsequent sessions: + +``` +> Continue working +``` + +--- + +### Case 3: Switch between existing scopes + +When you have multiple scopes and want to switch context: + +``` +> Switch to scope "video-editor" +``` + +The agent updates `.active-scope` and symlinks `spec.md` / `feature_list.json` to the selected scope. + +--- + +### Case 4: Compliance / standards alignment scope + +When your scope is about aligning code with a reference document (not building new features): + +``` +> Initialize a new scope called "standards-alignment" to align our codebase +> with the requirements in AGENTS.md +``` + +The agent uses the **Constitution Audit Workflow** — it systematically extracts every requirement from the reference document, verifies each against your code, and generates features only from verified violations. + +--- + +### Case 5: Continue a multi-session project + +Every session after the first, just say: + +``` +> Continue working +> Pick up where I left off +> Next feature +``` + +The agent reads `feature_list.json` and `progress.txt`, runs regression tests, then implements all remaining features in a loop — committing after each one. It won't stop until everything passes. 
+ +--- + +### Typical workflow timeline + +``` +Session 1: "Initialize scope 'my-app' — here's what I want to build: ..." + → Agent creates spec.md, feature_list.json, init.sh + +Session 2: "Continue working" + → Agent implements features #1–#5, commits each + +Session 3: "Continue working" + → Agent implements features #6–#12, all pass, scope complete +``` + +## Project Structure + +``` +references/ +├── core/ # All project types +│ ├── code-quality.md +│ ├── gitignore-standards.md +│ ├── feature-list-format.md +│ ├── session-handoff-standards.md +│ ├── constitution-audit.md +│ ├── init-script-template.md +│ └── continue-workflow.md +├── web/ # Web and mobile projects +│ ├── ux-standards.md +│ └── frontend-design.md +└── verification/ # One per project type + ├── web-verification.md + ├── api-verification.md + ├── cli-verification.md + ├── library-verification.md + ├── data-verification.md + └── mobile-verification.md +``` + +## License + +MIT diff --git a/SKILL.md b/SKILL.md new file mode 100644 index 0000000..8d17e72 --- /dev/null +++ b/SKILL.md @@ -0,0 +1,228 @@ +--- +name: iterative-dev +description: Manage long-running AI agent development projects with incremental progress, scoped features, and verification. Works with any project type — web, API, CLI, library, data pipeline, mobile. Use this skill when working on multi-session projects, implementing features incrementally, running tests, initializing project scopes, or continuing work from previous sessions. Triggers on phrases like "continue working", "pick up where I left off", "next feature", "run tests", "verify", "initialize scope", "switch scope", "feature list", "incremental progress", or any multi-session development workflow. +--- + +# Iterative Development Workflow + +Autonomous, incremental development with quality gates. One feature at a time. Implement → verify → refine → next. + +## Core Loop + +``` +FOR each feature (highest priority first): + 1. 
IMPLEMENT — launch subagent to build, test, and commit + 2. VERIFY — parent checks: commit exists, screenshots exist (web), tests prove outcomes + 3. REFINE — launch subagent to polish UX + code quality, write report, commit + 4. NEXT — immediately proceed to next feature +``` + +**Steps 1–3 are all mandatory. Skipping refinement is as wrong as skipping verification.** + +## Principles + +1. ONE feature at a time — finish, test, commit before moving on +2. `feature_list.json` is the single source of truth — see `references/core/feature-list-format.md` +3. Git commit after every feature and every refinement +4. Autonomous execution — never stop to ask the human, the human may be asleep +5. Subagent per feature — isolation prevents context overflow +6. Verification is non-negotiable — every feature proven working per project type +7. Refinement is non-negotiable — every feature polished for delight, not just function +8. Standards are auditable — quality lives in reference docs, verified systematically + +## Subagent Anti-Patterns (MUST AVOID) + +These patterns were found in real sessions and waste significant time: + +| Anti-Pattern | Rule | +|-------------|------| +| **Retry loops** | If the same tool call fails twice with the same approach, STOP and change strategy. Read error output carefully — don't blindly retry. | +| **Edit without Read** | If the Edit tool fails (old_string not found), ALWAYS Read the file first to see current content before retrying. Never guess at file contents. | +| **AskUserQuestion** | NEVER use the `AskUserQuestion` tool. The human may be asleep. Make your best judgment and move on. | +| **EnterPlanMode / ExitPlanMode** | NEVER enter or exit plan mode during autonomous execution. Just execute directly. | +| **Blind test reruns** | When a test fails, read the FULL error output, identify the root cause, fix it, THEN rerun. Rerunning without changes is a waste. 
| +| **Compile-then-pray** | Always run compilation checks (`tsc --noEmit`, `go build ./...`) BEFORE running tests. Fix compile errors first — they cause cascading test failures. | + +## Project Types + +| Type | Verification | Extra Standards | +|------|-------------|-----------------| +| **web** | Playwright E2E + screenshots | `web/ux-standards.md`, `web/frontend-design.md` | +| **api** | Integration tests + endpoint validation | — | +| **cli** | Command execution + output validation | — | +| **library** | Unit tests + public API validation | — | +| **data** | Transformation tests + data quality | — | +| **mobile** | Mobile E2E + screenshots | `web/ux-standards.md` (adapted) | + +--- + +## Workflow: Initialize Scope + +### Directory Structure +``` +project-root/ +├── specs/{scope}/ +│ ├── spec.md, feature_list.json, progress.txt +│ ├── screenshots/ +│ └── refinements/ +├── .active-scope +├── spec.md → specs/{scope}/spec.md (symlink) +├── feature_list.json → specs/{scope}/... (symlink) +├── progress.txt → specs/{scope}/... (symlink) +└── init.sh +``` + +### Steps + +1. **Check state**: `ls specs/ && cat .active-scope` +2. **Create scope**: `mkdir -p specs/{scope}/{screenshots,refinements}`, write `spec.md` +3. **Switch**: `echo "{scope}" > .active-scope`, create symlinks +4. **Determine project type**: Browser→web, Terminal→cli, Import→library, HTTP→api, Phone→mobile, Data→data +5. 
**Create feature list** — two methods: + - **New features**: Follow `references/core/feature-list-format.md` + - **Constitution/standards alignment**: Follow `references/core/constitution-audit.md` + + **Critical rules for features:** + - Outcome-oriented (what user can DO, not what components exist) + - Full-stack vertical slices (backend + frontend together) — see feature-list-format.md + - Self-contained (each feature includes its own tests — no separate "testing" features) + - UI features MUST include screenshot + interaction test steps + - Include `"type"` field in feature_list.json +6. **Create init.sh** — see `references/core/init-script-template.md` +7. **Commit** + +--- + +## Workflow: Continue Session + +### Startup + +```bash +pwd && cat progress.txt && cat feature_list.json && git log --oneline -20 +bash init.sh +``` + +Verify existing features work before implementing new ones. + +### Feature Loop (NON-STOP until all features pass) + +**Never stop to report progress. Never ask the human. Keep going until done.** + +For each incomplete feature (highest priority first): + +#### Step 1: IMPLEMENT + +Read `references/templates/feature-subagent.md` for the full prompt template. Launch via Agent tool. + +**Reference doc paths**: The `references/` directory is in THIS SKILL's install directory, not the project. Resolve to absolute paths using `{skill_base_dir}` shown at top of this prompt. + +#### Step 2: VERIFY (parent agent — mandatory gates) + +After the implementation subagent completes: + +a. **Compile gate** (run BEFORE other gates — catches most subagent mistakes): + +| Type | Command | +|------|---------| +| web (frontend) | `cd frontend && npx tsc --noEmit` | +| api / library / cli (Go) | `go build ./...` | +| api / library / cli (other) | language-appropriate compile/lint check | + +If compilation fails, launch a fix subagent immediately — do not proceed to other gates. + +b. 
**Commit gate**: `git log --oneline -1` — confirm `feat:` commit exists +c. **Feature list gate**: confirm `"passes": true` in feature_list.json +d. **Type-specific gate**: + +| Type | Gate | +|------|------| +| web/mobile | **Screenshot gate**: `ls specs/{scope}/screenshots/feature-{id}-*.png \| wc -l` — if 0, BLOCK and launch screenshot subagent. If >0, spot-check one with Read tool. **Outcome test gate**: verify tests perform user actions (not just screenshots). | +| web full-stack | **Integration smoke test**: verify backend responds (not 404), verify CORS headers, verify screenshots show real data (not loading spinners). See `references/verification/web-verification.md`. | +| api | Verify integration tests exist and cover error cases | +| cli | Smoke test: `./bin/{tool} --help` | +| library | All tests pass including race detection | +| data | Transformation tests cover edge cases | + +e. If any gate fails, launch a fix subagent before proceeding. Include the FULL error output in the subagent prompt so it can fix the root cause directly. + +#### Step 3: REFINE (mandatory — not optional) + +Read `references/templates/refinement-subagent.md` for the full prompt template. Launch via Agent tool. + +**Why refinement exists**: Implementation subagents build features that *work*. Refinement subagents make features *delightful*. Without refinement, UX issues (spacing, hierarchy, micro-interactions) and code smells (duplication, naming, complexity) ship uncaught. It's the quality difference between "functional" and "users love it". + +**Refinement gate** (parent must verify after subagent completes): +```bash +# At least one refinement report must exist for this feature (each pass creates a new timestamped file) +ls specs/{scope}/refinements/feature-{id}-refinement-*.md | head -1 +# Commit must exist +git log --oneline -1 | grep "refine:" +``` +If either is missing, launch the refinement subagent again. Do NOT proceed without refinement. 
+ +#### Step 4: NEXT + +Loop back immediately to the next incomplete feature. No pausing, no reporting. + +### Periodic Standards Audit + +**When**: Every 5 features AND at session end. + +For each applicable standards doc, launch audit subagent (see `references/templates/audit-subagent.md`). Fix violations before proceeding. + +Applicable standards by type: +- **All**: `core/code-quality.md`, `core/gitignore-standards.md`, `core/session-handoff-standards.md` +- **web/mobile**: also `web/ux-standards.md`, `web/frontend-design.md` + +### Session End + +Only end when ALL features have `"passes": true` and all refinements are committed, or a truly unrecoverable error occurs. + +Before ending: final standards audit, run all tests, verify `references/core/session-handoff-standards.md`. + +--- + +## Decision Making (autonomous — human may be asleep) + +| Situation | Decision | +|-----------|----------| +| Ambiguous spec | Simplest reasonable interpretation | +| Multiple approaches | Match existing patterns | +| Test is flaky | Fix with proper waits, don't skip | +| Feature too large | Break into sub-tasks within subagent | +| Build/dependency error | Read error, fix, rebuild | +| Same tool fails twice | STOP retrying same approach. Read error output. Try a different strategy. | +| Edit tool: old_string not found | Read the file first, get exact current content, then retry Edit | +| Test fails after re-run | Read failure output, fix root cause in code, then re-run. Never re-run without a code change. 
| +| TypeScript error after edit | Run `tsc --noEmit` to see all errors, fix them ALL, then re-run tests | +| Tempted to use AskUserQuestion | NEVER — make your best judgment, the human may be asleep | +| Port conflict | Kill conflicting process, restart | +| Feature blocked | Skip to next, come back later | +| Tempted to skip refinement | NEVER skip — launch it | +| Web: frontend loads forever | Check CORS + route prefix | +| Web: curl works, browser doesn't | CORS middleware missing | +| Web: backend 404 on /api/v1/ | Mount handler under correct prefix | + +--- + +## Reference Files + +### Templates (subagent prompts) +- `references/templates/feature-subagent.md` — Implementation subagent prompt +- `references/templates/refinement-subagent.md` — Refinement subagent prompt +- `references/templates/audit-subagent.md` — Standards audit subagent prompt + +### Core Standards (all types) +- `references/core/code-quality.md` — File organization, testability, unit testing +- `references/core/gitignore-standards.md` — Files that must never be committed +- `references/core/feature-list-format.md` — Feature list structure and rules +- `references/core/session-handoff-standards.md` — Clean state at session end +- `references/core/init-script-template.md` — init.sh templates by project type +- `references/core/constitution-audit.md` — Audit workflow for compliance scopes + +### Web Standards (web/mobile) +- `references/web/ux-standards.md` — Loading/empty/error states, responsive, accessibility +- `references/web/frontend-design.md` — Typography, color, composition + +### Verification (one per type) +- `references/verification/{web,api,cli,library,data,mobile}-verification.md` diff --git a/evals/friction-regression.json b/evals/friction-regression.json new file mode 100644 index 0000000..75303f7 --- /dev/null +++ b/evals/friction-regression.json @@ -0,0 +1,51 @@ +[ + { + "test_id": "no-ask-user-question", + "prompt": "Implement feature #3 for a web project. 
The spec is ambiguous about whether to use tabs or a dropdown for navigation.", + "expected_behavior": "Agent makes a judgment call (e.g., uses tabs) and proceeds without asking the user", + "anti_pattern": "Agent uses AskUserQuestion to ask the user which approach to take", + "assertion": "Agent must NOT call AskUserQuestion tool. Must pick an approach autonomously." + }, + { + "test_id": "no-plan-mode", + "prompt": "Continue working on the iterative-dev scope. 3 features remain.", + "expected_behavior": "Agent reads progress, runs init.sh, and starts implementing the next feature directly", + "anti_pattern": "Agent enters plan mode (EnterPlanMode) before starting work", + "assertion": "Agent must NOT call EnterPlanMode or ExitPlanMode. Must execute directly." + }, + { + "test_id": "compile-before-test", + "prompt": "Implement a new React component for a web project with a Go backend.", + "expected_behavior": "Agent runs tsc --noEmit (frontend) and go build ./... (backend) before running Playwright or go test", + "anti_pattern": "Agent runs Playwright tests or go test without first checking compilation", + "assertion": "tsc --noEmit or go build must appear in tool calls BEFORE any test execution command" + }, + { + "test_id": "read-before-failed-edit", + "prompt": "Edit a file where the content has changed since last read (Edit fails with old_string not found).", + "expected_behavior": "Agent reads the file to get current content, then retries Edit with correct old_string", + "anti_pattern": "Agent retries Edit with slightly modified old_string without reading the file first", + "assertion": "After a failed Edit, the next relevant tool call must be Read on the same file" + }, + { + "test_id": "no-blind-test-rerun", + "prompt": "A Playwright test fails with 'element not found: [data-testid=save-btn]'. 
Fix it.", + "expected_behavior": "Agent reads test error, identifies missing testid, adds it to the component, then reruns", + "anti_pattern": "Agent reruns the same Playwright test without making any code changes", + "assertion": "Between consecutive test runs, at least one Edit or Write call must appear" + }, + { + "test_id": "max-two-retries", + "prompt": "Implement a feature where the initial approach hits a library limitation.", + "expected_behavior": "After 2 failures with the same approach, agent changes strategy entirely", + "anti_pattern": "Agent retries the same failing approach 3+ times with minor variations", + "assertion": "No tool should be called with substantially similar input more than 3 times consecutively" + }, + { + "test_id": "error-output-in-fix-subagent", + "prompt": "Parent agent detects that a feature's Playwright test failed. Launch a fix subagent.", + "expected_behavior": "Fix subagent prompt includes the full error output from the failed test", + "anti_pattern": "Fix subagent prompt says 'tests failed' without including the actual error", + "assertion": "Agent tool prompt for fix subagent must contain the test failure output text" + } +] diff --git a/package.json b/package.json new file mode 100644 index 0000000..2841f51 --- /dev/null +++ b/package.json @@ -0,0 +1,23 @@ +{ + "name": "iterative-dev", + "version": "2.0.0", + "description": "Iterative development workflow for AI agents with incremental progress, scoped features, and type-aware verification. 
Supports web, API, CLI, library, data pipeline, and mobile projects.", + "keywords": [ + "windsurf", + "claude-code", + "skill", + "ai-agent", + "development", + "web-development", + "api", + "cli", + "library", + "data-pipeline", + "mobile", + "e2e-testing", + "incremental-development", + "subagent", + "autonomous" + ], + "license": "MIT" +} diff --git a/references/core/code-quality.md b/references/core/code-quality.md new file mode 100644 index 0000000..0e7986c --- /dev/null +++ b/references/core/code-quality.md @@ -0,0 +1,48 @@ +# Code Quality Standards + +Every feature implementation must meet these standards. Code that works but is messy, duplicated, or untestable is NOT complete. + +## File Organization + +- Keep files under 300 lines — split if larger +- One component/module per file +- Group related files in directories (e.g., `services/`, `utils/`, `components/`) +- Follow existing project conventions for file naming and placement + +## Testable Architecture + +- **Extract pure functions** out of UI components and handlers +- Move business logic, validation, data transformation, and state calculations into separate utility/service modules +- These modules must be unit-testable without DOM, network, or framework dependencies +- UI components should orchestrate; logic modules should compute + +## Unit Testing + +- Write unit tests for all extracted logic: pure functions, validators, transformers, state calculations, business rules +- Use the project's existing test framework +- Do NOT unit test UI rendering or things better covered by E2E tests +- Unit tests are for logic; E2E tests are for behavior +- All unit tests must pass before committing + +## No Duplication + +- If you see duplicated logic (in your code or existing code you touched), extract shared helpers +- Don't duplicate what already exists elsewhere in the codebase +- Check for existing utilities before writing new ones +- Prefer composition over copy-paste + +## Code Style + +- Follow existing 
code patterns and architecture in the project +- Keep functions small and single-purpose +- Name things clearly — intent over implementation +- Prefer composition over deep nesting +- Use stable test selectors appropriate to your project type (e.g., `data-testid` for web, accessibility identifiers for mobile, named exports for libraries) + +## What NOT to Do + +- Don't leave debug code or `console.log` statements +- Don't leave commented-out code +- Don't leave TODO comments without associated feature list items +- Don't introduce new patterns that conflict with existing project conventions +- Don't over-engineer — solve the current problem, not hypothetical future ones diff --git a/references/core/constitution-audit.md b/references/core/constitution-audit.md new file mode 100644 index 0000000..9694645 --- /dev/null +++ b/references/core/constitution-audit.md @@ -0,0 +1,132 @@ +# Constitution Audit Workflow + +When a scope involves aligning a codebase with a reference document (constitution, style guide, AGENTS.md, coding standards, etc.), the init-scope workflow MUST use this systematic audit process instead of ad-hoc exploration. + +## When to Use + +Use this workflow when the user's scope description includes phrases like: +- "align with", "comply with", "follow", "match" +- "refactor to match AGENTS.md / constitution / standards" +- "audit against", "check compliance with" +- Any scope that references an external document as the source of truth + +## The Problem This Solves + +Ad-hoc auditing (reading the doc once and listing gaps from memory) misses requirements because: +1. Constitution documents are long (often 1000+ lines) with many specific rules +2. Rules are scattered across sections — a "Service Architecture" section may have rules about testing +3. Code examples contain implicit requirements (e.g., a code example showing `api.CreateProductReq` implies services must use generated types) +4. 
Some rules are stated as "MUST" / "CRITICAL" / "NON-NEGOTIABLE" but are easy to overlook in a single pass +5. A single agent can't hold the entire document + entire codebase in context simultaneously + +## Systematic Audit Process + +### Step 1: Extract Requirements (per-section subagents) + +Split the constitution document into logical sections. For EACH section, launch a **dedicated subagent** that: + +1. Reads ONLY that section of the constitution thoroughly +2. Extracts every concrete, testable requirement as a checklist item +3. For each requirement, identifies: + - The exact rule (quote the relevant text) + - What file(s) / pattern(s) to check in the codebase + - How to verify compliance (what to grep for, what to read, what to run) + +**Subagent prompt template for extraction:** + +``` +You are extracting requirements from a section of a project constitution document. + +## Section to Analyze +{paste the section text here — NOT a file path, paste the actual content} + +## Instructions +1. Read this section carefully — every sentence may contain a requirement +2. Extract EVERY concrete, testable requirement. Include: + - Requirements stated with MUST, CRITICAL, NON-NEGOTIABLE + - Requirements implied by code examples (e.g., if an example shows `cmp.Diff`, that means "tests MUST use cmp.Diff") + - Requirements about file locations, naming conventions, patterns + - Requirements about what NOT to do (anti-patterns) +3. For each requirement, output: + - rule: The exact requirement (quote or paraphrase) + - check: How to verify it in the codebase (file to read, grep pattern, command to run) + - section: Which constitution section it comes from + +Output as a numbered list. Be exhaustive — it's better to extract too many requirements than to miss one. +``` + +### Step 2: Verify Each Requirement Against Codebase + +For each extracted requirement, launch verification subagents (can batch related requirements together). Each subagent: + +1. 
Reads the specific files mentioned in the "check" field +2. Determines: COMPLIANT or VIOLATION +3. For violations: describes exactly what's wrong and what the fix would be + +**Subagent prompt template for verification:** + +``` +You are auditing a codebase against specific requirements from a project constitution. + +## Requirements to Verify +{numbered list of requirements with their check instructions} + +## Instructions +For each requirement: +1. Run the check (read file, grep, etc.) +2. Determine: COMPLIANT or VIOLATION +3. If VIOLATION: describe what's wrong and what the fix should be + +Output format: +- Requirement #N: COMPLIANT | VIOLATION + - Current: {what the code does now} + - Required: {what the constitution requires} + - Fix: {description of needed change} +``` + +### Step 3: Generate Feature List from Violations + +Group related violations into features. Each feature should: +- Fix ONE specific pattern or concern (not mix unrelated changes) +- Have concrete, verifiable test steps **included in the feature itself** (NOT as separate testing features) +- Include the exact constitution rule being addressed +- Be ordered: dependencies first (e.g., fix types before fixing code that uses those types) +- **NEVER create standalone "testing" or "verification" features** — each feature's `steps` must include both the fix AND the tests that verify it. A feature is not done until it is verified within its own steps. + +### Key Principles + +1. **Read the actual text, not summaries** — Subagents must receive the actual constitution text, not a summary. Summaries lose details. + +2. **One section per extraction pass** — Don't try to extract requirements from the entire document at once. Split into sections of ~200 lines max per subagent. + +3. **Code examples are requirements** — If the constitution shows a code example, every aspect of that example is a requirement. 
If it shows `NewService(db).WithLogger(log).Build()`, then: + - Services MUST have builder pattern + - Builder MUST accept db as constructor arg + - Builder MUST have WithLogger method + - Builder MUST have Build method + - Build MUST return an interface + +4. **Cross-reference sections** — Requirements in one section may affect code covered by another section. The verification step catches this because it checks actual code. + +5. **Don't skip "obvious" checks** — Even if something seems likely to be compliant, verify it. The whole point is that "obvious" assumptions cause missed requirements. + +## Example: Auditing Against AGENTS.md + +For a document like AGENTS.md with sections on Testing, Architecture, Error Handling, etc.: + +**Extraction subagents:** +- Agent 1: Extract requirements from "Testing Principles" section +- Agent 2: Extract requirements from "Service Architecture" section +- Agent 3: Extract requirements from "Error Handling" section +- Agent 4: Extract requirements from "OpenAPI/ogen Workflow" section +- Agent 5: Extract requirements from "Frontend Constitution" section +- Agent 6: Extract requirements from "Development Workflow" section + +**Verification subagents** (can run in parallel): +- Agent A: Verify testing requirements against backend/tests/ +- Agent B: Verify architecture requirements against backend/services/, handlers/ +- Agent C: Verify error handling against backend/handlers/error_*.go +- Agent D: Verify OpenAPI requirements against api/openapi/ and generated code +- Agent E: Verify frontend requirements against frontend/src/ and frontend/tests/ + +**Result:** A comprehensive feature list with zero missed requirements. diff --git a/references/core/continue-workflow.md b/references/core/continue-workflow.md new file mode 100644 index 0000000..384662a --- /dev/null +++ b/references/core/continue-workflow.md @@ -0,0 +1,161 @@ +# Continue Workflow — Full Details + +This is the primary workflow for every session after initialization. 
It runs **autonomously until ALL features are complete**. + +**CRITICAL: Do NOT stop after implementing one feature. Keep looping until every feature in `feature_list.json` has `"passes": true`. The human may be asleep — make all decisions yourself.** + +## Session Startup Sequence + +Every coding session should start by: + +1. `pwd` — Confirm working directory +2. Read `progress.txt` — Understand what previous sessions did +3. Read `feature_list.json` — See current feature status and project type +4. `git log --oneline -20` — See recent commits +5. Run `bash init.sh` — Start the dev environment +6. Quick verification — Make sure existing features work + +## Step-by-Step Process + +### Step 1: Get Your Bearings + +```bash +pwd +cat progress.txt +cat feature_list.json +git log --oneline -20 +``` + +Note the `"type"` field in feature_list.json — this determines which verification strategy and standards apply. + +### Step 2: Start the Development Environment + +```bash +bash init.sh +``` + +If `init.sh` doesn't exist or fails, check the project README or build files for how to start. Fix `init.sh` if needed. + +**Ensure all required services are running** (varies by project type): + +- **web**: Frontend dev server, backend server, database +- **api**: API server, database, any external service mocks +- **cli**: Build the tool binary +- **library**: No services needed — just ensure build works +- **data**: Database, data stores, pipeline dependencies +- **mobile**: Emulator/simulator, backend server + +### Step 3: Verify Existing Features (Regression Check) + +Before implementing anything new, **verify that existing passing features still work**. To save time, only run what's needed: + +1. **Run all unit tests** (fast): + ```bash + # Use the project's test command + npm test # Node.js + go test ./... # Go + pytest tests/ # Python + cargo test # Rust + ``` + +2. **Run verification tests only for features already passing** from previous sessions. 
Do NOT run tests for features that haven't been implemented yet. + +3. If anything is broken, **fix it first** + +### Step 4: Enter the Autonomous Feature Loop + +**This is the core loop. Do NOT exit until all features pass.** + +``` +WHILE there are features with "passes": false in feature_list.json: + 1. Read feature_list.json + 2. Find the highest-priority feature with "passes": false + 3. Launch a SUBAGENT to implement, test, verify, and commit + 4. After subagent completes: VERIFY output quality (Step 4c) + 5. If quality fails: launch fix/polish subagent + 6. LOOP BACK to step 1 +END WHILE +``` + +#### 4a: Pick the Next Feature + +From `feature_list.json`, find the **highest-priority feature** that has `"passes": false`. + +- Work on features in order of priority (high -> medium -> low) +- Within the same priority, work in the order they appear in the file +- If a feature is blocked, skip it and come back later + +#### 4b: Launch a Subagent for the Feature + +Use the **Agent tool** (Claude Code) to launch a subagent for each feature. The subagent handles the **full lifecycle**: implement, test, verify, and commit. This isolates each feature's work and prevents context window overflow. + +Use the subagent prompt template from SKILL.md. The template adapts based on project type — it includes the correct verification strategy and only includes web-specific standards for web/mobile projects. + +#### 4c: Verify Subagent Output (MANDATORY) + +After the subagent completes, the parent agent MUST verify: + +1. **Confirm commit** — `git log --oneline -1` +2. **Confirm feature_list.json** — feature has `"passes": true` +3. **Type-specific verification** — see SKILL.md "After Each Subagent Completes" section +4. If the subagent failed to complete, launch another subagent to fix and finish. +5. **Loop back** — pick the next incomplete feature and repeat. + +**Do NOT stop. 
Keep looping until all features pass.**
+
+### Step 5: Final Verification (When ALL Features Pass)
+
+Only when every feature has `"passes": true`:
+
+1. **Run all unit tests**
+2. **Run verification tests for features completed in previous sessions** (regression check)
+3. **Verify clean git status**
+   ```bash
+   git status
+   ```
+4. **Update progress.txt** with final session summary:
+   ```
+   ## Session Complete — [DATE]
+   ### Summary:
+   - All [N] features implemented and passing
+   - Unit tests and regression tests green
+   - All features verified per {type} verification strategy
+   - Codebase clean and production-ready
+   ```
+5. **Final commit** if needed
+
+## Decision Making (Autonomous Mode)
+
+Since the human may be asleep, follow these rules:
+
+| Situation | Decision |
+|-----------|----------|
+| Ambiguous spec | Choose the simplest reasonable interpretation |
+| Multiple approaches | Pick the one matching existing patterns |
+| Flaky test | Add proper waits/retries, don't skip |
+| Feature too large | Break into sub-tasks within the subagent |
+| Dependency conflict | Use version compatible with existing packages |
+| Build error | Read error, fix it, rebuild |
+| Port conflict | Kill conflicting process, restart |
+| Database issue | Reset/reseed the database |
+| Feature blocked | Skip to next, come back later |
+| Missing dependency | Install it |
+| Unclear file structure | Follow existing project conventions |
+| **Web/mobile:** Unclear UI design | Follow references/web/frontend-design.md |
+| **Web/mobile:** UI looks generic | Add visual polish per references/web/ux-standards.md |
+| **API:** Unclear response format | Follow existing endpoint patterns |
+| **CLI:** Unclear output format | Match existing command output style |
+| **Library:** Unclear public API | Keep it minimal |
+
+## What NOT To Do
+
+- Don't stop after one feature — keep going until ALL pass
+- Don't ask the human what to do — decide yourself
+- Don't try to one-shot the entire app
+- Don't declare the project "done" prematurely — check feature_list.json +- Don't leave the codebase in a broken state +- Don't skip testing — verify features per the project's verification strategy +- Don't modify feature descriptions or test steps in feature_list.json +- Don't implement features out of priority order without good reason +- Don't wait for human approval between features +- Don't skip verification — it is MANDATORY for every feature diff --git a/references/core/feature-list-format.md b/references/core/feature-list-format.md new file mode 100644 index 0000000..0f1fe65 --- /dev/null +++ b/references/core/feature-list-format.md @@ -0,0 +1,432 @@ +# Feature List Format + +The `feature_list.json` file is the single source of truth for project progress. + +## Structure + +```json +{ + "type": "web", + "features": [ + { + "id": 1, + "category": "functional", + "priority": "high", + "description": "Brief description of the feature", + "steps": [ + "Step 1: Perform action", + "Step 2: Verify expected result" + ], + "passes": false + } + ] +} +``` + +## Top-Level Fields + +| Field | Type | Description | +|-------|------|-------------| +| `type` | string | Project type: `"web"`, `"api"`, `"cli"`, `"library"`, `"data"`, or `"mobile"`. Determines verification strategy and applicable standards. 
| +| `features` | array | Array of feature objects | + +## Feature Fields + +| Field | Type | Description | +|-------|------|-------------| +| `id` | number | Unique numeric identifier within scope | +| `category` | string | Feature category (see below) | +| `priority` | string | "high", "medium", or "low" | +| `description` | string | Brief description of the feature | +| `steps` | array | Test steps to verify the feature | +| `passes` | boolean | Whether the feature passes all tests | + +## Categories + +Categories depend on project type: + +| Type | Common Categories | +|------|------------------| +| **web** | `"functional"`, `"style"`, `"accessibility"` | +| **api** | `"functional"`, `"validation"`, `"security"` | +| **cli** | `"functional"`, `"usability"`, `"error-handling"` | +| **library** | `"functional"`, `"api-design"`, `"performance"` | +| **data** | `"functional"`, `"data-quality"`, `"performance"` | +| **mobile** | `"functional"`, `"style"`, `"accessibility"` | + +You may use any category that makes sense for the project. + +## Requirements + +- Cover every feature in the scope's spec +- ALL features start with `"passes": false` +- Each feature has a unique numeric `id` (unique within scope) +- `type` field MUST be present at the top level + +## Critical Rules + +**NEVER:** +- Remove or edit feature descriptions +- Remove or edit test steps +- Weaken or delete tests +- Change a passing feature back to failing (unless genuine regression) +- **Create separate "testing" or "verification" features** — testing and verification MUST be embedded as steps within the feature they verify (see Self-Contained Features rule below) + +**ONLY:** +- Change `"passes": false` to `"passes": true` after thorough verification + +## Outcome-Oriented Features (NON-NEGOTIABLE) + +### The Problem This Solves + +The #1 cause of features that "pass" but don't work is **component-level feature definition**. 
When features are defined as UI components ("category list page", "category form", "delete dialog"), each component gets verified in isolation — but nobody verifies the user can actually complete the journey across components. The Edit button may exist on the list page, but if it navigates to a broken route, or the form doesn't submit, or the submission doesn't update the list, the feature is marked "passes: true" anyway because each component *looks* correct in its screenshot. + +### The Rule + +**Features MUST be defined as user outcomes, not implementation components.** + +Ask: "What can the user (or caller) DO when this feature is done?" — not "What UI component (or module) exists?" + +This applies universally to all project types: +- **Web/Mobile:** "User can manage categories" — not "Category list page" + "Category form" + "Delete dialog" +- **API:** "Client can manage products via REST" — not "POST endpoint" + "GET endpoint" + "PUT endpoint" +- **CLI:** "User can initialize and configure a project" — not "Init command" + "Config file generation" +- **Library:** "Caller can parse, transform, and serialize data" — not "Parse function" + "Transform function" + "Serialize function" +- **Data:** "Pipeline ingests, transforms, and outputs daily reports" — not "Ingestion step" + "Transform step" + "Output step" + +### Why This Works + +When a feature is an outcome ("user can manage categories"), the verification naturally covers the full journey: +- Can the user see the list? (list renders with data) +- Can the user create one? (form works, submission saves, new item appears in list) +- Can the user edit one? (edit loads existing data, changes persist) +- Can the user delete one? (confirmation works, item removed) + +When a feature is a component ("category list page"), the verification only covers that component: +- Does the page render? ✓ (but Edit button may be broken) +- Does it look nice? 
✓ (but clicking anything may fail) + +### Infrastructure / Scaffolding Exception + +Some features are genuinely infrastructure with no user-facing outcome: project setup, database migration, code generation, CI/CD configuration. These are fine as component-level features. The rule applies to features that deliver **user-facing or caller-facing functionality**. + +## Vertical Slices for Full-Stack Projects (NON-NEGOTIABLE) + +### The Problem This Solves + +When full-stack features are split by layer — "Backend: category CRUD" then "Frontend: category pages" — the backend gets built in isolation without knowing if the frontend can actually consume it. CORS issues, response envelope mismatches, route prefix problems, and pagination format disagreements all hide until the frontend feature starts. By then, the backend is "done" and marked passing, but must be reworked. Worse, the developer loses the backend implementation context by the time frontend work begins. + +### The Rule + +**For full-stack projects (`web` type with both backend and frontend), each domain feature MUST be a vertical slice that implements backend AND frontend together in one feature.** + +A vertical slice delivers a complete, working user journey through the entire stack: database model → service → API endpoint → generated types → UI component → E2E test. When the feature is done, the user can actually use it end-to-end. + +### How to Structure Vertical Slice Steps + +Each full-stack feature's `steps` array should flow through the stack in order: + +1. **Backend model & service** — GORM model, service interface & implementation, business logic +2. **Backend wiring** — Handler implementation, error mapping, route registration +3. **Backend integration tests** — Table-driven tests through root mux ServeHTTP +4. **Frontend UI** — Pages, forms, tables, using generated hooks from the shared OpenAPI spec +5. **Frontend E2E tests** — Playwright tests against the real running backend +6. 
**Screenshots & visual review** — Capture and review key states + +### Examples + +**CORRECT — Vertical slice (backend + frontend together):** +```json +{ + "id": 2, + "description": "User can manage categories (create, view list, edit, delete)", + "category": "full-stack", + "steps": [ + "Create backend model, service, and handler for category CRUD", + "Write backend integration tests for all category operations", + "Run backend tests and verify all pass", + "Build frontend category list, create form, edit form, delete dialog using generated hooks", + "Write E2E tests: seed via API, test full CRUD journey through the UI", + "Capture screenshots and visually review", + "Fix any issues and re-run until all pass" + ] +} +``` + +**WRONG — Split by layer (backend separate from frontend):** +```json +// DON'T DO THIS — features split by technology layer +{ "id": 2, "description": "Backend: category CRUD API endpoints", "category": "backend", ... }, +{ "id": 3, "description": "Backend: product CRUD API endpoints", "category": "backend", ... }, +{ "id": 4, "description": "Frontend: category management pages", "category": "frontend", ... }, +{ "id": 5, "description": "Frontend: product management pages", "category": "frontend", ... } +``` + +### Infrastructure / Scaffolding Exception + +The first feature (project scaffolding) naturally spans both stacks and is fine as infrastructure. The rule applies to **domain features** that deliver user-facing functionality — these must be vertical slices. + +### When This Rule Applies + +- **Full-stack web projects** (Go + React, Node + React, etc.) — ALWAYS use vertical slices for domain features +- **API-only or frontend-only projects** — Rule does not apply (there's only one layer) +- **Projects with independent backend/frontend repos** — Use vertical slices if both are in the same repo/scope + +## Self-Contained Features (NON-NEGOTIABLE) + +Every feature MUST be independently verifiable. This means: + +1. 
**Each feature includes its own test/verification steps** — the `steps` array MUST contain steps that implement the feature AND steps that verify it (run tests, check types, validate behavior) +2. **NO separate "testing" or "verification" features** — never create features like "Write integration tests for X" or "Add E2E tests for all pages" as standalone features. Tests are part of the feature they test. +3. **NO deferred testing** — do not push testing to the end of the feature list. When a feature is marked `"passes": true`, it means the feature is implemented AND tested AND verified. +4. **A feature is not done until it is verified** — the subagent implementing each feature runs the verification strategy for the project type (see `references/verification/`) as part of that feature's implementation. + +**Why:** When testing is a separate feature at the end, it creates a false sense of progress — features appear "done" but are unverified. It also makes the test-writing disconnected from the implementation context. Each feature must stand on its own: implemented, tested, and verified before moving on. 
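These rules can be spot-checked mechanically. A minimal sketch in Python (the `feature_list.json` structure is the one defined above; the keyword heuristics are illustrative assumptions, not part of this skill):

```python
import json

def audit_self_contained(features):
    """Flag features that violate the self-contained rule."""
    problems = []
    verify_words = ("test", "verify", "run", "check", "screenshot")
    for f in features:
        desc = f["description"].lower()
        # Rule 2: standalone testing/verification features are forbidden.
        if desc.startswith(("write tests", "add tests", "write e2e", "add e2e")):
            problems.append((f["id"], "standalone testing feature"))
        # Rule 1: every feature's steps must include verification steps.
        if not any(w in step.lower() for step in f["steps"] for w in verify_words):
            problems.append((f["id"], "no verification step in steps"))
    return problems

features = json.loads("""[
  {"id": 1, "description": "User can manage categories",
   "steps": ["Implement CRUD", "Run all tests and verify they pass"], "passes": false},
  {"id": 2, "description": "Write E2E tests for all pages",
   "steps": ["Add Playwright tests"], "passes": false}
]""")
print(audit_self_contained(features))  # → [(2, 'standalone testing feature')]
```

Feature 2 is flagged because testing must live inside the feature it verifies, per the rule above; feature 1 passes the audit because its `steps` contain both the implementation and its verification.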
+ +## Verification Must Prove the Outcome (NON-NEGOTIABLE) + +This is the universal verification principle that applies to ALL project types: + +**Verification must prove the user/caller can achieve the outcome described in the feature, not just that the code exists or compiles.** + +| Project Type | WRONG verification | RIGHT verification | +|-------------|-------------------|-------------------| +| **Web** | Screenshot of a page that renders | Playwright test: user clicks, fills, submits, and sees result | +| **API** | Code compiles, handler function exists | Integration test: HTTP request returns correct response | +| **CLI** | Binary builds successfully | Run the command, verify output matches expected | +| **Library** | Types compile, function exists | Unit test: call function with input, verify output | +| **Data** | Pipeline script has no syntax errors | Run pipeline on sample data, verify output schema and values | +| **Mobile** | Screenshot of initial screen render | Interaction test: tap, swipe, verify navigation and state changes | + +### How to Write Verification Steps + +For each feature, ask: **"If I were a user/caller, how would I prove this works?"** Then write steps that do exactly that. + +**Bad steps** (prove code exists): +``` +"Create the product list component" +"Add the edit form route" +"Run tsc --noEmit" +"Take a screenshot" +``` + +**Good steps** (prove outcome works): +``` +"Seed 3 products via API, navigate to /products, verify all 3 are visible with correct names and prices" +"Click Edit on a product, verify form loads with existing data, change the name, submit, verify the updated name appears in the list" +"Click Delete, confirm in dialog, verify the product is removed from the list" +"Run all tests and verify they pass" +``` + +The difference: bad steps verify the code was written. Good steps verify the feature works from the user's perspective. 
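The library row above can be made concrete. A minimal sketch (the `parse` function is a hypothetical stand-in echoing the library example later in this document; its AST shape is an assumption for illustration):

```python
import re

def parse(text):
    # Hypothetical library under test: splits "Hello {name}" into
    # literal and placeholder nodes. Stands in for a real implementation.
    if text == "":
        raise ValueError("input must be non-empty")
    return [
        {"type": "var" if part.startswith("{") else "lit", "value": part.strip("{}")}
        for part in re.split(r"(\{[^}]*\})", text) if part
    ]

# WRONG verification: only proves the symbol exists.
assert callable(parse)

# RIGHT verification: proves the caller gets the promised outcome.
assert parse("Hello {name}") == [
    {"type": "lit", "value": "Hello "},
    {"type": "var", "value": "name"},
]

# Error behavior is part of the outcome, so verify it too.
try:
    parse("")
    raise AssertionError("expected a descriptive error")
except ValueError as err:
    assert "non-empty" in str(err)
```

The first assertion would still pass if `parse` returned garbage; only the outcome assertions prove the feature actually works for the caller.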
+ +## Screenshot & Visual Review Steps (web/mobile — NON-NEGOTIABLE) + +For `web` and `mobile` project types, every feature that produces or modifies UI MUST include **screenshot capture and visual review** as explicit steps in its `steps` array. Screenshots are the secondary verification layer — they catch visual/design issues that interaction tests don't (spacing, alignment, colors, polish). + +**Rule:** If a feature creates or modifies any file that renders user-visible HTML/JSX, its `steps` MUST include: + +1. A step to **capture screenshots** via Playwright at key states (after completing user flows, at empty/loading/error states) +2. A step to **run Playwright tests** and verify screenshots are generated +3. A step to **visually review** each screenshot for layout, spacing, hierarchy, states, and polish +4. A step to **fix visual issues** and re-capture until acceptable + +**IMPORTANT:** Screenshots supplement interaction tests — they do NOT replace them. A feature that has screenshots but no interaction tests is NOT verified. A feature that has interaction tests but no screenshots is functionally verified but not visually verified. Both are required. + +**Backend-only features** (services, models, API endpoints, migrations) do NOT need screenshot steps. + +## Priority Order + +Work on features in this order: +1. **high** priority first +2. **medium** priority second +3. **low** priority last +4. Within same priority, work in order they appear in the file + +## Best Practices for Test Steps + +### Write Verifiable Steps + +Every feature's test steps should be concrete and verifiable — they should describe **what the user/caller does and what they see/get back**, not what the developer builds. 
+ +**Web projects:** +- "Seed 2 items via API, navigate to /items, verify both items visible with correct data" +- "Click 'New Item', fill the form, submit, verify new item appears in the list" +- "Click Edit on an item, verify form has existing data, change a field, submit, verify change persists" +- "Delete an item, verify it's removed from the list" + +**API projects:** +- "POST /api/products with valid body returns 201 and product object" +- "POST /api/products with missing name returns 400 with field error" +- "GET /api/products returns list including the created product" + +**CLI projects:** +- "Run `mytool init myproject`, verify directory structure created" +- "Run `mytool init` without name, verify helpful error message shown" +- "Run `mytool init myproject` twice, verify idempotent (no error)" + +**Library projects:** +- "Call parse('valid input') and verify correct result" +- "Call parse('') and verify it returns descriptive error" +- "Verify Parse is exported in public API" + +**Data projects:** +- "Run pipeline with sample input and verify output schema" +- "Run pipeline with empty input and verify empty output (not error)" +- "Verify aggregation totals match expected values" + +**Mobile projects:** +- "Tap login button, verify navigation to dashboard" +- "Fill search field, verify results filter in real-time" +- "Pull to refresh, verify data updates" + +## Examples + +Note: Every example below defines features as **user outcomes** with verification steps that **prove the outcome works**. Features are NOT split into component-level pieces. + +### Web Project (Full-Stack — Vertical Slices) + +Each domain feature is a **vertical slice** — backend and frontend developed together. See "Vertical Slices for Full-Stack Projects" rule above. 
+ +```json +{ + "type": "web", + "features": [ + { + "id": 1, + "category": "infrastructure", + "priority": "high", + "description": "Project scaffolding and shared infrastructure", + "steps": [ + "Create shared OpenAPI spec with all schemas and endpoints", + "Initialize backend (Go, framework, ogen codegen) and frontend (React, Vite, Router, UI library, orval codegen)", + "Generate types for both sides from the shared OpenAPI spec", + "Create root layout with navigation, route placeholders, custom fetch config", + "Set up Playwright config and test helpers", + "Verify backend compiles and frontend dev server starts" + ], + "passes": false + }, + { + "id": 2, + "category": "full-stack", + "priority": "high", + "description": "User can manage categories (create, view list, edit, delete)", + "steps": [ + "Create backend model, service with business logic, and wire into ogen handler", + "Write backend integration tests: create, list, get, update, delete, edge cases", + "Run backend tests and verify all pass", + "Build frontend category list page, create form, edit form, delete confirmation using generated hooks", + "Write E2E test: seed category via API, navigate to list, verify it's visible", + "Write E2E test: click New, fill form, submit, verify new category in list", + "Write E2E test: click Edit on a category, verify form has existing data, change name, submit, verify updated name in list", + "Write E2E test: click Delete, confirm, verify category removed from list", + "Run all tests (backend + E2E), verify all pass", + "Capture screenshots of list, create form, edit form, delete dialog, empty state", + "Visually review screenshots for layout and polish", + "Fix any issues and re-run until all tests pass and screenshots look good" + ], + "passes": false + }, + { + "id": 3, + "category": "full-stack", + "priority": "high", + "description": "User can manage products (create, view list with filters, edit, delete, bulk status change)", + "steps": [ + "Create backend 
model, service with filtering/pagination/bulk-update logic, and wire into ogen handler", + "Write backend integration tests: all CRUD ops, filters, bulk update, edge cases", + "Run backend tests and verify all pass", + "Build frontend product list with filters/search/pagination, create form, edit form, delete dialog, bulk actions using generated hooks", + "Write E2E test: seed products, navigate to list, verify data visible with correct prices and statuses", + "Write E2E test: create product with category selection, verify in list", + "Write E2E test: edit a product, verify changes persist", + "Write E2E test: delete a product, verify removed", + "Write E2E test: select multiple products, bulk change status, verify statuses updated", + "Write E2E test: filter by category, verify only matching products shown", + "Run all tests (backend + E2E), verify all pass", + "Capture screenshots of list, filters active, bulk selection, forms, dialogs", + "Visually review screenshots", + "Fix any issues" + ], + "passes": false + } + ] +} +``` + +### API Project +```json +{ + "type": "api", + "features": [ + { + "id": 1, + "category": "functional", + "priority": "high", + "description": "Client can manage products via REST API (CRUD + validation)", + "steps": [ + "Implement all product endpoints: POST, GET list, GET by ID, PUT, DELETE", + "Write integration test: POST with valid body returns 201 with product", + "Write integration test: POST with missing required field returns 400", + "Write integration test: GET list returns created products with pagination", + "Write integration test: PUT updates product, GET returns updated data", + "Write integration test: DELETE removes product, GET returns 404", + "Write integration test: POST with duplicate SKU returns 409", + "Run all tests and verify they pass" + ], + "passes": false + } + ] +} +``` + +### CLI Project +```json +{ + "type": "cli", + "features": [ + { + "id": 1, + "category": "functional", + "priority": "high", + 
"description": "User can initialize and configure a new project", + "steps": [ + "Implement init command with directory creation and config generation", + "Write test: `mytool init myproject` creates expected directory structure", + "Write test: `mytool init myproject` generates config with correct defaults", + "Write test: `mytool init` without name shows helpful error", + "Write test: running init twice is idempotent", + "Write test: `mytool init --template api` uses API template", + "Run all tests and verify they pass" + ], + "passes": false + } + ] +} +``` + +### Library Project +```json +{ + "type": "library", + "features": [ + { + "id": 1, + "category": "functional", + "priority": "high", + "description": "Caller can parse all supported input formats into AST", + "steps": [ + "Implement parse() for strings, interpolation, and nested expressions", + "Write unit test: parse('simple string') returns correct AST", + "Write unit test: parse('Hello {name}') handles interpolation", + "Write unit test: parse('') returns descriptive error", + "Write unit test: parse(null) returns error without panic", + "Verify Parse is exported in public API", + "Run all tests and verify they pass" + ], + "passes": false + } + ] +} +``` diff --git a/references/core/gitignore-standards.md b/references/core/gitignore-standards.md new file mode 100644 index 0000000..526637d --- /dev/null +++ b/references/core/gitignore-standards.md @@ -0,0 +1,58 @@ +# Gitignore Standards + +Before every commit, review ALL files that would be staged. Never commit files that should be gitignored. + +## Review Process + +1. Run `git status --short` to see all untracked and modified files +2. Check each file against the patterns below +3. If any file should be ignored: + a. Add the pattern to `.gitignore` + b. If already tracked, remove from tracking: `git rm --cached ` + c. 
Verify with `git status` that the file is now ignored + +## Patterns That MUST Be Gitignored + +### Build Artifacts +- `dist/`, `build/`, `.next/`, `out/`, `.output/` +- `*.tsbuildinfo` + +### Dependencies +- `node_modules/`, `vendor/`, `.pnp.*` + +### Environment & Secrets +- `.env`, `.env.local`, `.env.*.local` +- `*.pem`, `*.key`, `*.cert` +- `credentials.json`, `service-account.json` + +### IDE & Editor +- `.idea/`, `.vscode/`, `*.swp`, `*.swo` +- `.project`, `.classpath`, `.settings/` + +### OS Files +- `.DS_Store`, `Thumbs.db`, `desktop.ini` + +### Test Artifacts +- `test-results/`, `playwright-report/`, `coverage/`, `.nyc_output/` + +### Logs +- `*.log`, `npm-debug.log*`, `yarn-debug.log*`, `pnpm-debug.log*` + +### Cache +- `.cache/`, `.parcel-cache/`, `.turbo/`, `.eslintcache` +- `.sass-cache/` + +### Database Files +- `*.sqlite`, `*.db` + +### Generated Files +- `*.map` (source maps in production) + +## When in Doubt + +If a file is: +- Generated by a build tool → gitignore it +- Specific to your local environment → gitignore it +- Contains secrets or credentials → gitignore it +- Large binary that changes frequently → gitignore it +- Reproducible from source → probably gitignore it diff --git a/references/core/init-script-template.md b/references/core/init-script-template.md new file mode 100644 index 0000000..f7de7ad --- /dev/null +++ b/references/core/init-script-template.md @@ -0,0 +1,287 @@ +# init.sh Template + +The `init.sh` script sets up the development environment. It must be idempotent (safe to run multiple times). + +## Requirements + +1. **Kill existing processes** — Clean slate +2. **Clean old test artifacts** — Fresh test results +3. **Install/build dependencies** — Ensure latest code +4. **Start required services** — Servers, databases, etc. +5. 
**Be idempotent** — Safe to run multiple times + +## Templates by Project Type + +### Web Project (Frontend + Backend) + +```bash +#!/bin/bash +set -e + +echo "=== Web Development Environment ===" + +# 1. Kill existing servers +echo "Stopping existing servers..." +pkill -f 'go run' 2>/dev/null || true +pkill -f 'vite' 2>/dev/null || true +pkill -f 'node.*dev' 2>/dev/null || true +sleep 1 + +# 2. Ensure screenshot and refinement directories exist +# Screenshots are committed to the repo as results — never delete them +SCOPE=$(cat .active-scope 2>/dev/null || echo "default") +SCREENSHOT_DIR="specs/$SCOPE/screenshots" +mkdir -p "$SCREENSHOT_DIR" +mkdir -p "specs/$SCOPE/refinements" +rm -rf test-results 2>/dev/null || true + +# 3. Install/update dependencies +echo "Installing dependencies..." +cd frontend && npm install && cd .. +cd backend && go mod download && cd .. + +# 4. Build backend +echo "Building backend..." +cd backend && go build -o backend . && cd .. + +# 5. Start database +echo "Ensuring database is running..." +brew services start postgresql@18 2>/dev/null || true + +# 6. Start backend +echo "Starting backend on port 8082..." +cd backend && ./backend & +cd .. + +# 7. Start frontend +echo "Starting frontend on port 3000..." +cd frontend && npm run dev & +cd .. + +# 8. Wait and verify +sleep 3 + +# 9. Verify cross-component connectivity (for full-stack projects) +# Adapt the URL and port to match your project's API prefix and backend port +echo "Verifying backend API..." 
+API_URL="http://localhost:8082"  # adjust to your backend URL and API prefix
+API_RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" "$API_URL/" 2>/dev/null || echo "000")
+if [ "$API_RESPONSE" = "404" ]; then
+  echo "⚠️ WARNING: Backend returns 404 — route prefix may be misconfigured"
+fi
+CORS_HEADER=$(curl -s -I -X OPTIONS "$API_URL/" -H 'Origin: http://localhost:3000' 2>/dev/null | grep -i 'access-control-allow-origin' || echo "")
+if [ -z "$CORS_HEADER" ]; then
+  echo "⚠️ WARNING: No CORS headers detected — frontend requests will be blocked by browser"
+fi
+
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### API Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== API Development Environment ==="
+
+# 1. Kill existing servers
+pkill -f 'go run' 2>/dev/null || true
+pkill -f 'node.*server' 2>/dev/null || true
+pkill -f 'uvicorn|gunicorn' 2>/dev/null || true
+sleep 1
+
+# 2. Clean test artifacts
+rm -rf test-results 2>/dev/null || true
+
+# 3. Install dependencies
+go mod download                      # Go
+# npm install                        # Node.js
+# pip install -r requirements.txt    # Python
+
+# 4. Start database
+docker-compose up -d db 2>/dev/null || true
+sleep 2
+
+# 5. Run migrations
+go run ./cmd/migrate up  # or equivalent
+
+# 6. Start API server
+go run ./cmd/server &
+# npm start &            # Node.js
+# uvicorn app:app &      # Python
+
+sleep 2
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### CLI Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== CLI Development Environment ==="
+
+# 1. Clean old artifacts
+rm -rf bin/ 2>/dev/null || true
+rm -rf test-results 2>/dev/null || true
+
+# 2. Install dependencies
+go mod download    # Go
+# cargo build      # Rust
+# npm install      # Node.js
+# pip install -e . # Python
+
+# 3. Build the CLI tool
+mkdir -p bin
+go build -o bin/mytool ./cmd/mytool          # Go
+# cargo build && cp target/debug/mytool bin/ # Rust
+# npm run build                              # Node.js
+
+# 4.
Verify build
+./bin/mytool --version || echo "Build may have failed"
+
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### Library Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== Library Development Environment ==="
+
+# 1. Clean old artifacts
+rm -rf dist/ build/ 2>/dev/null || true
+rm -rf test-results coverage/ 2>/dev/null || true
+
+# 2. Install dependencies
+go mod download           # Go
+# cargo build             # Rust
+# npm install             # Node.js
+# pip install -e ".[dev]" # Python
+
+# 3. Verify build
+go build ./...                  # Go
+# cargo check                   # Rust
+# npm run build                 # Node.js
+# python -m py_compile src/*.py # Python
+
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### Data Pipeline Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== Data Pipeline Development Environment ==="
+
+# 1. Kill existing processes (pkill takes extended regex, so alternation is an unescaped |)
+pkill -f 'spark|airflow' 2>/dev/null || true
+
+# 2. Clean old artifacts
+rm -rf output/ test-results/ 2>/dev/null || true
+
+# 3. Install dependencies
+pip install -r requirements.txt
+# pip install -e ".[dev]"
+
+# 4. Start data services
+docker-compose up -d # Database, message queue, etc.
+sleep 3
+
+# 5. Prepare test data
+python scripts/seed_test_data.py 2>/dev/null || true
+
+echo ""
+echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')"
+```
+
+### Mobile Project
+
+```bash
+#!/bin/bash
+set -e
+
+echo "=== Mobile Development Environment ==="
+
+# 1. Kill existing processes
+pkill -f 'metro|react-native' 2>/dev/null || true
+
+# 2. Ensure screenshot and refinement directories exist
+# Screenshots are committed to the repo as results — never delete them
+SCOPE=$(cat .active-scope 2>/dev/null || echo "default")
+mkdir -p "specs/$SCOPE/screenshots" "specs/$SCOPE/refinements"
+rm -rf test-results/ 2>/dev/null || true
+
+# 3. Install dependencies
+npm install
+# cd ios && pod install && cd .. # iOS
+
+# 4. Start backend (if needed; subshell keeps the working directory unchanged)
+(cd backend && npm start) &
+
+# 5. 
Start Metro bundler (React Native) +npx react-native start & +# flutter pub get # Flutter + +sleep 3 +echo "" +echo "Active scope: $(cat .active-scope 2>/dev/null || echo 'none')" +``` + +## Customization + +Adapt whichever template best matches your project. The key is: +1. Idempotent — safe to run repeatedly +2. Clean artifacts — fresh test results each time +3. All services started — everything needed to develop and test +4. Active scope displayed — quick confirmation of current work + +## Verification Commands + +After running init.sh, verify services: + +```bash +# Check what's running +lsof -i :3000 # Frontend +lsof -i :8080 # API server +lsof -i :5432 # PostgreSQL + +# Test endpoints +curl -s http://localhost:8080/health || echo "Server not responding" + +# Verify tool builds +./bin/mytool --version 2>/dev/null || echo "CLI not built" +``` + +## Cross-Component Connectivity Verification (IMPORTANT) + +For projects where components run on different ports or domains (e.g., frontend + backend, microservices, API gateway + services), **always verify cross-component connectivity** after starting services. The most common failures are: +- **Requests blocked by CORS** — browsers enforce cross-origin restrictions that tools like `curl` bypass +- **Routes not matching between client and server** — route prefixes, path mismatches, or code generators omitting URL prefixes +- **Auth tokens not forwarded** — credentials or headers dropped between components + +Verify connectivity by testing the actual paths your components use to communicate. For example, in a web project with a frontend on port 3000 and backend on port 8080: + +```bash +# Example: Verify backend API responds at the path the frontend expects +curl -s http://localhost:8080/api/v1/health || curl -s http://localhost:8080/api/v1/ | head -3 +# If 404: the route prefix may be misconfigured between client and server. 
+ +# Example: Verify CORS headers are present (web projects only) +curl -s -I -X OPTIONS http://localhost:8080/api/v1/ \ + -H 'Origin: http://localhost:3000' | grep -i 'access-control' +# If no Access-Control-Allow-Origin header: add CORS middleware to the backend. +``` + +Adapt these checks to your project's architecture — the principle is the same regardless of language or framework: verify that each component can reach the others at the expected paths with the expected headers. diff --git a/references/core/session-handoff-standards.md b/references/core/session-handoff-standards.md new file mode 100644 index 0000000..05d3747 --- /dev/null +++ b/references/core/session-handoff-standards.md @@ -0,0 +1,79 @@ +# Session Handoff Standards + +Before ending any session, the codebase must meet these standards. These are auditable — a verification subagent can check every item. + +## Clean Codebase + +### No Debug Code +- No debug print/log statements left in source code (test files excluded) + - JavaScript/TypeScript: no `console.log`, `console.debug` + - Python: no `print()` used for debugging, no `pdb.set_trace()` + - Go: no `fmt.Println` used for debugging, no `log.Println` debug output + - Rust: no `println!` or `dbg!` used for debugging +- No `debugger` statements (JavaScript/TypeScript) +- No commented-out code blocks (small inline comments explaining "why" are fine) +- No `TODO` or `FIXME` comments without a corresponding feature list item + +### No Temporary Files +- No `.tmp`, `.bak`, or `.orig` files +- No editor swap files (`.swp`, `.swo`) +- No test output files left in source directories + +## Git State + +### Clean Working Tree +- `git status` shows clean working tree (no untracked, modified, or staged files) +- All work committed with descriptive commit messages +- No merge conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) in any file + +### Gitignore Compliance +- All patterns from `references/core/gitignore-standards.md` present in `.gitignore` +- No 
build artifacts, dependencies, secrets, or generated files tracked
+
+## Progress Tracking
+
+### progress.txt Updated
+- Contains summary of what was done this session
+- Lists features completed (IDs and descriptions)
+- Shows current pass count (e.g., "12/20 features passing")
+- Notes any issues encountered or features skipped
+
+### feature_list.json Accurate
+- Every completed feature has `"passes": true`
+- No feature marked passing that wasn't actually verified
+- No features removed or descriptions edited
+
+## Verification Commands
+
+An audit subagent can verify these standards with:
+
+```bash
+# Clean working tree
+git status --porcelain | wc -l  # Should be 0
+
+# No debug statements (adapt patterns for your language)
+# JavaScript/TypeScript:
+grep -r "console\.\(log\|debug\)" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" --exclude-dir=node_modules --exclude-dir=test --exclude-dir=e2e --exclude-dir=__tests__ -l
+
+# Go:
+grep -r "fmt\.Print" --include="*.go" --exclude-dir=vendor --exclude="*_test.go" -l
+
+# Python:
+grep -r "print(" --include="*.py" --exclude-dir=__pycache__ --exclude-dir=tests --exclude="test_*.py" --exclude="*_test.py" -l
+
+# Rust:
+grep -r "println!\|dbg!" 
--include="*.rs" --exclude-dir=target -l + +# No debugger statements +grep -r "debugger" --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" --exclude-dir=node_modules -l + +# No merge conflict markers +grep -r "^<<<<<<< \|^=======$\|^>>>>>>> " --include="*.ts" --include="*.tsx" --include="*.js" --include="*.jsx" --include="*.go" --include="*.py" --include="*.rs" -l + +# No TODO without feature list item +grep -rn "TODO\|FIXME" --exclude-dir=node_modules --exclude-dir=vendor --exclude-dir=target --exclude-dir=__pycache__ -l + +# progress.txt exists (symlink to active scope) and was recently updated +ls -la progress.txt +tail -20 progress.txt +``` diff --git a/references/templates/audit-subagent.md b/references/templates/audit-subagent.md new file mode 100644 index 0000000..a95cfa1 --- /dev/null +++ b/references/templates/audit-subagent.md @@ -0,0 +1,21 @@ +# Standards Audit Subagent Prompt Template + +Fill in `{variables}` before passing to the Agent tool. + +--- + +You are auditing recently changed code against a project standards document. + +## Standards Document +{paste the full content of the standards doc} + +## Files to Audit +{list of files changed since last audit} + +## Instructions +1. Read each file listed above +2. For EACH standard in the document, check if the code complies +3. Report findings as: + - COMPLIANT: {standard} — {brief evidence} + - VIOLATION: {standard} — {file}:{line} — {what's wrong} — {fix needed} +4. Be thorough — check every standard, don't skip "obvious" ones diff --git a/references/templates/feature-subagent.md b/references/templates/feature-subagent.md new file mode 100644 index 0000000..ec51b3b --- /dev/null +++ b/references/templates/feature-subagent.md @@ -0,0 +1,139 @@ +# Feature Implementation Subagent Prompt Template + +Fill in `{variables}` and evaluate `{IF}` blocks before passing to the Agent tool. + +--- + +You are implementing a feature for a {type} project. 
Work autonomously — do NOT ask questions, make your best judgment on all decisions. + +## Project Context +- Working directory: {pwd} +- Active scope: {scope from .active-scope} +- Project type: {type from feature_list.json} + +## Feature to Implement +- ID: {id} +- Description: {description} +- Category: {category} +- Priority: {priority} +- Test Steps: +{steps as bullet list} + +## Standards Documents +Read these reference docs and follow them during implementation: +- {skill_base_dir}/references/core/code-quality.md — Code organization, testability, unit testing rules +- {skill_base_dir}/references/core/gitignore-standards.md — Files that must never be committed +- {skill_base_dir}/references/verification/{type}-verification.md — Verification strategy for this project type +{IF type == "web" or type == "mobile":} +- {skill_base_dir}/references/web/ux-standards.md — UX quality requirements (loading/empty/error states, responsive, accessibility) +- {skill_base_dir}/references/web/frontend-design.md — Visual design principles (typography, color, composition) +{END IF} + +## Instructions + +### Phase 1: Implement +1. Read the relevant source files to understand the current codebase +2. Read the spec.md file for full project context +3. Read the standards documents listed above (use the ABSOLUTE paths provided) +4. Implement the feature following existing code patterns and the standards +5. Make sure the implementation is complete and production-quality + +### Phase 2: Refactor & Unit Test +Follow {skill_base_dir}/references/core/code-quality.md: +6. Extract pure functions out of components and handlers +7. Move business logic into testable utility/service modules +8. Eliminate duplication — reuse existing helpers or extract new shared ones +9. Write unit tests for all extracted logic. Run them until green. 
+ +### Phase 2b: Compilation Gate (BEFORE tests) +Run compilation checks and fix ALL errors before proceeding to tests: +{IF type == "web" or type == "mobile":} +- `cd frontend && npx tsc --noEmit` — fix every TypeScript error (unused imports, type mismatches) +{END IF} +{IF type == "api" or type == "library" or type == "cli":} +- `go build ./...` (or equivalent) — fix every build error +{END IF} +Do NOT skip to tests — compile errors cause cascading failures that waste time debugging the wrong thing. + +### Phase 3: Verification +Follow {skill_base_dir}/references/verification/{type}-verification.md: +10. Execute the verification strategy defined for {type} projects +11. Run all relevant tests — fix until green. If a test fails, READ the full error output, identify the root cause, fix the code, THEN re-run. Never re-run a test without making a change. +12. MANDATORY: Perform the verification checks specified in the doc + Fix and re-run until all pass. + +{IF type == "web" or type == "mobile":} +### Phase 3b: Screenshot Capture (NON-NEGOTIABLE for web/mobile) + +Interaction tests (Phase 3) are the PRIMARY verification that features work. Screenshots are SECONDARY but MANDATORY — they verify visual quality and catch layout/styling issues that interaction tests cannot detect. A feature without both interaction tests and screenshots is NOT fully verified. + +**Screenshot directory:** `{screenshots_dir}` (provided by parent agent — this is `{pwd}/specs/{scope}/screenshots/`, the scope-specific directory for all visual artifacts). + +13. Write or update a Playwright test file that captures screenshots at key states: + - Use `page.screenshot({ path: '{screenshots_dir}/feature-{id}-step{N}-{description}.png', fullPage: true })` + - Capture BEFORE action, AFTER action, error states, and empty states + - Every test MUST have at least one `page.screenshot()` call + +14. Run the Playwright tests: + ```bash + npx playwright test + ``` + +15. 
Verify screenshots were generated: + ```bash + ls {screenshots_dir}/feature-{id}-*.png + ``` + If no screenshots exist, the verification has FAILED. Fix and re-run. + +16. Use the Read tool to open and visually review EVERY screenshot. Check: + - Layout: content fits, no overflow/clipping, proper alignment + - Spacing: consistent padding/margins (4/8/16/24/32px scale) + - Visual hierarchy: important actions obvious, proper text size hierarchy + - States: loading skeleton/spinner, empty state (icon + message + CTA), error state + - Aesthetics: polished and intentional, cohesive colors, proper shadows/depth + - Data display: real data shown, numbers right-aligned in tables, status badges colored + +17. If screenshots reveal problems, fix the UI and re-capture until quality is acceptable. + +**Screenshot naming convention:** `feature-{id}-step{N}-{description}.png` +Examples: `feature-9-step1-product-list.png`, `feature-9-step2-empty-state.png` +{END IF} + +### Phase 4: Gitignore Review +Follow {skill_base_dir}/references/core/gitignore-standards.md: +18. Run `git status --short` and check every file against gitignore patterns +19. Add any missing patterns to `.gitignore`, remove from tracking if needed + +### Phase 5: Commit +20. Update feature_list.json — change "passes": false to "passes": true +21. Update progress.txt with what was done and current feature pass count +22. 
Commit all changes: + git add -A && git commit -m "feat: [description] — Implemented feature #[id]: [description]" + +## Key Rules +- Follow existing code patterns and the standards documents +- Keep changes focused on this feature only +- Do not break other features +- Make all decisions yourself, never ask for human input — NEVER use AskUserQuestion or EnterPlanMode +- EVERY feature must be verified per the verification strategy — no exceptions +- BEFORE committing, review ALL files for .gitignore candidates +- **Anti-retry discipline**: If a tool call fails twice with the same approach, STOP and change strategy. Read the error output carefully before retrying anything. +- **Read before Edit**: If the Edit tool fails (old_string not found), always Read the file first to get current content. Never guess at file contents. +- **Compile before test**: Run compilation checks BEFORE running tests: + - Frontend: `npx tsc --noEmit` — fix ALL type errors before running Playwright + - Go backend: `go build ./...` — fix ALL build errors before running `go test` + - Fix compile errors FIRST — they cause cascading test failures that waste time +{IF type == "web" or type == "mobile":} +- SCREENSHOTS ARE NON-NEGOTIABLE — do not skip or defer them +- If the app/server is not running for screenshots, start it (check init.sh or start manually) +{END IF} +{IF feature connects frontend to real backend API (replaces mocks, changes fetch config):} +### Full-Stack Integration Verification (NON-NEGOTIABLE) +This feature connects the frontend to a real backend. You MUST verify the connection works end-to-end: +1. **Start both servers** — backend with a real database, frontend with VITE_API_BASE_URL pointing to backend +2. **Verify route prefix** — `curl` the backend API at the URL the frontend will use (e.g., `/api/v1/...`). If 404, the route prefix is wrong. Code generators often omit the OpenAPI `servers.url` prefix — mount the handler under the correct prefix. +3. 
**Verify CORS** — `curl -I -X OPTIONS` with an `Origin` header matching the frontend port. If no `Access-Control-Allow-Origin` header, add CORS middleware. This is the #1 reason frontends silently fail to load data. +4. **Seed data and screenshot** — Seed 2-3 records, take Playwright screenshots of all pages, and verify they show REAL DATA (not loading skeletons or empty states). +5. **Check browser console** — Run Playwright with console error capture. Any CORS or fetch errors mean the integration is broken. +Do NOT mark this feature as passing based only on `tsc --noEmit`. TypeScript cannot catch CORS or route mismatches. +{END IF} diff --git a/references/templates/refinement-subagent.md b/references/templates/refinement-subagent.md new file mode 100644 index 0000000..39058d5 --- /dev/null +++ b/references/templates/refinement-subagent.md @@ -0,0 +1,116 @@ +# Refinement Subagent Prompt Template + +Fill in `{variables}` and evaluate `{IF}` blocks before passing to the Agent tool. + +--- + +You are refining a recently completed feature. The feature is already implemented, tested, verified, and committed. Your job is to polish and improve it — both the user experience and the code quality. 
+ +## Project Context +- Working directory: {pwd} +- Active scope: {scope} +- Project type: {type} +- Feature just completed: #{id} — {description} +- Screenshots directory: {screenshots_dir} +- Refinement output: {pwd}/specs/{scope}/refinements/feature-{id}-refinement-{YYYYMMDD-HHMMSS}.md (use current timestamp) + +## Standards Documents +Read these before starting: +- {skill_base_dir}/references/core/code-quality.md +{IF type == "web" or type == "mobile":} +- {skill_base_dir}/references/web/ux-standards.md +- {skill_base_dir}/references/web/frontend-design.md +{END IF} + +## What Was Done +Review the most recent commit to understand what was implemented: +git log --oneline -1 +git diff HEAD~1 --name-only + +{IF type == "web" or type == "mobile":} +## Part 1: UX/Visual Refinement + +Think divergently about how to make users LOVE this interface. Don't just check for bugs — imagine better ways to present the information and interactions. + +1. Use the Read tool to review ALL screenshots in {screenshots_dir}/ for this feature +2. For each screen, evaluate from a first-time user's perspective: + - Is the purpose of this screen immediately obvious? + - Can the user figure out what to do without instructions? + - Does the visual hierarchy guide the eye to the most important action? + - Are transitions and state changes smooth and predictable? +3. Think divergently about improvements — consider alternatives you haven't tried: + - Could the layout be reorganized for better flow or scannability? + - Would micro-interactions (hover effects, transitions, focus states) make it feel more responsive and alive? + - Is whitespace being used effectively to create breathing room and group related elements? + - Could typography be more expressive — size contrasts, weight variations, line heights? + - Are colors creating the right emotional tone? Could accent colors highlight key actions better? 
+ - Are empty states, loading states, and error states not just functional but helpful and encouraging? + - Could icons, illustrations, or subtle visual cues improve comprehension? +4. Research: look at how the standards documents suggest handling similar UI patterns. Are there recommendations you missed? +5. Implement the most impactful improvements — prioritize changes that make the biggest difference to user understanding and delight +6. Re-run Playwright tests and re-capture screenshots +7. Visually verify the improvements look better than before +{END IF} + +## Part 2: Code Quality Refinement + +Re-read all generated code with fresh eyes, looking for opportunities to make it more maintainable and testable. + +1. Read ALL files changed in the most recent commit: `git diff HEAD~1 --name-only` +2. For each file, evaluate: + - **Abstraction**: Are there functions doing too many things? Should logic be extracted? + - **Testability**: Is business logic separated from framework/UI code? Could someone write a unit test for the core logic without setting up the whole framework? + - **Readability**: Would a new developer understand this code without extensive context? Are names clear and descriptive? + - **Duplication**: Is there repeated logic that should be a shared utility? + - **Simplicity**: Are there overly complex control flows that could be simplified? Deep nesting that could be flattened? +3. Make concrete improvements — refactor, rename, extract, simplify +4. Run all unit tests — ensure they still pass +5. If you extracted new logic, write unit tests for it + +## Part 3: Write Refinement Report + +Each refinement pass creates a NEW file with a timestamp — never overwrite previous reports. This preserves the history of what was reviewed and changed across multiple refinement passes. + +Write your analysis to `{pwd}/specs/{scope}/refinements/feature-{id}-refinement-{YYYYMMDD-HHMMSS}.md` (replace `{YYYYMMDD-HHMMSS}` with the current date-time, e.g. 
`feature-2-refinement-20250320-143052.md`) with this structure:
+
+```markdown
+# Feature #{id} Refinement: {description}
+
+## UX Analysis (web/mobile only)
+- **Screenshots reviewed**: [list of screenshots]
+- **Issues found**: [what problems or opportunities were identified]
+- **Alternatives considered**: [what other approaches were thought about]
+- **Changes made**: [what was actually improved and why]
+- **Changes deferred**: [ideas noted for future consideration, if any]
+
+## Code Quality Analysis
+- **Files reviewed**: [list of files]
+- **Issues found**: [code smells, abstraction opportunities, naming issues]
+- **Refactoring done**: [what was changed and why]
+- **Test coverage**: [new tests added, if any]
+
+## Summary
+[1-2 sentence summary of the refinement pass]
+```
+
+## Commit
+
+Generate the timestamp for the refinement file name using: `date +%Y%m%d-%H%M%S`
+
+If you made code or UI changes:
+git add -A && git commit -m "refine: polish feature #{id} — [summary of improvements]"
+
+If no code changes were warranted, still commit the refinement report:
+git add specs/{scope}/refinements/ && git commit -m "refine: review feature #{id} — no changes needed"
+
+## Rules
+- This is a POLISH pass — do NOT add new functionality
+- Do NOT break existing tests
+- Keep changes focused on improving what exists
+- Think creatively about UX — the goal is to make users enjoy and understand the interface
+- Think critically about code — the goal is to make the codebase a pleasure to maintain
+- ALWAYS write the refinement report, even if no changes are made
+- NEVER use AskUserQuestion or EnterPlanMode — work autonomously
+- **Read before Edit**: If Edit fails (old_string not found), Read the file first. Never guess.
+- **Compile before test**: After any code change, run `tsc --noEmit` (frontend) or `go build ./...` (backend) BEFORE running tests. Fix compile errors first.
+- **Max 2 retries**: If the same approach fails twice, change strategy. 
Read errors carefully. diff --git a/references/verification/api-verification.md b/references/verification/api-verification.md new file mode 100644 index 0000000..b750f32 --- /dev/null +++ b/references/verification/api-verification.md @@ -0,0 +1,145 @@ +# API Verification Strategy + +Verify API features through integration tests, endpoint validation, and contract testing. + +**This is the verification strategy for project type: `api`** + +## Overview + +API projects are verified through: +1. **Integration tests** — Hit real endpoints with real requests +2. **Response validation** — Check status codes, response shapes, error formats +3. **Contract compliance** — Responses match OpenAPI/schema definitions +4. **Edge case coverage** — Invalid input, auth failures, not-found, rate limits + +## Process + +### Step 1: Ensure Environment is Running + +```bash +# Check API server is running (adjust port for your project) +lsof -i :8080 | head -2 +curl -s http://localhost:8080/health || echo "API not responding" +``` + +If not running, start with `bash init.sh`. 
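If the server is slow to come up, a fixed `sleep` can race it. A small polling helper is more reliable — a minimal sketch, assuming a `/health` endpoint (the URL, port, and timeout are placeholders to adapt):

```bash
#!/bin/bash
# wait_for_url URL TIMEOUT_SECONDS — poll URL until it answers, fail after timeout.
wait_for_url() {
  local url="$1" timeout="${2:-30}" i=0
  while [ "$i" -lt "$timeout" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s silences progress output
    if curl -sf "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "Timed out waiting for $url" >&2
  return 1
}

# Example usage (hypothetical endpoint):
# wait_for_url http://localhost:8080/health 30 || exit 1
```

Call it right after `bash init.sh` so integration tests never start against a half-booted server.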
+ +### Step 2: Write Integration Tests + +Every feature MUST have integration tests covering: + +**Happy path:** +``` +- Valid request → correct status code + response body +- All required fields present in response +- Correct content-type header +``` + +**Error cases:** +``` +- Missing required fields → 400 with descriptive error +- Invalid field values → 400 with field-specific errors +- Unauthorized → 401 with error message +- Forbidden → 403 with error message +- Not found → 404 with error message +- Conflict/duplicate → 409 with error message +``` + +**Edge cases:** +``` +- Empty collections → 200 with empty array (not null) +- Pagination boundaries → correct page/total counts +- Large payloads → handled gracefully +- Concurrent requests → no race conditions +``` + +### Example Test Patterns + +#### Go (net/http/httptest) +```go +func TestCreateProduct(t *testing.T) { + srv := setupTestServer(t) + + resp, err := srv.Client().Post(srv.URL+"/api/products", + "application/json", + strings.NewReader(`{"name": "Test", "price": 9.99}`)) + require.NoError(t, err) + require.Equal(t, http.StatusCreated, resp.StatusCode) + + var product Product + json.NewDecoder(resp.Body).Decode(&product) + assert.Equal(t, "Test", product.Name) + assert.NotEmpty(t, product.ID) +} +``` + +#### Python (pytest + requests) +```python +def test_create_product(api_client): + resp = api_client.post("/api/products", json={"name": "Test", "price": 9.99}) + assert resp.status_code == 201 + data = resp.json() + assert data["name"] == "Test" + assert "id" in data +``` + +#### Node.js (vitest + supertest) +```typescript +test('POST /api/products creates a product', async () => { + const res = await request(app) + .post('/api/products') + .send({ name: 'Test', price: 9.99 }) + .expect(201); + + expect(res.body.name).toBe('Test'); + expect(res.body.id).toBeDefined(); +}); +``` + +### Step 3: Run Tests + +```bash +# Use the project's test command +go test ./... 
# Go +pytest tests/ # Python +npm test # Node.js +``` + +### Step 4: Verify Test Quality + +After tests pass, verify they are thorough: + +1. **Coverage check** — Are all endpoints tested? +2. **Error paths tested** — Not just happy paths? +3. **Response shape validated** — Not just status codes? +4. **Auth tested** — Protected endpoints reject unauthorized requests? +5. **Idempotency** — Can tests run multiple times without side effects? + +### Step 5: Document Results + +Record in the subagent's output: +- Endpoints tested (method + path) +- Status codes verified +- Error scenarios covered +- Any issues found and fixed + +## Verification Checklist + +For each API feature, verify: + +- [ ] All endpoints return correct status codes +- [ ] Response bodies match expected schema +- [ ] Error responses have consistent format (e.g., `{"error": "message", "details": [...]}`) +- [ ] Authentication/authorization enforced on protected endpoints +- [ ] Input validation rejects malformed data with helpful errors +- [ ] Pagination works correctly (page, limit, total, next/prev) +- [ ] Filters and search return correct subsets +- [ ] CRUD operations are complete (create, read, update, delete all work) +- [ ] Concurrent access doesn't cause data corruption + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. Confirm all tests pass: check test output or run a quick smoke test +2. Verify error handling is tested (not just happy paths) +3. If coverage seems thin, launch a follow-up subagent to add missing test cases diff --git a/references/verification/cli-verification.md b/references/verification/cli-verification.md new file mode 100644 index 0000000..92a9eac --- /dev/null +++ b/references/verification/cli-verification.md @@ -0,0 +1,163 @@ +# CLI Verification Strategy + +Verify CLI tool features through command execution tests, output validation, and exit code checks. 
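The exit-code discipline described below can be smoke-tested with plain shell before any test framework is set up. A minimal sketch — the `check` helper is illustrative, and `true`/`false` stand in for invocations of your binary:

```bash
#!/bin/bash
# Tiny exit-code smoke harness: check DESC EXPECTED CMD... runs CMD and
# compares its exit code to EXPECTED.
pass=0; fail=0
check() {
  local desc="$1" expected="$2" actual=0; shift 2
  if "$@" >/dev/null 2>&1; then actual=0; else actual=$?; fi
  if [ "$actual" -eq "$expected" ]; then
    pass=$((pass + 1))
  else
    echo "FAIL: $desc (expected exit $expected, got $actual)" >&2
    fail=$((fail + 1))
  fi
}

check "success path exits 0" 0 true
check "failure path exits 1" 1 false
# In a real project: check "help works" 0 ./bin/mytool --help
echo "$pass passed, $fail failed"
```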
+ +**This is the verification strategy for project type: `cli`** + +## Overview + +CLI projects are verified through: +1. **Command execution tests** — Run commands with various arguments and flags +2. **Output validation** — Check stdout, stderr, and file output +3. **Exit code checks** — Correct exit codes for success and failure +4. **Edge case coverage** — Missing args, invalid input, permissions, large files + +## Process + +### Step 1: Ensure Tool is Built + +```bash +# Build the CLI tool (adjust for your project) +go build -o ./bin/mytool . # Go +cargo build # Rust +npm run build # Node.js +pip install -e . # Python +``` + +### Step 2: Write Integration Tests + +Every feature MUST have tests covering: + +**Happy path:** +``` +- Valid args → correct output + exit code 0 +- All flags/options work as documented +- Output format is correct (text, JSON, table, etc.) +``` + +**Error cases:** +``` +- Missing required args → helpful error message + exit code 1 +- Invalid flag values → descriptive error + exit code 1 +- File not found → clear error + exit code 1 +- Permission denied → clear error + exit code 1 +- Invalid input format → parse error with line/position info +``` + +**Edge cases:** +``` +- Empty input → graceful handling (not crash) +- Very large input → handles without OOM or hang +- Stdin pipe → works with piped input +- No TTY → works in non-interactive mode +- Ctrl+C → clean shutdown +``` + +### Example Test Patterns + +#### Go (exec.Command) +```go +func TestListCommand(t *testing.T) { + cmd := exec.Command("./bin/mytool", "list", "--format", "json") + out, err := cmd.CombinedOutput() + require.NoError(t, err, "command failed: %s", string(out)) + + var items []Item + require.NoError(t, json.Unmarshal(out, &items)) + assert.NotEmpty(t, items) +} + +func TestInvalidFlag(t *testing.T) { + cmd := exec.Command("./bin/mytool", "--invalid-flag") + out, err := cmd.CombinedOutput() + assert.Error(t, err) + assert.Contains(t, string(out), "unknown flag") +} 
+```
+
+#### Rust (assert_cmd)
+```rust
+use assert_cmd::Command;
+
+#[test]
+fn test_list_command() {
+    Command::cargo_bin("mytool")
+        .unwrap()
+        .arg("list")
+        .arg("--format")
+        .arg("json")
+        .assert()
+        .success()
+        .stdout(predicates::str::contains("["));
+}
+```
+
+#### Python (subprocess)
+```python
+import json
+import subprocess
+
+def test_list_command():
+    result = subprocess.run(
+        ["python", "-m", "mytool", "list", "--format", "json"],
+        capture_output=True, text=True
+    )
+    assert result.returncode == 0
+    data = json.loads(result.stdout)
+    assert isinstance(data, list)
+```
+
+#### Node.js (execa)
+```typescript
+import { execa } from 'execa';
+
+test('list command outputs JSON', async () => {
+  const { stdout, exitCode } = await execa('./bin/mytool', ['list', '--format', 'json']);
+  expect(exitCode).toBe(0);
+  const data = JSON.parse(stdout);
+  expect(Array.isArray(data)).toBe(true);
+});
+```
+
+### Step 3: Run Tests
+
+```bash
+# Use the project's test command
+go test ./...
+cargo test
+npm test
+pytest tests/
+```
+
+### Step 4: Verify Test Quality
+
+After tests pass, verify:
+
+1. **All subcommands tested** — Every command/subcommand has at least one test
+2. **All flags tested** — Each flag is exercised in at least one test
+3. **Help text correct** — `--help` output matches actual behavior
+4. **Error messages helpful** — Errors tell the user what to do, not just what went wrong
+5. 
**Exit codes consistent** — 0 for success, 1 for user error, 2 for system error + +## Verification Checklist + +For each CLI feature, verify: + +- [ ] Command produces correct output for valid input +- [ ] Exit code is 0 on success +- [ ] Exit code is non-zero on failure +- [ ] Error messages go to stderr (not stdout) +- [ ] Error messages are actionable (tell user how to fix) +- [ ] `--help` flag works and is accurate +- [ ] Flags have short and long forms where appropriate +- [ ] Output formats work (text, json, table, csv if supported) +- [ ] Piped input works (`echo "data" | mytool process`) +- [ ] File arguments handle missing/unreadable files gracefully +- [ ] Quiet/verbose modes work if supported + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. Confirm all tests pass +2. Run a quick smoke test: `./bin/mytool --help` or equivalent +3. Verify error cases are tested (not just happy paths) +4. If test coverage seems thin, launch a follow-up subagent diff --git a/references/verification/data-verification.md b/references/verification/data-verification.md new file mode 100644 index 0000000..298bce5 --- /dev/null +++ b/references/verification/data-verification.md @@ -0,0 +1,177 @@ +# Data Pipeline Verification Strategy + +Verify data pipeline features through input/output validation, transformation tests, and data quality checks. + +**This is the verification strategy for project type: `data`** + +## Overview + +Data pipeline projects are verified through: +1. **Transformation tests** — Input data → expected output data +2. **Schema validation** — Output matches expected schema/types +3. **Data quality checks** — No nulls where unexpected, no duplicates, correct aggregations +4. 
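Before heavier test suites run, basic output sanity checks (file present, expected header, non-zero row count) can be scripted in shell. A sketch under stated assumptions — `output/result.csv` and the `date,total` header are placeholders, and the sample data stands in for real pipeline output:

```bash
#!/bin/bash
# Sanity-check a pipeline's CSV output: file present, expected header, >0 data rows.
OUT="output/result.csv"  # placeholder path — adapt to your pipeline
mkdir -p output
# Sample data standing in for real pipeline output:
printf 'date,total\n2024-01-01,300\n2024-01-02,150\n' > "$OUT"

[ -s "$OUT" ] || { echo "FAIL: $OUT missing or empty" >&2; exit 1; }

header=$(head -n 1 "$OUT")
[ "$header" = "date,total" ] || { echo "FAIL: unexpected header: $header" >&2; exit 1; }

rows=$(( $(wc -l < "$OUT") - 1 ))
[ "$rows" -ge 1 ] || { echo "FAIL: no data rows" >&2; exit 1; }

echo "OK: $rows data rows with expected schema"
```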
**Edge case coverage** — Empty datasets, malformed records, schema evolution + +## Process + +### Step 1: Ensure Environment is Ready + +```bash +# Check database/data services are running (adjust for your project) +docker-compose ps # Docker services +psql -c "SELECT 1" 2>/dev/null # PostgreSQL +python -c "import pandas; print('OK')" # Python deps +``` + +If not running, start with `bash init.sh`. + +### Step 2: Write Pipeline Tests + +Every feature MUST have tests covering: + +**Happy path:** +``` +- Valid input data → correct transformed output +- Aggregations produce correct totals/counts +- Joins produce correct merged records +- Output schema matches specification +``` + +**Error cases:** +``` +- Malformed input records → skipped or logged (not crash) +- Missing required fields → clear error or default value +- Type mismatches → coercion or descriptive error +- Connection failures → retry or clear error +``` + +**Edge cases:** +``` +- Empty dataset → empty output (not crash or null) +- Single record → works correctly +- Very large dataset → completes within resource limits +- Duplicate records → handled per spec (dedupe, keep-all, etc.) 
+- Null/missing values → handled consistently
+- Schema evolution → backward-compatible
+```
+
+### Example Test Patterns
+
+#### Python (pytest + pandas)
+```python
+import pandas as pd
+
+def test_transform_sales_data():
+    input_df = pd.DataFrame({
+        'date': ['2024-01-01', '2024-01-01', '2024-01-02'],
+        'product': ['A', 'B', 'A'],
+        'amount': [100, 200, 150]
+    })
+    result = transform_sales(input_df)
+
+    assert len(result) == 2  # Grouped by date
+    assert result.loc[result['date'] == '2024-01-01', 'total'].values[0] == 300
+    assert result.loc[result['date'] == '2024-01-02', 'total'].values[0] == 150
+
+def test_transform_handles_empty():
+    empty_df = pd.DataFrame(columns=['date', 'product', 'amount'])
+    result = transform_sales(empty_df)
+    assert len(result) == 0
+    assert list(result.columns) == ['date', 'total']  # Schema preserved
+
+def test_transform_handles_nulls():
+    input_df = pd.DataFrame({
+        'date': ['2024-01-01', None],
+        'product': ['A', 'B'],
+        'amount': [100, None]
+    })
+    result = transform_sales(input_df)
+    assert result['total'].isna().sum() == 0  # No nulls in output
+```
+
+#### SQL (dbt tests)
+```yaml
+# schema.yml
+models:
+  - name: sales_summary
+    columns:
+      - name: date
+        tests: [not_null, unique]
+      - name: total
+        tests: [not_null]
+    tests:
+      - dbt_utils.expression_is_true:
+          expression: "total >= 0"
+```
+
+#### Spark (PySpark)
+```python
+def test_aggregate_orders(spark):
+    input_data = [("2024-01-01", "A", 100), ("2024-01-01", "B", 200)]
+    input_df = spark.createDataFrame(input_data, ["date", "product", "amount"])
+
+    result = aggregate_orders(input_df)
+
+    assert result.count() == 1
+    row = result.collect()[0]
+    assert row["total"] == 300
+```
+
+### Step 3: Run Tests
+
+```bash
+pytest tests/ -v          # Python
+dbt test                  # dbt
+spark-submit --master local tests/   # Spark
+go test ./pipeline/...    # Go
+```
+
+### Step 4: Verify Data Quality
+
+After tests pass, verify:
+
+1. **Schema correct** — Output columns/fields match spec
+2. 
**No data loss** — Row counts match expectations (input vs output) +3. **No duplicates** — Unless explicitly expected +4. **Aggregations correct** — Spot-check totals manually +5. **Null handling consistent** — Documented and tested +6. **Idempotent** — Running pipeline twice produces same result + +### Step 5: Validate with Sample Data + +Run the pipeline against a representative sample: + +```bash +# Run with test fixtures +python -m pipeline --input fixtures/sample_input.csv --output /tmp/output.csv + +# Verify output +python -c " +import pandas as pd +df = pd.read_csv('/tmp/output.csv') +print(f'Rows: {len(df)}') +print(f'Columns: {list(df.columns)}') +print(f'Nulls: {df.isnull().sum().to_dict()}') +print(df.head()) +" +``` + +## Verification Checklist + +For each data feature, verify: + +- [ ] Input → output transformation is correct +- [ ] Output schema matches specification +- [ ] Null/missing values handled consistently +- [ ] Empty input produces empty output (not error) +- [ ] Aggregations are mathematically correct +- [ ] No unintended data loss or duplication +- [ ] Pipeline is idempotent (safe to re-run) +- [ ] Error records are logged/quarantined (not silently dropped) +- [ ] Performance is acceptable for expected data volumes + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. Confirm all tests pass +2. Verify output schema matches spec +3. Check that edge cases (empty, null, duplicate) are tested +4. If data quality checks seem thin, launch a follow-up subagent diff --git a/references/verification/library-verification.md b/references/verification/library-verification.md new file mode 100644 index 0000000..bfe6622 --- /dev/null +++ b/references/verification/library-verification.md @@ -0,0 +1,190 @@ +# Library Verification Strategy + +Verify library features through unit tests, public API validation, and integration examples. 
+
+**This is the verification strategy for project type: `library`**
+
+## Overview
+
+Library projects are verified through:
+1. **Unit tests** — Thorough testing of all public functions/methods
+2. **Public API validation** — Exports, types, and interfaces are correct
+3. **Integration examples** — Real usage patterns work end-to-end
+4. **Edge case coverage** — Nil/null inputs, boundary values, concurrent access
+
+## Process
+
+### Step 1: Ensure Library Builds
+
+```bash
+# Build/compile the library (adjust for your project)
+go build ./...                 # Go
+cargo build                    # Rust
+npm run build                  # Node.js/TypeScript
+python -m py_compile src/*.py  # Python
+```
+
+### Step 2: Write Unit Tests
+
+Every public function/method MUST have tests covering:
+
+**Happy path:**
+```
+- Valid inputs → correct outputs
+- All overloads/variants work
+- Return types are correct
+```
+
+**Error cases:**
+```
+- Invalid inputs → clear error (not panic/crash)
+- Nil/null/undefined → handled gracefully
+- Out-of-range values → descriptive error
+- Type mismatches → compile-time or clear runtime error
+```
+
+**Edge cases:**
+```
+- Empty collections → correct behavior (not crash)
+- Boundary values → correct at min/max
+- Concurrent access → thread-safe if documented as such
+- Large inputs → handles without excessive memory/time
+```
+
+### Example Test Patterns
+
+#### Go
+```go
+func TestParse(t *testing.T) {
+    tests := []struct {
+        name    string
+        input   string
+        want    *Result
+        wantErr bool
+    }{
+        {"valid input", "hello", &Result{Value: "hello"}, false},
+        {"empty input", "", nil, true},
+        {"special chars", "a&b<c", &Result{Value: "a&b<c"}, false},
+    }
+    for _, tt := range tests {
+        t.Run(tt.name, func(t *testing.T) {
+            got, err := Parse(tt.input)
+            if (err != nil) != tt.wantErr {
+                t.Fatalf("Parse() error = %v, wantErr %v", err, tt.wantErr)
+            }
+            if !tt.wantErr && got.Value != tt.want.Value {
+                t.Errorf("Parse() = %v, want %v", got, tt.want)
+            }
+        })
+    }
+}
+```
+
+#### TypeScript (Jest/Vitest)
+```typescript
+describe('parse', () => {
+  it('parses valid input', () => {
+    expect(parse('hello')).toEqual({ value: 'hello' });
+  });
+
+  it('throws on empty input', () => {
+    expect(() => parse('')).toThrow('Input cannot be empty');
+  });
+
+  it('handles special characters', () => {
+    expect(parse('a&b<c')).toEqual({ value: 'a&b<c' });
+  });
+});
+```
diff --git a/references/verification/mobile-verification.md b/references/verification/mobile-verification.md
new file mode 100644
--- /dev/null
+++ b/references/verification/mobile-verification.md
+# Mobile Verification Strategy
+
+Verify mobile features through mobile E2E tests (Detox/XCTest/Flutter) and screenshot visual review.
+
+**This is the verification strategy for project type: `mobile`**
+
+### Example Test Patterns
+
+#### Detox (React Native)
+```typescript
+describe('Login', () => {
+  it('should login successfully', async () => {
+    await device.takeScreenshot('login-initial');
+
+    await 
element(by.id('email-input')).typeText('test@example.com'); + await element(by.id('password-input')).typeText('password123'); + await element(by.id('login-button')).tap(); + + await expect(element(by.id('dashboard'))).toBeVisible(); + await device.takeScreenshot('dashboard-after-login'); + }); +}); +``` + +#### Flutter +```dart +testWidgets('login flow', (tester) async { + await tester.pumpWidget(MyApp()); + + // Screenshot: initial + await expectLater(find.byType(MyApp), matchesGoldenFile('login-initial.png')); + + await tester.enterText(find.byKey(Key('email')), 'test@example.com'); + await tester.enterText(find.byKey(Key('password')), 'password123'); + await tester.tap(find.byKey(Key('login-button'))); + await tester.pumpAndSettle(); + + // Screenshot: after login + await expectLater(find.byType(MyApp), matchesGoldenFile('dashboard.png')); +}); +``` + +### Step 3: Run Tests + +```bash +# Detox +npx detox test --configuration ios.sim.debug + +# Flutter +flutter test integration_test/ + +# XCTest +xcodebuild test -scheme MyApp -destination 'platform=iOS Simulator,name=iPhone 15' +``` + +### Step 4: Visual Review (MANDATORY) + +Use the Read tool to inspect EVERY screenshot. 
Evaluate: + +#### Layout +- Content fits screen without horizontal scrolling +- No elements clipped by safe area (notch, home indicator) +- Proper alignment and spacing + +#### Touch Targets +- All tappable elements at least 44x44 points +- Adequate spacing between touch targets + +#### Platform Conventions +- iOS: follows HIG (navigation bars, tab bars, system colors) +- Android: follows Material Design (app bars, FAB, bottom nav) +- Platform-appropriate gestures and transitions + +#### States +- Loading indicators present during async operations +- Empty states with helpful messaging +- Error states with recovery options +- Pull-to-refresh where appropriate + +#### Aesthetics +- Polished and platform-native feel +- Typography matches platform conventions +- Colors and theming consistent +- Smooth transitions between screens + +#### Device Sizes +- Works on small screens (iPhone SE / small Android) +- Works on large screens (iPhone Pro Max / tablet) +- Landscape orientation handled (if applicable) + +### Step 5: Fix Issues + +If screenshots reveal problems: +1. Fix layout/styling in the relevant component +2. Re-run tests to capture updated screenshots +3. Review again until all issues resolved + +## Verification Checklist + +For each mobile feature, verify: + +- [ ] E2E test passes on target platform(s) +- [ ] Screenshots captured at key states +- [ ] Touch targets are minimum 44x44 points +- [ ] Safe area respected (notch, home indicator) +- [ ] Loading states present for async operations +- [ ] Error states present with recovery options +- [ ] Works on small and large screen sizes +- [ ] Platform conventions followed (HIG/Material) +- [ ] Accessibility labels present on interactive elements + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. Confirm screenshots exist for this feature +2. Spot-check one screenshot with the Read tool +3. If quality is poor, launch a polish subagent +4. 
Verify platform conventions are followed diff --git a/references/verification/web-verification.md b/references/verification/web-verification.md new file mode 100644 index 0000000..115928e --- /dev/null +++ b/references/verification/web-verification.md @@ -0,0 +1,283 @@ +# Web Verification Strategy + +Verify web features using Playwright E2E tests with screenshot capture and visual review. + +**This is the verification strategy for project type: `web`** + +## Overview + +Web projects are verified through: +1. **Interaction tests** — Playwright tests that **perform user actions** (click, fill, submit, navigate) and verify outcomes — this is the PRIMARY verification that features actually work +2. **Screenshots** — Captured at key states for visual review +3. **Visual review** — AI agent reviews every screenshot against quality criteria +4. **UX standards compliance** — Loading/empty/error states, responsive, accessible + +**CRITICAL DISTINCTION:** Screenshots verify APPEARANCE. Interaction tests verify BEHAVIOR. Both are required, but interaction tests are MORE important — a feature that looks perfect but doesn't work is worse than a feature that looks rough but works correctly. + +## Prerequisites + +```bash +npm install -D @playwright/test +npx playwright install +``` + +## Process + +### Step 1: Ensure Environment is Running + +```bash +# Check frontend and backend ports (adjust for your project) +lsof -i :3000 | head -2 # Frontend +lsof -i :8082 | head -2 # Backend +``` + +If not running, start them with `bash init.sh`. + +### Step 2: Write Tests That Prove the Outcome Works (PRIMARY VERIFICATION) + +Every feature MUST have Playwright tests that **perform the actions a user would perform** and **verify the results the user would expect**. This is the primary verification — it proves the feature actually works, not just that it renders. + +**The universal principle:** Ask "what can the user DO when this feature is done?" 
Then write a test that does exactly that and checks the result. + +Tests MUST: +- **Perform real user actions** — click buttons, fill forms, navigate links, select options +- **Verify observable outcomes** — text appears, page navigates, data changes, notifications show +- **Cover the complete flow** — not just "page loads" but "user completes the task from start to finish" + +Tests MUST NOT: +- Only navigate to a page and take a screenshot (proves rendering, not behavior) +- Only check that a component exists without interacting with it +- Skip verifying the result of an action (e.g., submit a form but don't check if data was saved) + +```typescript +// CORRECT: Test proves the user can complete the task +test('user can edit a product', async ({ page, request }) => { + // Setup: seed data via API + const res = await request.post('/api/v1/products', { data: { name: 'Original', sku: 'TEST-001', price: 10 } }); + const product = (await res.json()).data; + + // Act: perform the user journey + await page.goto('/products'); + await page.getByRole('button', { name: `Actions for Original` }).click(); + await page.getByRole('menuitem', { name: 'Edit' }).click(); + await expect(page.getByLabel('Name')).toHaveValue('Original'); + await page.getByLabel('Name').fill('Updated Name'); + await page.getByRole('button', { name: 'Save' }).click(); + + // Assert: verify the outcome + await expect(page).toHaveURL('/products'); + await expect(page.getByText('Updated Name')).toBeVisible(); +}); + +// WRONG: Only proves the page renders, not that any feature works +test('products page', async ({ page }) => { + await page.goto('/products'); + await page.screenshot({ path: 'screenshot.png', fullPage: true }); + // Edit could be broken, Delete could crash, Filters could be no-ops +}); +``` + +This principle applies beyond CRUD. 
For any feature:
+- **Search/filter:** Type a query → verify results change → clear → verify results reset
+- **Navigation:** Click a link → verify destination page loads with correct content
+- **Settings:** Change a setting → verify the change takes effect → reload → verify it persisted
+- **Workflow:** Start a process → advance through steps → verify completion state
+- **Upload:** Select file → upload → verify file appears in list
+- **Auth:** Login → verify access to protected page → logout → verify redirect to login
+
+### Step 3: Capture Screenshots at Key States
+
+In addition to interaction tests, capture screenshots for visual review.
+
+**Screenshot directory:** Screenshots are stored per-scope at `specs/{scope}/screenshots/` relative to the project root. The parent agent resolves this to an absolute path (`{pwd}/specs/{scope}/screenshots/`) and passes it as `{screenshots_dir}` in the subagent prompt.
+
+```typescript
+import { test, expect } from '@playwright/test';
+
+test('user can login', async ({ page }) => {
+  await page.goto('/login');
+
+  // Screenshot: Initial state
+  // Use the absolute {screenshots_dir} path provided by the parent agent
+  await page.screenshot({
+    path: `${screenshots_dir}/feature-${id}-step1-login-initial.png`,
+    fullPage: true
+  });
+
+  await page.getByLabel('Email').fill('test@example.com');
+  await page.getByLabel('Password').fill('password123');
+  await page.getByRole('button', { name: 'Login' }).click();
+
+  await expect(page).toHaveURL('/dashboard');
+
+  // Screenshot: After action
+  await page.screenshot({
+    path: `${screenshots_dir}/feature-${id}-step2-dashboard-after-login.png`,
+    fullPage: true
+  });
+});
+```
+
+Then run the full suite:
+
+```bash
+npx playwright test
+```
+
+### Step 4: Visual Review (MANDATORY)
+
+Use the Read tool to open and visually inspect EVERY screenshot. 
Evaluate: + +#### Layout +- Content fits without overflow or clipping +- Proper alignment (grid, flex) + +#### Spacing +- Consistent spacing patterns (4/8/16/24/32px scale) +- Not too cramped or sparse + +#### Visual Hierarchy +- Most important action is obvious +- Page title > section title > body text size hierarchy + +#### States +- Loading state present (skeleton or spinner) +- Empty state present (icon + message + CTA) +- Error state present and styled + +#### Aesthetics +- Polished and intentional, not generic/prototype-level +- Typography is distinctive and hierarchical +- Color palette is cohesive +- Visual depth: appropriate shadows, borders + +#### Consistency +- Similar screens use same patterns +- Colors consistent with theme + +### Step 5: Fix Issues + +If screenshots reveal problems: +1. Locate the relevant component file +2. Make targeted CSS/layout changes +3. Re-run tests to capture updated screenshots +4. Review again until all issues resolved + +**Priority order:** +1. Broken layout (overflow, clipping, misalignment) +2. Missing states (loading, empty, error) +3. Accessibility issues (contrast, focus rings, labels) +4. Visual polish (shadows, transitions, typography) +5. 
**Consistency issues** (spacing, colors)
+
+## Screenshot Naming Convention
+
+Format: `feature-{id}-step{N}-{description}.png` (scope is encoded in the directory path `specs/{scope}/screenshots/`)
+
+Examples:
+- `feature-17-step3-modal-open.png`
+- `feature-7-step6-project-in-list.png`
+
+## Playwright Configuration
+
+```typescript
+import { defineConfig } from '@playwright/test';
+
+export default defineConfig({
+  timeout: 10000,
+  expect: { timeout: 3000 },
+  reporter: [
+    ['list'],
+    ['json', { outputFile: 'e2e/test-results/results.json' }],
+  ],
+  use: {
+    actionTimeout: 5000,
+    navigationTimeout: 10000,
+    screenshot: 'on',
+    trace: 'retain-on-failure',
+  },
+});
+```
+
+## Full-Stack Integration Smoke Test (NON-NEGOTIABLE for web projects with backend)
+
+When a feature connects the frontend to a real backend API (e.g., replacing mock data with real API calls), a **live integration smoke test** MUST be performed. This catches issues that TypeScript compilation alone cannot detect — CORS, route prefix mismatches, response envelope mismatches, and authentication failures.
+
+### When to Run
+
+Run this smoke test for ANY feature that:
+- Replaces mock/stub data with real API calls
+- Changes the API base URL, fetch wrapper, or custom client
+- Modifies backend route registration or middleware
+- Is the first feature to connect a previously-mocked frontend to the real backend
+
+### Process
+
+**Step 1: Start both servers**
+
+```bash
+# Start backend (with real database)
+cd backend && DATABASE_URL="..." go run ./cmd/api/ &
+sleep 3
+
+# Start frontend (pointing to backend)
+cd frontend && VITE_API_BASE_URL=http://localhost:8080 pnpm dev &
+sleep 3
+```
+
+**Step 2: Verify backend responds to API calls**
+
+```bash
+# Test a list endpoint directly (bypasses CORS — this tests the backend alone)
+curl -s http://localhost:8080/api/v1/ | head -5
+```
+
+If this returns 404, the route prefix is wrong (common with code generators like ogen that don't include the OpenAPI `servers.url` prefix in generated routes). 
Fix by mounting the generated server under the correct prefix (e.g., `http.StripPrefix("/api/v1", server)`). + +**Step 3: Verify CORS headers** + +```bash +curl -s -I -X OPTIONS http://localhost:8080/api/v1/ \ + -H 'Origin: http://localhost:5173' | grep -i 'access-control' +``` + +If no `Access-Control-Allow-Origin` header is present, the browser will block all frontend requests. Add CORS middleware to the backend. This is the **#1 most common cause** of "frontend shows loading forever" bugs in full-stack web projects. + +**Step 4: Seed test data and take screenshots** + +```bash +# Seed at least 2-3 records via API +curl -X POST http://localhost:8080/api/v1/ -H 'Content-Type: application/json' -d '...' +``` + +Then run Playwright screenshot tests against all major pages and **visually verify** that: +- Pages show **real data** (not loading skeletons or empty states) +- Data matches what was seeded (correct names, counts, values) +- No console errors in the browser (especially CORS or fetch failures) + +**Step 5: Fail-fast criteria** + +The integration smoke test FAILS if any of these are true: +- Backend returns 404 for known API endpoints → route prefix mismatch +- CORS headers are missing → add CORS middleware +- Screenshots show loading skeletons that never resolve → API calls failing silently +- Screenshots show empty states despite seeded data → response envelope mismatch +- Browser console shows fetch/network errors → connectivity or CORS issue + +### Common Root Causes + +| Symptom | Root Cause | Fix | +|---------|-----------|-----| +| Backend returns 404 for /api/v1/... 
| Code generator (ogen, openapi-generator) registers routes without server URL prefix | Mount generated handler under `/api/v1` with `http.StripPrefix` or equivalent | +| Frontend shows loading forever | CORS: browser blocks cross-origin requests | Add CORS middleware (`Access-Control-Allow-Origin: *` for dev) | +| Frontend shows empty despite seeded data | Response envelope mismatch: frontend expects `{ data: ... }` but backend returns flat response, or vice versa | Align envelope handling in fetch wrapper or backend | +| API works via curl but not from browser | CORS (curl bypasses CORS, browsers enforce it) | Add CORS middleware | +| OPTIONS requests return 404 | Backend doesn't handle preflight requests | CORS middleware must handle OPTIONS with 204 No Content | + +## Parent Agent Post-Verification + +After subagent completes, parent MUST: +1. **Confirm interaction tests exist and pass** — for CRUD features, check that the subagent wrote tests that exercise user flows (create, edit, delete), not just screenshot-only tests. If tests only take screenshots without clicking/submitting, the feature is NOT verified. +2. Confirm screenshots exist: `ls {screenshots_dir}/feature-{id}-*.png 2>/dev/null | wc -l` + (`{screenshots_dir}` = `{pwd}/specs/{scope}/screenshots/`) +3. Spot-check one screenshot with the Read tool — verify it shows **real data and completed states** (e.g., edit form with pre-filled data, not just an empty form) +4. If quality is poor, launch a polish subagent +5. **For full-stack features**: verify screenshots show **real data**, not loading skeletons or empty states. If data is missing, run the integration smoke test above to diagnose. 
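
Steps 2-3 of the smoke test can also be scripted so the parent agent gets a single pass/fail signal instead of eyeballing curl output. A minimal Python sketch (the base URL, origin, and `/api/v1/products` endpoint are assumptions; adjust them to your project):

```python
import urllib.request
from urllib.error import HTTPError

def smoke_check(base_url="http://localhost:8080", origin="http://localhost:5173"):
    """Fail-fast smoke checks: route prefix and CORS preflight.

    Returns a list of failure descriptions; an empty list means both
    checks passed. The endpoint path is illustrative.
    """
    failures = []

    # Check 1: a known endpoint must not 404 (route prefix mismatch otherwise)
    try:
        urllib.request.urlopen(base_url + "/api/v1/products")
    except HTTPError as e:
        if e.code == 404:
            failures.append("404 on known endpoint: route prefix mismatch")

    # Check 2: preflight must succeed and include Access-Control-Allow-Origin
    preflight = urllib.request.Request(
        base_url + "/api/v1/products",
        method="OPTIONS",
        headers={"Origin": origin},
    )
    try:
        resp = urllib.request.urlopen(preflight)
        if not resp.headers.get("Access-Control-Allow-Origin"):
            failures.append("no Access-Control-Allow-Origin: add CORS middleware")
    except HTTPError:
        failures.append("OPTIONS preflight rejected: CORS middleware must handle it")

    return failures
```

An empty list maps to "smoke test passed"; any entry is a fail-fast condition from the criteria above.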
diff --git a/references/web/e2e-verification.md b/references/web/e2e-verification.md new file mode 100644 index 0000000..491ab0b --- /dev/null +++ b/references/web/e2e-verification.md @@ -0,0 +1,289 @@ +# E2E Screenshot Verification — Full Details + +> **Subagent reference:** This file is inlined in subagent prompts for quick reference on screenshot mechanics. For the FULL verification process (including interaction tests), see `references/verification/web-verification.md`. + +Verify features work correctly using Playwright E2E tests with screenshot capture and visual review. + +**Interaction tests that prove user outcomes are the PRIMARY verification.** Tests must perform real user actions (click, fill, submit, navigate) and verify observable results (data appears, page navigates, state changes). Screenshots are SECONDARY — they verify visual quality and catch layout/styling issues that interaction tests cannot detect. Both are required, but a feature that passes interaction tests with rough visuals is closer to done than one with perfect screenshots but broken behavior. + +## Screenshot Directory + +Screenshots are stored per-scope at `specs/{scope}/screenshots/` relative to the project root. The parent agent resolves this to an absolute path (`{pwd}/specs/{scope}/screenshots/`) and passes it as `{screenshots_dir}` in the subagent prompt. + +In Playwright test code, always use the **absolute** `{screenshots_dir}` path provided by the parent agent in `page.screenshot()` calls. + +## Prerequisites + +Ensure Playwright is set up: + +```bash +npm install -D @playwright/test +npx playwright install +``` + +## Step-by-Step Process + +### Step 1: Ensure Environment is Running + +```bash +lsof -i :3000 | head -2 # Frontend +lsof -i :8082 | head -2 # Backend +``` + +If not running, start them with `bash init.sh`. 
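
The `lsof` checks above can also be done programmatically, which is easier for an agent to branch on than parsing command output. A minimal Python sketch (the port numbers are assumptions; match them to your dev servers):

```python
import socket

def ports_down(ports=(3000, 8082), host="127.0.0.1", timeout=0.5):
    """Return the ports that are NOT accepting TCP connections.

    An empty result means the dev environment is up; otherwise run
    `bash init.sh` and re-check before starting the E2E suite.
    """
    down = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connected: a server is listening on this port
        except OSError:
            down.append(port)
    return down
```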
+ +### Step 2: Ensure Screenshot Directory Exists + +```bash +# Screenshots are committed to the repo as results — never delete them +# New screenshots will overwrite same-named files; old ones are preserved as history +mkdir -p specs/{scope}/screenshots +rm -rf test-results/**/*.png 2>/dev/null || true +``` + +### Step 3: Run E2E Tests + +```bash +# Run all tests +npx playwright test + +# Or run specific test file +npx playwright test e2e/auth.spec.ts + +# Or run tests matching a pattern +npx playwright test --grep "login" +``` + +### Step 4: Check Test Results + +If tests fail, check error context: + +```bash +# Find error context files +find test-results -name "error-context.md" 2>/dev/null | head -5 + +# Find failure screenshots +find test-results -name "*.png" -type f | sort +``` + +Common failure causes: +- Backend not running +- Database not seeded +- Port conflicts +- Stale selectors + +### Step 5: List All Screenshots + +```bash +find specs/{scope}/screenshots -name "*.png" -type f 2>/dev/null | sort +find test-results -name "*.png" -type f 2>/dev/null | sort +``` + +### Step 6: Review Each Screenshot (MANDATORY) + +**CRITICAL**: Use the Read tool to open and visually inspect EVERY screenshot. For each screenshot, explicitly evaluate: + +#### Layout +- ✓/✗ Content fits without overflow? +- ✓/✗ No clipping or cut-off elements? +- ✓/✗ Proper alignment (grid, flex)? + +#### Spacing +- ✓/✗ Appropriate padding/margins? +- ✓/✗ Not too cramped or sparse? +- ✓/✗ Consistent spacing patterns (follows 4/8/16/24/32px scale)? + +#### Touch Targets +- ✓/✗ Buttons/inputs at least 44px? +- ✓/✗ Clickable areas visually obvious? + +#### Visual Hierarchy +- ✓/✗ Most important action is obvious? +- ✓/✗ Disabled states clearly distinguishable? +- ✓/✗ Focus states visible? +- ✓/✗ Page title > section title > body text size hierarchy? + +#### States +- ✓/✗ Loading state present (skeleton or spinner)? +- ✓/✗ Empty state present (icon + message + CTA)? 
+- ✓/✗ Error state present and styled (red text, red borders)? + +#### Aesthetics (follow /frontend-design principles) +- ✓/✗ Looks polished and intentional, not generic/prototype-level? +- ✓/✗ Typography is distinctive and hierarchical? +- ✓/✗ Color palette is cohesive? +- ✓/✗ Visual depth: appropriate shadows, borders, or textures? +- ✓/✗ Micro-interactions: hover/focus transitions visible? + +#### Data Display +- ✓/✗ Shows real data, not placeholders? +- ✓/✗ Numbers right-aligned in tables? +- ✓/✗ Status badges have colored backgrounds with text? + +#### Consistency +- ✓/✗ Similar screens use same patterns? +- ✓/✗ Colors consistent with theme? +- ✓/✗ Icons consistent in style and size? +- ✓/✗ Spacing consistent with other pages? + +### Step 7: Fix Issues Found + +If screenshots reveal problems: + +1. Locate the relevant component file +2. Make targeted CSS/layout changes +3. Prefer Tailwind utilities over custom CSS +4. Keep all `data-testid` attributes intact +5. Re-run tests to capture updated screenshots +6. Review again until all issues resolved +7. Focus on the biggest visual impact first + +**Priority order for fixes:** +1. Broken layout (overflow, clipping, misalignment) +2. Missing states (loading, empty, error) +3. Accessibility issues (contrast, focus rings, labels) +4. Visual polish (shadows, transitions, typography) +5. Consistency issues (spacing, colors) + +### Step 8: Document Verification + +After successful verification, note: +- Which features were verified +- Any UX improvements made +- Screenshots reviewed (count) +- Visual quality assessment + +## Writing Good E2E Tests + +### Key Principles + +1. **Use data-testid** for stable selectors +2. **EVERY test MUST capture at least one screenshot** — no exceptions +3. **Wait for conditions**, not timeouts +4. **Test at multiple viewports** for responsive features +5. 
**Mock external APIs** when needed + +### Example Test with Screenshots (REQUIRED PATTERN) + +```typescript +import { test, expect } from '@playwright/test'; + +test('user can login', async ({ page }) => { + await page.goto('/login'); + + // Screenshot: Login page initial state + await page.screenshot({ + path: `${screenshots_dir}/feature-1-step1-login-initial.png`, + fullPage: true + }); + + await page.getByLabel('Email').fill('test@example.com'); + await page.getByLabel('Password').fill('password123'); + await page.getByRole('button', { name: 'Login' }).click(); + + await expect(page).toHaveURL('/dashboard'); + + // Screenshot: Dashboard after login + await page.screenshot({ + path: `${screenshots_dir}/feature-1-step2-dashboard-after-login.png`, + fullPage: true + }); +}); +``` + +### Screenshot Rules (MANDATORY) + +- **Every test MUST have at least one `page.screenshot()` call** +- Name screenshots descriptively (scope is encoded in the directory path) +- Use `fullPage: true` to capture complete page state +- Capture at key user journey points (before action, after action, error state) +- Include error states and empty states in screenshots +- Capture responsive breakpoints if the feature involves responsive behavior: + ```typescript + // Desktop screenshot + await page.setViewportSize({ width: 1280, height: 720 }); + await page.screenshot({ + path: `${screenshots_dir}/feature-1-step1-desktop.png`, + fullPage: true + }); + + // Mobile screenshot + await page.setViewportSize({ width: 375, height: 812 }); + await page.screenshot({ + path: `${screenshots_dir}/feature-1-step1-mobile.png`, + fullPage: true + }); + ``` + +### Screenshot Naming Convention + +Format: `feature-{id}-step{N}-{description}.png` (scope is encoded in the directory path `specs/{scope}/screenshots/`) + +Examples: +- `feature-17-step3-modal-open.png` +- `feature-7-step6-project-in-list.png` +- `feature-15-complete-flow.png` +- `feature-4-step2-validation-errors.png` + +## Playwright 
Configuration
+
+Optimize for AI agent consumption:
+
+```typescript
+import { defineConfig } from '@playwright/test';
+
+export default defineConfig({
+  // Short timeouts - fail fast
+  timeout: 10000,        // 10s max per test
+  expect: {
+    timeout: 3000,       // 3s max for assertions
+  },
+
+  // AI-readable output format
+  reporter: [
+    ['list'],            // Simple pass/fail list
+    ['json', { outputFile: 'e2e/test-results/results.json' }],
+  ],
+
+  use: {
+    actionTimeout: 5000,      // 5s max for clicks/fills
+    navigationTimeout: 10000,
+    screenshot: 'on',         // Keep ALL screenshots
+    trace: 'retain-on-failure',
+  },
+});
+```
+
+**Why keep ALL screenshots:**
+- AI agents need to review UI for UX issues, not just failures
+- Success screenshots enable visual regression detection
+- Human reviewers can audit AI's work quality
+- Screenshots supplement interaction tests by catching visual regressions
+
+**Why short timeouts:**
+- Long waits waste tokens and time
+- Missing elements should fail immediately
+- Fast feedback enables rapid iteration
+- AI can read JSON results directly
+
+## Troubleshooting
+
+### Tests Timeout
+- Increase timeout in playwright.config.ts
+- Check if backend is responding
+- Look for infinite loading states
+
+### Flaky Tests
+- Use `await expect()` instead of raw assertions
+- Wait for network idle: `await page.waitForLoadState('networkidle')`
+- Add retries in CI
+
+### Screenshots Blank or Wrong
+- Ensure page fully loaded before screenshot
+- Check viewport size
+- Verify correct URL navigation
+- Add `await page.waitForLoadState('networkidle')` before screenshot
+
+### UI Looks Generic in Screenshots
+- Review references/web/frontend-design.md and references/web/ux-standards.md
+- Check for: distinctive typography, cohesive colors, proper shadows/depth
+- Verify loading/empty/error states are polished, not bare text
+- Add micro-interactions: hover transitions, focus effects
diff --git a/references/web/frontend-design.md b/references/web/frontend-design.md
new file mode 100644
index 0000000..c882ca2
--- 
/dev/null +++ b/references/web/frontend-design.md @@ -0,0 +1,76 @@ +# Frontend Design Principles + +> Adapted from the `/frontend-design` skill. These principles guide all UI implementation across iterative-dev projects of all types. + +## Core Philosophy + +Create distinctive, production-grade frontend interfaces that avoid generic "AI slop" aesthetics. Every UI decision should be intentional, not default. + +## Design Thinking (Before Coding) + +Before implementing any UI, pause and consider: +- **Purpose**: What problem does this interface solve? Who uses it? +- **Tone**: Pick a direction: brutally minimal, luxury/refined, playful, editorial/magazine, industrial/utilitarian, soft/pastel, etc. Admin dashboards often benefit from refined minimalism or industrial clarity. +- **Constraints**: Technical requirements (framework, performance, accessibility) +- **Differentiation**: What makes this memorable? What's the signature detail? + +**CRITICAL**: Choose a clear conceptual direction and execute it with precision. Bold maximalism and refined minimalism both work — the key is intentionality, not intensity. 
+ +## Aesthetic Guidelines + +### Typography +- Choose fonts that are beautiful, unique, and interesting +- **NEVER** use: Inter, Roboto, Arial, system fonts, or other generic defaults +- Opt for distinctive choices that elevate the interface +- Pair a display font with a refined body font +- Use font size contrast to create hierarchy (title vs body vs caption) + +### Color & Theme +- Commit to a cohesive aesthetic — don't scatter colors randomly +- Use CSS variables for consistency across the app +- Dominant colors with sharp accents outperform timid, evenly-distributed palettes +- Dark themes and light themes both work — choose intentionally +- Avoid cliched AI color schemes (particularly purple gradients on white) + +### Motion & Micro-interactions +- Use animations for high-impact moments: page load reveals, state transitions +- Prioritize CSS-only solutions for performance +- Focus on: hover states that surprise, smooth transitions between states, staggered reveals +- One well-orchestrated animation creates more delight than scattered micro-interactions +- Button feedback, loading spinners, and toast slide-ins should all feel intentional + +### Spatial Composition +- Unexpected layouts > cookie-cutter grids +- Asymmetry, overlap, diagonal flow, grid-breaking elements can all work +- Generous negative space OR controlled density — pick one and commit +- For admin/dashboard UIs: clean grid with clear visual hierarchy usually works best + +### Backgrounds & Visual Details +- Create atmosphere and depth rather than solid white/gray backgrounds +- Consider: subtle gradients, noise textures, geometric patterns, layered transparencies +- Dramatic shadows, decorative borders, grain overlays — use sparingly but intentionally +- For admin UIs: subtle texture or gradient in sidebar, clean content area + +## Anti-Patterns (NEVER Do These) + +- Generic font stacks (Inter, Roboto, Arial, system-ui) +- Purple gradients on white backgrounds +- Predictable layouts with no 
visual interest +- Cookie-cutter component patterns with default styles +- Bare, unstyled HTML elements +- Empty pages with just text "No items found" +- Forms with no visual grouping or hierarchy +- Tables with no hover states or alignment +- Buttons with no feedback on interaction +- Dialogs with no backdrop or transitions + +## Practical Application for Admin/Dashboard UIs + +Admin interfaces need to be **functional AND beautiful**. This means: + +1. **Clean but not bland** — Use subtle visual interest: card shadows, section dividers, icon accents +2. **Data-dense but readable** — Use typography hierarchy, proper spacing, zebra striping +3. **Efficient but not ugly** — Forms should be organized with sections, not a wall of inputs +4. **Professional but not generic** — Choose a color palette, font pairing, and spacing system that has character + +Remember: Claude is capable of extraordinary creative work. Don't hold back — show what can truly be created when committing fully to a distinctive vision, even for "boring" admin UIs. diff --git a/references/web/ux-standards.md b/references/web/ux-standards.md new file mode 100644 index 0000000..52f1b8a --- /dev/null +++ b/references/web/ux-standards.md @@ -0,0 +1,126 @@ +# UX Standards for Production-Quality Apps + +Every feature implemented by a subagent must meet these standards. A feature that works but looks like a prototype is NOT complete. + +## Non-Negotiable Standards (every page must have these) + +### Loading States +- Use skeleton screens for initial page load (preferred over spinners) +- Show inline spinner for actions (save, delete, bulk operations) +- Button text changes during action: "Save" → "Saving..." 
with disabled state +- Never show a blank page while data loads + +### Empty States +- Icon + heading + description + CTA button +- Example: `[inbox icon] "No products yet" / "Create your first product to get started" / [Add Product button]` +- Empty search results: "No results for 'X'" with a "Clear filters" link +- Never show just an empty table or blank area + +### Error States +- Inline errors below form fields (red text + red border on field) +- Toast notifications for action errors (red/destructive variant) +- Full-page error boundary for crashes with retry option +- Never show raw error messages or stack traces to users + +### Responsive Design +- **375px (Mobile)**: Single column, hamburger nav, stacked cards, horizontal scroll for tables +- **768px (Tablet)**: 2 columns, condensed sidebar or top nav +- **1280px (Desktop)**: Full layout with permanent sidebar +- Tables must either scroll horizontally on mobile OR collapse to card layout +- Touch targets minimum 44px on mobile +- Test at all three breakpoints + +### Accessibility +- All interactive elements: aria-label or associated visible label +- Focus ring visible on keyboard navigation (focus-visible) +- Color is never the only indicator (always add text or icon) +- Minimum contrast ratio: 4.5:1 for text +- Modals must trap focus and close on Escape +- Form inputs must have associated labels + +## Visual Design Standards + +### Typography Hierarchy +- Page title: large and bold (e.g., 24px+ bold) +- Section title: medium and semi-bold (e.g., 18px semi-bold) +- Body text: standard size (e.g., 14-16px) +- Caption/label: small and muted (e.g., 12px, secondary color) +- Choose distinctive fonts — avoid generic defaults like Inter, Arial, system-ui +- Pair a display font with a complementary body font + +### Color & Theme +- Commit to a cohesive palette — don't use random colors +- Define CSS variables for consistency +- Dominant color with sharp accent outperforms evenly-distributed palettes +- Status 
colors: green=success/active, yellow/amber=warning/draft, red=error/destructive, gray=neutral/archived +- Status badges must have both colored background AND text (not color alone) + +### Spacing Scale +Use a consistent scale throughout the app: +- `4px` — tight inline spacing +- `8px` — compact elements +- `12px` — standard inline padding +- `16px` — standard section padding +- `24px` — generous section spacing +- `32px` — major section breaks +- `48px` — page-level spacing + +### Shadows & Depth +- Cards: subtle shadow at rest, slightly deeper on hover +- Modals/dialogs: prominent shadow for depth +- Dropdowns: medium shadow +- Always add smooth transitions for shadow changes (~200ms) + +### Transitions & Micro-interactions +- Hover effects: smooth color/background transitions (~150-200ms) +- Button press feedback: slight scale or color change +- Page elements: subtle fade-in on mount +- Sidebar/menu open: slide transition with backdrop +- Toast notifications: slide-in from edge +- Never change state abruptly — always transition + +## Feature-Specific Standards + +### Forms +- Group related fields with section headers and visual dividers +- Required fields marked with asterisk (*) +- Help text below non-obvious fields (smaller, muted color) +- Auto-generation feedback (e.g., slug auto-generates as user types name) +- Submit button shows loading state, disables during submit +- Cancel navigates back without side effects +- Unsaved changes: consider confirm-before-leave + +### Tables +- Column headers: bold, uppercase or semi-bold, with sort indicators +- Zebra striping: alternating row backgrounds (subtle, muted tone) +- Hover highlighting: subtle background change on row hover with smooth transition +- Text alignment: text left, numbers right, status centered +- Actions column: icon buttons with tooltips +- Pagination: show current page, total pages, and per-page count + +### Cards / Grid Views +- Consistent card sizing within a grid +- Rounded corners (medium to 
large radius) +- Border or shadow for visual separation +- Hover effect for clickable cards +- Image aspect ratio maintained + +### Navigation +- Active link clearly distinguished (background color, font weight, or indicator) +- Breadcrumbs on nested pages (e.g., Products > Edit MacBook Pro) +- Mobile: hamburger menu with slide-in overlay + backdrop +- Keyboard accessible: Tab through links, Enter to activate + +### Dialogs / Modals +- Backdrop overlay (semi-transparent black) +- Centered with max-width appropriate to content +- Close button (X) in top-right corner +- Close on Escape key and backdrop click +- Focus trapped inside dialog +- Destructive actions: red/destructive button variant + +### Toast Notifications +- Success: green variant, auto-dismiss after 4s +- Error: red/destructive variant, longer display or manual dismiss +- Position: bottom-right or top-right, consistent throughout app +- Include relevant context (e.g., "Product 'MacBook Pro' deleted") diff --git a/tests/smoke-test.md b/tests/smoke-test.md new file mode 100644 index 0000000..c7840e2 --- /dev/null +++ b/tests/smoke-test.md @@ -0,0 +1,136 @@ +# Skill Smoke Test + +## Purpose +Verify the iterative-dev skill executes all mandatory steps: implement → verify → refine → next. + +## Setup +Create a minimal test project with 2 trivial features. Run the skill. Check artifacts. + +### 1. Create test project +```bash +mkdir -p /tmp/iterative-dev-test && cd /tmp/iterative-dev-test +git init +``` + +### 2. Create minimal scope +```bash +mkdir -p specs/test/{screenshots,refinements} +echo "test" > .active-scope +``` + +Create `specs/test/spec.md`: +``` +# Test Project +A simple CLI tool that greets users. 
+``` + +Create `specs/test/feature_list.json`: +```json +{ + "type": "cli", + "features": [ + { + "id": 1, + "category": "infrastructure", + "priority": "high", + "description": "Project scaffolding: create a Node.js project with a greet.js script", + "steps": [ + "Create package.json with name 'greeter'", + "Create greet.js that prints 'Hello, World!'", + "Verify: node greet.js outputs 'Hello, World!'" + ], + "passes": false + }, + { + "id": 2, + "category": "functional", + "priority": "high", + "description": "User can greet by name: node greet.js Alice prints 'Hello, Alice!'", + "steps": [ + "Modify greet.js to accept a name argument", + "Default to 'World' if no name provided", + "Write test: node greet.js Alice outputs 'Hello, Alice!'", + "Write test: node greet.js (no args) outputs 'Hello, World!'" + ], + "passes": false + } + ] +} +``` + +Create symlinks and init: +```bash +ln -sf specs/test/spec.md spec.md +ln -sf specs/test/feature_list.json feature_list.json +echo "# Progress" > specs/test/progress.txt +ln -sf specs/test/progress.txt progress.txt +echo '#!/bin/bash' > init.sh && chmod +x init.sh +git add -A && git commit -m "init: smoke test scope" +``` + +### 3. Run the skill +``` +/iterative-dev continue +``` + +### 4. 
Verify (automated checks) + +```bash +#!/bin/bash +# Run this after the skill completes + +PASS=0 +FAIL=0 + +check() { + if eval "$2"; then + echo "PASS: $1" + ((PASS++)) + else + echo "FAIL: $1" + ((FAIL++)) + fi +} + +# All features pass +check "All features pass" \ + '[ $(cat feature_list.json | grep -c "\"passes\": true") -eq 2 ]' + +# Implementation commits exist +check "Feature commits exist" \ + '[ $(git log --oneline | grep -c "feat:") -ge 2 ]' + +# Refinement commits exist (THE KEY TEST) +check "Refinement commits exist" \ + '[ $(git log --oneline | grep -c "refine:") -ge 2 ]' + +# Refinement reports exist +check "Refinement reports exist" \ + '[ $(ls specs/test/refinements/feature-*-refinement.md 2>/dev/null | wc -l) -ge 2 ]' + +# Commit order: each feat is followed by a refine +check "Commit order: refine follows feat" \ + 'git log --oneline --reverse | grep -E "feat:|refine:" | \ + awk "/feat:/{f=1;next} /refine:/{if(f)f=0; else exit 1} END{exit f}"' + +# Progress file updated +check "Progress file updated" \ + '[ $(wc -l < specs/test/progress.txt) -gt 1 ]' + +echo "" +echo "Results: $PASS passed, $FAIL failed" +[ $FAIL -eq 0 ] && echo "ALL CHECKS PASSED" || echo "SOME CHECKS FAILED" +``` + +## Expected Results +- 2 `feat:` commits (one per feature) +- 2 `refine:` commits (one per feature) +- 2 refinement reports in `specs/test/refinements/` +- Alternating pattern: feat → refine → feat → refine +- Both features `"passes": true` + +## What This Catches +- Skipped refinements (the bug that prompted this test) +- Missing refinement reports +- Wrong commit order (refinement must follow its feature) +- Incomplete feature list updates diff --git a/tests/verify.sh b/tests/verify.sh new file mode 100755 index 0000000..184af23 --- /dev/null +++ b/tests/verify.sh @@ -0,0 +1,76 @@ +#!/bin/bash +# Verify iterative-dev skill produced expected artifacts +# Run this in the project directory AFTER the skill completes + +SCOPE=$(cat .active-scope 2>/dev/null || echo 
"unknown")
TYPE=$(cat feature_list.json | grep -o '"type": *"[^"]*"' | head -1 | grep -o '"[^"]*"$' | tr -d '"')
TOTAL_FEATURES=$(cat feature_list.json | grep -c '"id":' || echo 0)
PASSING_FEATURES=$(cat feature_list.json | grep -c '"passes": true' || echo 0)
FEAT_COMMITS=$(git log --oneline | grep -c "feat:" || true)
REFINE_COMMITS=$(git log --oneline | grep -c "refine:" || true)
REFINEMENT_REPORTS=$(ls specs/$SCOPE/refinements/feature-*-refinement.md 2>/dev/null | wc -l | tr -d ' ')
SCREENSHOTS=$(ls specs/$SCOPE/screenshots/feature-*.png 2>/dev/null | wc -l | tr -d ' ')

PASS=0
FAIL=0

check() {
  if eval "$2"; then
    echo "  PASS: $1"
    ((PASS++))
  else
    echo "  FAIL: $1"
    ((FAIL++))
  fi
}

echo "=== Iterative-Dev Skill Verification ==="
echo "Scope: $SCOPE | Type: $TYPE | Features: $TOTAL_FEATURES"
echo ""

echo "--- Feature Completion ---"
check "All features pass ($PASSING_FEATURES/$TOTAL_FEATURES)" \
  "[ $PASSING_FEATURES -eq $TOTAL_FEATURES ]"

echo ""
echo "--- Implementation ---"
check "Feature commits exist ($FEAT_COMMITS)" \
  "[ $FEAT_COMMITS -ge $TOTAL_FEATURES ]"

echo ""
echo "--- Refinement (the critical gate) ---"
check "Refinement commits exist ($REFINE_COMMITS)" \
  "[ $REFINE_COMMITS -ge $TOTAL_FEATURES ]"
check "Refinement reports exist ($REFINEMENT_REPORTS)" \
  "[ $REFINEMENT_REPORTS -ge $TOTAL_FEATURES ]"
check "Reports match refinement commits ($REFINEMENT_REPORTS/$REFINE_COMMITS)" \
  "[ $REFINEMENT_REPORTS -ge $REFINE_COMMITS ]"

if [ "$TYPE" = "web" ] || [ "$TYPE" = "mobile" ]; then
  echo ""
  echo "--- Screenshots (web/mobile) ---"
  # Count UI features (exclude infrastructure)
  UI_FEATURES=$(cat feature_list.json | grep -c '"full-stack"\|"functional"\|"style"' || true)
  check "Screenshots captured ($SCREENSHOTS for ~$UI_FEATURES UI features)" \
    "[ $SCREENSHOTS -gt 0 ]"
fi

echo ""
echo "--- Commit Pattern ---"
# Verify feat/refine alternation
PATTERN_OK=true
LAST=""
while IFS= 
read -r line; do + TYPE_TAG=$(echo "$line" | grep -o "feat:\|refine:" || true) + if [ "$TYPE_TAG" = "feat:" ] && [ "$LAST" = "feat:" ]; then + PATTERN_OK=false # Two feats in a row = missing refinement + fi + [ -n "$TYPE_TAG" ] && LAST=$TYPE_TAG +done < <(git log --oneline --reverse) +check "No consecutive feat: commits (refinement between each)" \ + "$PATTERN_OK" + +echo "" +echo "=== Results: $PASS passed, $FAIL failed ===" +[ $FAIL -eq 0 ] && echo "ALL CHECKS PASSED" || echo "SOME CHECKS FAILED" +exit $FAIL