A structured framework for AI agents to build software autonomously — with verification, guardrails, and human oversight built in.
View Landing Page · Example Project Output
A framework that turns ad-hoc AI prompting into a repeatable workflow with automatic verification. Three phases:
- Specify: Guided Q&A produces `plans/greenfield/PRODUCT_SPEC.md` and `plans/greenfield/TECHNICAL_SPEC.md`
- Plan: Generator creates `plans/greenfield/EXECUTION_PLAN.md`, `plans/PLAN_STATUS.md`, a scoped `plans/greenfield/AGENTS.md`, and a durable root `AGENTS.md`
- Execute: AI agents work task-by-task with automatic verification after each one
What makes execution robust?
- Code verification — Multi-agent system checks each task against its acceptance criteria
- TDD enforcement — Verifies tests exist, were written first, and have meaningful assertions
- Security scanning — Dependency audits, secrets detection, and static analysis at checkpoints
- Auto-advance — Phases chain automatically when no human intervention is needed
- Stuck detection — Agents escalate to humans instead of spinning on failures
- Cross-model review — Optional second-opinion review using OpenAI Codex CLI
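As an illustration of the stuck-detection idea in the list above, here is a minimal Python sketch. It is not the toolkit's implementation: the `attempt` callable, the `max_attempts` threshold, and the `EscalateToHuman` exception are all hypothetical names chosen for the example.

```python
from typing import Callable


class EscalateToHuman(Exception):
    """Raised when a task keeps failing and a person should take over."""


def run_with_stuck_detection(
    task_id: str,
    attempt: Callable[[], bool],  # hypothetical: True means verification passed
    max_attempts: int = 3,        # illustrative threshold, not a toolkit default
) -> int:
    """Retry a task until it passes; escalate after max_attempts failures.

    Returns the number of attempts that were needed.
    """
    for attempt_number in range(1, max_attempts + 1):
        if attempt():
            return attempt_number
    raise EscalateToHuman(
        f"Task {task_id} failed {max_attempts} times; stopping for human review"
    )


if __name__ == "__main__":
    outcomes = iter([False, True])  # fails once, then passes
    print(run_with_stuck_detection("1.2.A", lambda: next(outcomes)))  # prints 2
```

The point of the bounded loop is that a failing task ends in an explicit hand-off rather than an open-ended retry cycle.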
Prerequisites:
- Claude Code: Anthropic's CLI for Claude (primary interface)
- Git: Required for the branching and commit workflow
Codex CLI users: see Codex CLI Setup. Not using Claude Code? See Manual Setup.
If you switch between Claude Code and Codex across multiple laptops, bootstrap shared machine-level skills and MCPs from this toolkit repo:
./scripts/bootstrap-agent-runtime.sh
This sets up:
- `~/.claude/skills/` (Claude personal skills)
- `~/.agents/skills/` (Codex user skills)
- `~/.codex/skills -> ~/.agents/skills` compatibility symlink (when `~/.codex/skills` is not already in use)
- MCP servers for both agents from `config/mcp/servers.json`
Safety defaults:
- Existing non-toolkit skills are preserved (not overwritten)
- Existing MCPs are discovered and normalized via `add-mcp list` + `add-mcp sync` (use `--no-normalize-existing` for manifest-only MCP apply)
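The "preserve existing non-toolkit skills" default can be pictured with a short Python sketch. This is an assumption about how such a safety rule might work, not a description of what `bootstrap-agent-runtime.sh` does: the `.toolkit-managed` marker file, the `sync_skills` function, and the sample skill names are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

MARKER = ".toolkit-managed"  # hypothetical marker file; not a real toolkit convention


def sync_skills(source: Path, dest: Path) -> List[str]:
    """Copy skill directories from source to dest without clobbering user skills.

    A destination skill is replaced only if it carries the marker file,
    meaning an earlier sync created it. Returns the names that were skipped.
    """
    dest.mkdir(parents=True, exist_ok=True)
    skipped = []
    for skill in sorted(p for p in source.iterdir() if p.is_dir()):
        target = dest / skill.name
        if target.exists() and not (target / MARKER).exists():
            skipped.append(skill.name)  # user-owned skill: leave untouched
            continue
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(skill, target)
        (target / MARKER).touch()
    return skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "toolkit"), Path(tmp, "home")
        (src / "review").mkdir(parents=True)
        (src / "review" / "SKILL.md").write_text("toolkit version")
        (dst / "review").mkdir(parents=True)
        (dst / "review" / "SKILL.md").write_text("my own version")
        print(sync_skills(src, dst))  # the pre-existing "review" skill is skipped
```

Tracking ownership with a marker is only one option; comparing against a manifest of toolkit skill names would serve the same purpose.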
# 1. Clone the toolkit and set up your project (one-time)
git clone https://github.com/benjaminshoemaker/ai_coding_project_base.git
cd ai_coding_project_base
/setup ~/Projects/my-new-app
# 2. Generate specs, plan, and execute (from your project directory)
cd ~/Projects/my-new-app
/product-spec
/technical-spec
/generate-plan
cd plans/greenfield
/fresh-start
`/fresh-start` loads context and auto-advances through phase-prep → phase-start → phase-checkpoint for each phase, stopping only when human input is needed. To resume later, run `/go`, which detects where you left off.
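The auto-advance and resume behavior described above can be sketched as a small loop. This is a simplified model under stated assumptions, not the toolkit's code: the `run_step` callback (returning `False` when a human is needed) and the `(phase, step index)` resume position are hypothetical; only the step order comes from the text.

```python
from typing import Callable, Optional, Tuple

STEPS = ("phase-prep", "phase-start", "phase-checkpoint")  # order taken from the text above


def auto_advance(
    phase_count: int,
    run_step: Callable[[str, int], bool],  # hypothetical: False means "needs a human"
    resume_from: Tuple[int, int] = (1, 0),  # (phase number, index into STEPS)
) -> Optional[Tuple[int, int]]:
    """Run each step of each phase in order.

    Returns None when everything completed, or the (phase, step index)
    position to resume from when a step asked for human input.
    """
    phase, step_index = resume_from
    while phase <= phase_count:
        while step_index < len(STEPS):
            if not run_step(STEPS[step_index], phase):
                return (phase, step_index)
            step_index += 1
        phase, step_index = phase + 1, 0
    return None


if __name__ == "__main__":
    def run_step(step: str, phase: int) -> bool:
        print(f"/{step} {phase}")
        # Pretend the first checkpoint needs a human decision.
        return not (step == "phase-checkpoint" and phase == 1)

    print(auto_advance(2, run_step))  # stops at (1, 2)
```

Passing the returned position back in as `resume_from` re-runs the step that stopped, which is roughly the role `/go` plays in the workflow.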
For feature development in existing projects, see Feature Workflow.
┌─────────────────────────────────────────────────────────────────────────┐
│                           SPECIFICATION PHASE                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  Your Idea                                                              │
│      ↓                                                                  │
│  /product-spec ───────────→ plans/greenfield/PRODUCT_SPEC.md            │
│      ↓                                                                  │
│  /technical-spec ─────────→ plans/greenfield/TECHNICAL_SPEC.md          │
│      ↓                                                                  │
│  [Auto-Verify] ───────────→ Check context preservation & quality        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
                                     ↓
┌─────────────────────────────────────────────────────────────────────────┐
│                             PLANNING PHASE                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  /generate-plan ──────────→ plans/greenfield/EXECUTION_PLAN.md          │
│                             plans/PLAN_STATUS.md                        │
│                             AGENTS.md + plans/greenfield/AGENTS.md      │
│      ↓                                                                  │
│  [Auto-Verify] ───────────→ Check context preservation & quality        │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
                                     ↓
┌─────────────────────────────────────────────────────────────────────────┐
│                             EXECUTION PHASE                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  cd plans/greenfield                                                    │
│      ↓                                                                  │
│  /fresh-start ────────────→ Orient to scoped plan, load context         │
│      ↓                                                                  │
│  /phase-start N ──────────→ Execute phase (branch + commits)            │
│      ↓                                                                  │
│  /phase-checkpoint N ─────→ Verify, test, security scan                 │
│                                                                         │
│  Phase 1 → Checkpoint → Phase 2 → Checkpoint → Phase 3 → ...            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
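Because each stage in the diagram above leaves a file behind, the next step can in principle be inferred from which artifacts exist. The Python sketch below illustrates that idea only; the artifact paths come from the diagram, while the `STAGES` table, the `next_step` function, and the stage labels are assumptions made for this example.

```python
import tempfile
from pathlib import Path

# Artifact paths come from the diagram above; the stage labels and the idea
# of inferring progress from file presence are assumptions for illustration.
STAGES = [
    ("specify: run /product-spec", "plans/greenfield/PRODUCT_SPEC.md"),
    ("specify: run /technical-spec", "plans/greenfield/TECHNICAL_SPEC.md"),
    ("plan: run /generate-plan", "plans/greenfield/EXECUTION_PLAN.md"),
]


def next_step(project_root: Path) -> str:
    """Return the first stage whose output file is missing."""
    for label, artifact in STAGES:
        if not (project_root / artifact).is_file():
            return label
    return "execute: run /fresh-start from plans/greenfield"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(next_step(root))  # nothing exists yet, so the product spec is next
        spec = root / "plans/greenfield/PRODUCT_SPEC.md"
        spec.parent.mkdir(parents=True)
        spec.write_text("# Product Spec\n")
        print(next_step(root))  # now the technical spec is next
```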
| Command | Description |
|---|---|
| `/product-spec` | Generate product specification |
| `/technical-spec` | Generate technical specification |
| `/generate-plan` | Generate the greenfield execution plan plus root and scoped AGENTS files |
| `/go` | Resume execution from wherever you left off |
| `/fresh-start` | Orient to project, load context, begin execution |
| `/phase-start N` | Execute phase N (creates branch, commits per task) |
| `/phase-checkpoint N` | Verify phase: tests, lint, security, then production checks |
| `/verify-task X.Y.Z` | Verify a specific task's acceptance criteria |
| `/create-pr` | Create GitHub PR with automatic Codex review |
See Command Reference for the full list including feature, setup, verification, and recovery commands.
your-project/
├── AGENTS.md # Durable project-wide workflow rules
├── CLAUDE.md # Root Claude shim
├── LEARNINGS.md # Discovered patterns (created as you work)
├── DEFERRED.md # Deferred requirements (captured during Q&A)
├── plans/
│ ├── PLAN_STATUS.md # Single current-plan pointer and plan history
│ ├── archive/ # Superseded/rejected/abandoned greenfield snapshots
│ └── greenfield/
│ ├── PRODUCT_SPEC.md # What you're building
│ ├── TECHNICAL_SPEC.md # How it's built
│ ├── EXECUTION_PLAN.md # Tasks with acceptance criteria
│ ├── AGENTS.md # Greenfield execution guidance
│ └── CLAUDE.md # Scoped Claude shim
├── features/
│ └── archive/ # Superseded/rejected/abandoned feature snapshots
├── .claude/
│ ├── verification-config.json
│ └── toolkit-version.json # Tracks toolkit sync state (global skill resolution)
└── [your code]
Skills resolve globally from `~/.claude/skills/` (managed by `./scripts/bootstrap-agent-runtime.sh`) rather than from long-lived per-project copies.
These documents persist across sessions, enabling any AI agent to pick up where another left off.
plans/PLAN_STATUS.md tells Claude Code, Codex, and other agents which plan is
current so historical specs and abandoned experiments stay available without
becoming accidental requirements.
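A single pointer file makes it cheap for any agent to find the active plan before reading specs. The sketch below shows the idea in Python. The real layout of `plans/PLAN_STATUS.md` is not shown in this document, so the `Current plan:` line, the sample contents, and the `current_plan` function are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical file layout: the real PLAN_STATUS.md format is not shown here.
SAMPLE_STATUS = """\
# Plan Status

Current plan: plans/greenfield

## History
- plans/archive/old-prototype (abandoned)
"""


def current_plan(status_text: str) -> Optional[str]:
    """Return the path on the 'Current plan:' line, or None if it is absent."""
    match = re.search(
        r"^Current plan:[ \t]*(\S+)[ \t]*$", status_text, flags=re.MULTILINE
    )
    return match.group(1) if match else None


if __name__ == "__main__":
    print(current_plan(SAMPLE_STATUS))  # prints plans/greenfield
```

An agent following this pattern would read only the directory the pointer names and treat everything under the archive folders as history.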
- Command Reference — Full list of all slash commands with options
- Feature Workflow — Adding features to existing projects
- Workflow Automation — Auto-advance, git workflow, parallel workstreams, project syncing
- Verification Deep Dive — TDD, security scanning, browser verification, spec verification
- Codex CLI Setup — Cross-model review, Codex task execution, installation
- Codex App Workflows — Parallel workstreams with Codex App and Claude Code
- Recovery Commands — Handling failures and rollbacks
- Advanced Topics — Brownfield support, AGENTS.md limits, optional tools
- Web Interface Usage — Using with ChatGPT, Claude web, etc.
- Manual Setup — Copy-paste prompts for non-Claude-Code users
See CONTRIBUTING.md for guidelines.
npm run lint # Check markdown
npm run lint:fix # Auto-fix
License: MIT