Help coding agents evolve with your project
Install the published CLI:
```shell
npm install -g crabyard
```

If you would rather not install it globally, use:

```shell
npx crabyard@latest --help
```

Once crabyard is available on your PATH, start with:
```shell
crabyard init /absolute/path/to/repo
crabyard validate --repo /absolute/path/to/repo
crabyard status --repo /absolute/path/to/repo
crabyard status add-auth --repo /absolute/path/to/repo --json
crabyard check add-auth --repo /absolute/path/to/repo
crabyard verify add-auth --repo /absolute/path/to/repo
crabyard sync add-auth --repo /absolute/path/to/repo
crabyard archive add-auth --repo /absolute/path/to/repo
```

After upgrading the CLI, refresh the replace-safe managed assets in an existing repo with:

```shell
crabyard update /absolute/path/to/repo
```

`update` refreshes replace-safe assets such as repo-local skills and the managed AGENTS.md routing block. It preserves repo-authored docs like `project.md`, `knowledge/index.md`, `TASK_EXECUTION_FORMAT.md`, and bucket `README.md` files, only recreating them when missing.
Add `--backup` only if you want replaced managed files copied into `.crabyard/backups/` before refresh.
If the repo already uses OpenSpec, migrate the existing specs and change bundles with:
```shell
crabyard migrate openspec /absolute/path/to/repo
```

This keeps the original `openspec/` tree in place, copies supported artifacts into `crabyard/`, and generates placeholder `execution.yaml` files for migrated change bundles.
A normal first loop looks like this:
- run `crabyard init /absolute/path/to/repo`
- ask your agent tool to create `crabyard/changes/<slug>/`
- let the agent write `proposal.md`, `design.md`, `tasks.md`, `execution.yaml`
- run `crabyard validate change <slug> --repo /absolute/path/to/repo`
- let the agent use `crabyard status <slug> --repo /absolute/path/to/repo --json`
- implement from the ready frontier
- run `check`, `verify`, `sync`, `verify`, `archive`
If you prefer npx, replace crabyard in the examples above with npx crabyard@latest.
Any agent tool that supports repo-local Skills can use this workflow.
Crabyard started from a simple observation: once you use coding agents seriously, the hard part is usually not getting them to write code. The hard part is keeping the repo understandable from one session to the next.
Tasks drift away from execution. Accepted product behavior gets mixed with draft ideas. Review findings disappear between turns. A week later, you still have code, but you no longer have a clean shared understanding of what is done, what is blocked, and what is safe to change.
Crabyard is a small repo-local layer meant to stop that drift before it becomes normal. It gives the agent a stable place to look for the plan, the execution truth, the accepted product truth, and the durable implementation knowledge, so the repo carries more of the working memory instead of leaving it scattered across chat history.
Concretely, it keeps these things separate:
- human-readable task planning in `tasks.md`
- machine-checkable execution truth in `execution.yaml`
- accepted product truth in `crabyard/specs/`
- in-flight accepted-truth edits in `crabyard/changes/<slug>/specs/`
- durable implementation and debugging knowledge in `crabyard/knowledge/`
That creates a much cleaner loop for agent-assisted development:
```
You -> ask your agent tool for a change
         |
         v
crabyard plan/change bundle
         |
         v
agent reads proposal/design/tasks/execution
         |
         +--> status --json says:
         |      - what is ready now
         |      - what is blocked
         |      - what verify checks matter
         |
         v
agent implements one safe unit at a time
         |
         v
verify -> sync -> verify -> archive
         |
         v
repo stays coherent for the next session
```
The point is not documentation for its own sake. The point is to make agents more dependable at:
- planning and reviewing changes
- understanding execution order and parallelism
- enforcing write ownership
- expressing verification contracts
- syncing accepted truth
- preserving reusable knowledge
The most important design choice is the explicit execution graph in `execution.yaml`: `tasks.md` stays readable for humans, while scheduling, dependencies, write ownership, and verification metadata stay machine-checkable.
Crabyard was influenced by projects like Compound Engineering and OpenSpec. The difference is mostly one of scope: Crabyard stays deliberately smaller, keeps context inside the repo, and focuses on a simpler execution contract that is easier to carry forward as the project evolves.
The workflow is short on purpose. It is meant to be easy to remember and easy to re-enter after context has gone stale.
```
research -> explore -> plan -> review -> apply -> review -> verify -> sync -> verify -> archive -> learn/refresh
```
- `AGENTS.md` is the canonical repo-instruction file
- accepted truth lives in `crabyard/specs/`
- in-flight accepted-truth edits live in `crabyard/changes/<slug>/specs/`
- durable implementation and debugging knowledge lives in `crabyard/knowledge/`
After init, the repo gains a small amount of structure:
```
<repo>/
  AGENTS.md
  .agents/skills/
    crabyard-research/
    crabyard-explore/
    crabyard-plan/
    crabyard-apply/
    crabyard-review/
    crabyard-archive/
    crabyard-debug/
    crabyard-learn/
    crabyard-refresh/
  crabyard/
    manifest.yaml
    project.md
    TASK_EXECUTION_FORMAT.md
    specs/
    changes/
    knowledge/
      index.md
```
Each in-flight change lives in its own folder:
```
crabyard/changes/<slug>/
  proposal.md
  design.md
  tasks.md
  execution.yaml
  specs/
  review.md
```
- `review.md` is optional
- `execution.yaml` is required
- `specs/` is the staged source for accepted-spec updates
The rule here is straightforward: execution.yaml cannot merely look plausible. It has to be structurally valid, and it has to line up with the tasks.md that a human would actually read. Otherwise the execution frontier is not worth trusting.
Crabyard parses execution.yaml with a real YAML parser and validates it against a schema.
It rejects:
- inline shape violations
- unknown `depends_on` references
- dependency cycles
- duplicate unit ids
- duplicate unit titles
- missing `parallel`, `writes`, or `verify`
- overlapping `writes` for concurrently eligible `parallel: true` units, unless every conflicting unit opts out with `allow_parallel_write_overlap: true`
- mismatches between top-level `##` sections in `tasks.md` and units in `execution.yaml`
`tasks.md` sections and `execution.yaml` units must match one-for-one and in order.
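A minimal `execution.yaml` sketch consistent with the rules above. This is illustrative, not the confirmed schema: the top-level `units:` key and the `id`/`title` field names are assumptions inferred from the checks listed.

```yaml
units:
  - id: auth-api
    title: Add OAuth login endpoint   # would match a top-level "## Add OAuth login endpoint" in tasks.md
    parallel: true
    writes:
      - src/auth/**
    verify:
      - pnpm test
  - id: auth-docs
    title: Document the OAuth flow
    parallel: true
    depends_on:
      - auth-api                      # must name an existing unit id; cycles are rejected
    writes:
      - docs/auth.md
    verify:
      - kind: artifact
        path: docs/auth.md
```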
`writes` uses ownership semantics:

- exact path: `src/execution.ts`
- subtree: `src/` or `src/**`
- glob: `src/**/*.ts`, `docs/{api,guide}.md`, `src/*/index.ts`
Overlap checks are segment-aware, so `src/*.ts` and `src/*.md` can run in parallel while `src/` still blocks any nested file ownership.
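For example, under segment-aware matching these two units (ids and titles hypothetical) could both be `parallel: true` without an overlap violation, because `src/*.ts` and `src/*.md` can never own the same file:

```yaml
- id: tighten-types
  title: Tighten top-level types
  parallel: true
  writes: [src/*.ts]
  verify: [pnpm test]
- id: polish-docs
  title: Polish in-tree docs
  parallel: true
  writes: [src/*.md]
  verify: [pnpm test]
```

Replacing either glob with the subtree form `src/` would reintroduce the conflict, since subtree ownership blocks every nested file.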
`verify` now accepts typed specs as well as legacy string shorthand:

- command: `kind`, `run` or `argv`, optional `cwd`, `timeout_ms`, `expect_exit_code`
- artifact: `kind`, `path`, optional `state`
Legacy `verify: [pnpm test]` remains valid and normalizes to a command check.
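A hypothetical `verify` array mixing the typed shapes with the legacy shorthand. Only the field names are documented; the paths and the `state: present` value are illustrative assumptions.

```yaml
verify:
  - pnpm lint                        # legacy shorthand, normalized to a command check
  - kind: command
    argv: [pnpm, test, --filter, auth]
    cwd: .
    timeout_ms: 120000
    expect_exit_code: 0
  - kind: artifact
    path: dist/auth/index.js
    state: present                   # illustrative value; only the field name is documented
```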
Use `crabyard check <change>` when you want those normalized checks to execute for real.
The CLI is intentionally small. Most of the time, agents only need a handful of commands, and everything else is there to support that loop:
- `crabyard validate` to reject broken repo or change structure
- `crabyard status --json` to inspect repo state, change state, frontier, and verification summary
- `crabyard check` to execute normalized verify metadata for a change
- `crabyard verify` to enforce deterministic closure gates
- `crabyard search` to search compiled repo knowledge quickly
- `crabyard lint knowledge` to detect index drift and malformed knowledge metadata
- `crabyard sync` to stage accepted-truth updates into canonical specs
- `crabyard archive` to close only verified and sync-coherent changes
That split is deliberate: skills stay thin, and the CLI remains the source of truth.
The easiest way to think about Crabyard is as shared working memory that sits next to your normal agent workflow. The difference is that the repo now has a clean place for the plan, the frontier, and the closure rules.
Typical setup:
1. You ask your agent tool for a feature or fix
2. The agent creates or updates crabyard/changes/<slug>/
3. The agent reads tasks.md + execution.yaml instead of guessing execution order
4. The agent uses status --json to decide what is ready now
5. The agent implements, reviews, verifies, syncs, and archives against explicit gates
A practical interaction loop looks like this:
```
You: add OAuth login
       |
       v
Agent:
  - creates change bundle
  - writes proposal/design/tasks/execution
  - checks status --json
  - executes only ready units
  - re-checks status after each step
  - closes with verify/sync/archive
```
- `init`: set up Crabyard files in a repo
- `install`: alias for `init`
- `update`: refresh replace-safe managed assets in an existing repo while preserving repo-authored docs
- `migrate`: copy OpenSpec specs and change bundles into Crabyard
- `list`: show available changes in the repo
- `show`: print one change bundle for inspection
- `validate`: check repo or change structure before work continues
- `status`: inspect repo state, change state, and the current frontier
- `check`: execute the normalized verify checks for a change
- `verify`: enforce closure gates for a change
- `search`: search `crabyard/knowledge/` and optionally `crabyard/specs/`
- `lint`: currently supports `lint knowledge` for the compiled knowledge layer
- `sync`: copy accepted-spec updates into canonical specs
- `archive`: close a verified, sync-coherent change
check is where typed verify metadata becomes executable. It runs normalized command and artifact checks and reports per-unit results.
Unlike verify, it does not require tasks.md to be fully checked off first. It is meant for executing real checks while work is still in progress.
Think of verify as a closure gate, not a task runner. It validates the change bundle, checks that execution.yaml is trustworthy, and fails if tasks.md still has unchecked items.
It does not execute arbitrary shell commands from the verify arrays in execution.yaml.
This is usually the command an agent reads the most. It is also read-only.
- `status` with no change summarizes repo validity, counts, and active change states
- `status <change>` summarizes task completion, ready units, blocked units, verification gaps, sync readiness, and the current execution frontier
- `--json` returns machine-readable status for agent tooling
- `status --json` now includes `frontier.readyUnits`, `frontier.blockedUnits`, and `verification.summary`
Example:
```shell
crabyard status add-auth --repo /absolute/path/to/repo --json
```

Typical JSON fields:

- `state`
- `units.items`
- `frontier.readyUnits`
- `frontier.blockedUnits`
- `verification.summary`
- `sync.pending`
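An illustrative shape for that output. Only the dotted field paths above are documented; the nesting of `units.items` and every value shown here are assumptions, not the CLI's actual payload.

```json
{
  "state": "in-progress",
  "units": { "items": [] },
  "frontier": {
    "readyUnits": ["auth-api"],
    "blockedUnits": ["auth-docs"]
  },
  "verification": { "summary": {} },
  "sync": { "pending": [] }
}
```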
`sync` does one thing: it moves accepted-spec updates from `crabyard/changes/<slug>/specs/` to `crabyard/specs/`.
The behavior is intentionally conservative:
- the change must already pass `crabyard verify <change>`
- files staged under the change are copied or overwritten into accepted specs
- files absent from the change are left untouched in accepted specs
- file order is deterministic
archive is not just a rename. It only closes a change when the repo is in a coherent state.
It fails unless:
- `verify` passes
- staged spec sync is coherent
The intended closure sequence is:
1. `crabyard verify <change>`
2. `crabyard sync <change>` if needed
3. `crabyard verify <change>`
4. `crabyard archive <change>`
Crabyard installs a small set of repo-local skills under .agents/skills/. Any agent tool that supports repo-local Skills can use them. That is deliberate. You should be able to clone a repo, run init, and hand the agent the same small toolkit every time instead of depending on someone's global setup.
- `crabyard-research`
- `crabyard-explore`
- `crabyard-plan`
- `crabyard-apply`
- `crabyard-review`
- `crabyard-archive`
- `crabyard-debug`
- `crabyard-learn`
- `crabyard-refresh`
These skills only live inside the repo. Knowledge retrieval is treated as part of the workflow, not as an afterthought.
- `crabyard-research` searches `crabyard/knowledge/index.md`, `crabyard/knowledge/`, and relevant specs for the strongest prior learnings
- `crabyard-explore`, `crabyard-plan`, and `crabyard-review` now begin with an explicit retrieval pass
- retrieved knowledge informs decisions, but does not override accepted truth in `crabyard/specs/`
- `crabyard-review` can run both before apply, to stress-test the plan, and after apply, to review the implementation
The reusable review layer lives in crabyard-review and looks at:
- code
- proposal
- design
- tasks
- execution plan
- relevant specs
It reports prioritized findings as P1 / P2 / P3 and can write crabyard/changes/<slug>/review.md.
Crabyard keeps implementation and debugging notes in crabyard/knowledge/, but the goal is not note-taking for its own sake. The goal is to make the next piece of work easier than the last one.
- `crabyard-research` returns the strongest 1-3 prior learnings before planning, review, or debugging
- `crabyard-learn` checks overlap before creating a note and updates `knowledge/index.md`
- `crabyard-refresh` supports targeted refresh, consolidation, replacement, and stale marking
- optional note frontmatter can add `kind`, `tags`, `paths`, `related_specs`, `related_changes`, `supersedes`, and `last_verified_at`
- `knowledge/index.md` stays retrieval-friendly and canonical
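A hypothetical knowledge note header using those optional frontmatter fields. The filename and every value are illustrative; only the field names come from the list above.

```yaml
# crabyard/knowledge/oauth-token-refresh.md (illustrative path)
---
kind: debugging
tags: [auth, oauth]
paths:
  - src/auth/**
related_specs:
  - crabyard/specs/auth.md
related_changes:
  - add-auth
supersedes: []
last_verified_at: 2025-01-15
---
```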