conscientiousness/crabyard

Crabyard

Help coding agents evolve with your project

English · 繁體中文 · Install · Quick Start · CLI Commands

Install

Install the published CLI:

npm install -g crabyard

If you would rather not install it globally, use:

npx crabyard@latest --help

Quick Start

Once crabyard is available on your PATH, start with:

crabyard init /absolute/path/to/repo
crabyard validate --repo /absolute/path/to/repo
crabyard status --repo /absolute/path/to/repo
crabyard status add-auth --repo /absolute/path/to/repo --json
crabyard check add-auth --repo /absolute/path/to/repo
crabyard verify add-auth --repo /absolute/path/to/repo
crabyard sync add-auth --repo /absolute/path/to/repo
crabyard archive add-auth --repo /absolute/path/to/repo

After upgrading the CLI, refresh the replace-safe managed assets in an existing repo with:

crabyard update /absolute/path/to/repo

update refreshes replace-safe assets such as repo-local skills and the managed AGENTS.md routing block. It preserves repo-authored docs like project.md, knowledge/index.md, TASK_EXECUTION_FORMAT.md, and bucket README.md files, only recreating them when missing.

Add --backup only if you want replaced managed files copied into .crabyard/backups/ before refresh.

If the repo already uses OpenSpec, migrate the existing specs and change bundles with:

crabyard migrate openspec /absolute/path/to/repo

This keeps the original openspec/ tree in place, copies supported artifacts into crabyard/, and generates placeholder execution.yaml files for migrated change bundles.

A normal first loop looks like this:

  1. crabyard init /absolute/path/to/repo
  2. ask your agent tool to create crabyard/changes/<slug>/
  3. let the agent write proposal.md, design.md, tasks.md, execution.yaml
  4. run crabyard validate change <slug> --repo /absolute/path/to/repo
  5. let the agent use crabyard status <slug> --repo /absolute/path/to/repo --json
  6. implement from the ready frontier
  7. run check, verify, sync, verify, archive

If you prefer npx, replace crabyard in the examples above with npx crabyard@latest.

Any agent tool that supports repo-local Skills can use this workflow.

Crabyard started from a simple observation: once you use coding agents seriously, the hard part is usually not getting them to write code. The hard part is keeping the repo understandable from one session to the next.

Tasks drift away from execution. Accepted product behavior gets mixed with draft ideas. Review findings disappear between turns. A week later, you still have code, but you no longer have a clean shared understanding of what is done, what is blocked, and what is safe to change.

Crabyard is a small repo-local layer meant to stop that drift before it becomes normal. It gives the agent a stable place to look for the plan, the execution truth, the accepted product truth, and the durable implementation knowledge, so the repo carries more of the working memory instead of leaving it scattered across chat history.

Concretely, it keeps these things separate:

  • human-readable task planning in tasks.md
  • machine-checkable execution truth in execution.yaml
  • accepted product truth in crabyard/specs/
  • in-flight accepted-truth edits in crabyard/changes/<slug>/specs/
  • durable implementation and debugging knowledge in crabyard/knowledge/

That creates a much cleaner loop for agent-assisted development:

You -> ask your agent tool for a change
     |
     v
crabyard plan/change bundle
     |
     v
agent reads proposal/design/tasks/execution
     |
     +--> status --json says:
     |      - what is ready now
     |      - what is blocked
     |      - what verify checks matter
     |
     v
agent implements one safe unit at a time
     |
     v
verify -> sync -> verify -> archive
     |
     v
repo stays coherent for the next session

The point is not documentation for its own sake. The point is to make agents more dependable at:

  • planning and reviewing changes
  • understanding execution order and parallelism
  • enforcing write ownership
  • expressing verification contracts
  • syncing accepted truth
  • preserving reusable knowledge

The most important design choice is explicit execution graphs in execution.yaml. tasks.md stays readable for humans, while scheduling, dependencies, write ownership, and verification metadata stay machine-checkable.
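A rough sketch of what that split can look like. The exact top-level layout of execution.yaml is an assumption here; the field names (id, title, depends_on, parallel, writes, verify) come from the validation rules described below:

```yaml
# Hypothetical execution.yaml sketch -- the precise top-level shape may differ.
units:
  - id: add-oauth-routes
    title: Add OAuth routes
    depends_on: []                # no prerequisites: eligible immediately
    parallel: true
    writes:
      - src/routes/auth/          # subtree ownership
    verify:
      - pnpm test
  - id: wire-session-store
    title: Wire session store
    depends_on: [add-oauth-routes]  # blocked until the unit above completes
    parallel: false
    writes:
      - src/session.ts            # exact-path ownership
    verify:
      - pnpm test
```

Here scheduling (depends_on), concurrency (parallel), ownership (writes), and verification all live in one machine-checkable place, while tasks.md stays prose.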

Crabyard was influenced by projects like Compound Engineering and OpenSpec. The difference is mostly one of scope: Crabyard stays deliberately smaller, keeps context inside the repo, and focuses on a simpler execution contract that is easier to carry forward as the project evolves.

Workflow

The workflow is short on purpose. It is meant to be easy to remember and easy to re-enter after context has gone stale.

research -> explore -> plan -> review -> apply -> review -> verify -> sync -> verify -> archive -> learn/refresh
  • AGENTS.md is the canonical repo-instruction file
  • accepted truth lives in crabyard/specs/
  • in-flight accepted-truth edits live in crabyard/changes/<slug>/specs/
  • durable implementation and debugging knowledge lives in crabyard/knowledge/

What Gets Added To A Repo

After init, the repo gains a small amount of structure:

<repo>/
  AGENTS.md
  .agents/skills/
    crabyard-research/
    crabyard-explore/
    crabyard-plan/
    crabyard-apply/
    crabyard-review/
    crabyard-archive/
    crabyard-debug/
    crabyard-learn/
    crabyard-refresh/
  crabyard/
    manifest.yaml
    project.md
    TASK_EXECUTION_FORMAT.md
    specs/
    changes/
    knowledge/
      index.md

What A Change Looks Like

Each in-flight change lives in its own folder:

crabyard/changes/<slug>/
  proposal.md
  design.md
  tasks.md
  execution.yaml
  specs/
  review.md
  • review.md is optional.
  • execution.yaml is required.
  • specs/ is the staged source for accepted-spec updates.

What Crabyard Checks

The rule here is straightforward: execution.yaml cannot merely look plausible. It has to be structurally valid, and it has to line up with the tasks.md that a human would actually read. Otherwise the execution frontier is not worth trusting.

Crabyard parses execution.yaml with a real YAML parser and validates it against a schema.

It rejects:

  • inline shape violations
  • unknown depends_on
  • dependency cycles
  • duplicate unit ids
  • duplicate unit titles
  • missing parallel, writes, or verify
  • overlapping writes for concurrently eligible parallel: true units unless every conflicting unit opts out with allow_parallel_write_overlap: true
  • mismatches between top-level ## sections in tasks.md and units in execution.yaml

tasks.md and execution.yaml must match one-for-one and in order.
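As an illustration (section and unit names are hypothetical, and matching by title is an assumption), a tasks.md with two top-level sections:

```markdown
## Add OAuth routes
- [ ] scaffold route handlers
- [ ] add route tests

## Wire session store
- [ ] connect the store
```

would have to pair with exactly two units in execution.yaml, in the same order, with no extra or missing units on either side.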

writes uses ownership semantics:

  • exact path: src/execution.ts
  • subtree: src/ or src/**
  • glob: src/**/*.ts, docs/{api,guide}.md, src/*/index.ts

Overlap checks are segment-aware, so src/*.ts and src/*.md can run in parallel while src/ still blocks any nested file ownership.
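A sketch of how those patterns interact under the overlap check (the unit shape here is an assumption; the flag name comes from the rule above):

```yaml
units:
  - id: gen-types
    parallel: true
    writes: [src/**/*.ts]   # segment-aware: no conflict with src/**/*.md
    allow_parallel_write_overlap: true   # needed only because src/ below overlaps it
  - id: gen-docs
    parallel: true
    writes: [src/**/*.md]
    allow_parallel_write_overlap: true
  - id: restructure
    parallel: true
    writes: [src/]          # subtree ownership: overlaps both globs above
    allow_parallel_write_overlap: true
```

Without the opt-out flag on every conflicting unit, validation would reject this plan; gen-types and gen-docs alone would be fine together, since their globs differ in the final segment.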

verify now accepts typed specs as well as legacy string shorthand:

  • command: kind, run or argv, optional cwd, timeout_ms, expect_exit_code
  • artifact: kind, path, optional state

Legacy verify: [pnpm test] remains valid and normalizes to a command check.
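A hedged sketch of the typed shapes, using only the fields listed above (all values are illustrative, and the allowed values for state are an assumption):

```yaml
verify:
  - kind: command
    run: pnpm test            # or use argv instead of run
    cwd: packages/api         # optional
    timeout_ms: 120000        # optional
    expect_exit_code: 0       # optional
  - kind: command
    argv: [node, scripts/smoke.js]
  - kind: artifact
    path: dist/index.js
    state: present            # optional
  - pnpm lint                 # legacy string shorthand, normalized to a command check
```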

Use crabyard check <change> when you want those normalized checks to execute for real.

The Commands That Actually Matter

The CLI is intentionally small. Most of the time, agents only need a handful of commands, and everything else is there to support that loop:

  • crabyard validate to reject broken repo or change structure
  • crabyard status --json to inspect repo state, change state, frontier, and verification summary
  • crabyard check to execute normalized verify metadata for a change
  • crabyard verify to enforce deterministic closure gates
  • crabyard search to search compiled repo knowledge quickly
  • crabyard lint knowledge to detect index drift and malformed knowledge metadata
  • crabyard sync to stage accepted-truth updates into canonical specs
  • crabyard archive to close only verified and sync-coherent changes

That split is deliberate: skills stay thin, and the CLI remains the source of truth.

How It Fits Into A Real Session

The easiest way to think about Crabyard is as shared working memory that sits next to your normal agent workflow. The difference is that the repo now has a clean place for the plan, the frontier, and the closure rules.

Typical setup:

1. You ask your agent tool for a feature or fix
2. The agent creates or updates crabyard/changes/<slug>/
3. The agent reads tasks.md + execution.yaml instead of guessing execution order
4. The agent uses status --json to decide what is ready now
5. The agent implements, reviews, verifies, syncs, and archives against explicit gates

A practical interaction loop looks like this:

You: add OAuth login
  |
  v
Agent:
  - creates change bundle
  - writes proposal/design/tasks/execution
  - checks status --json
  - executes only ready units
  - re-checks status after each step
  - closes with verify/sync/archive

CLI Commands

  • init: set up Crabyard files in a repo
  • install: alias for init
  • update: refresh replace-safe managed assets in an existing repo while preserving repo-authored docs
  • migrate: copy OpenSpec specs and change bundles into Crabyard
  • list: show available changes in the repo
  • show: print one change bundle for inspection
  • validate: check repo or change structure before work continues
  • status: inspect repo state, change state, and the current frontier
  • check: execute the normalized verify checks for a change
  • verify: enforce closure gates for a change
  • search: search crabyard/knowledge/ and optionally crabyard/specs/
  • lint: currently supports lint knowledge for the compiled knowledge layer
  • sync: copy accepted-spec updates into canonical specs
  • archive: close a verified, sync-coherent change

check <change>

check is where typed verify metadata becomes executable. It runs normalized command and artifact checks and reports per-unit results.

Unlike verify, it does not require tasks.md to be fully checked off first. It is meant for executing real checks while work is still in progress.

verify <change>

Think of verify as a closure gate, not a task runner. It validates the change bundle, checks that execution.yaml is trustworthy, and fails if tasks.md still has unchecked items.

It does not execute arbitrary shell commands from the verify arrays in execution.yaml.

status [change]

This is usually the command an agent reads the most. It is also read-only.

  • status with no change summarizes repo validity, counts, and active change states
  • status <change> summarizes task completion, ready units, blocked units, verification gaps, sync readiness, and the current execution frontier
  • --json returns machine-readable status for agent tooling
  • status --json now includes frontier.readyUnits, frontier.blockedUnits, and verification.summary

Example:

crabyard status add-auth --repo /absolute/path/to/repo --json

Typical JSON fields:

  • state
  • units.items
  • frontier.readyUnits
  • frontier.blockedUnits
  • verification.summary
  • sync.pending
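Putting those fields together, the payload might look roughly like this (field names are from the list above; the nesting details and values are illustrative assumptions, not the exact output):

```json
{
  "state": "in_progress",
  "units": {
    "items": [
      { "id": "add-oauth-routes", "done": true }
    ]
  },
  "frontier": {
    "readyUnits": ["wire-session-store"],
    "blockedUnits": ["ship-docs"]
  },
  "verification": {
    "summary": { "passed": 1, "failed": 0, "pending": 2 }
  },
  "sync": { "pending": true }
}
```

An agent loop typically reads frontier.readyUnits to pick the next safe unit and verification.summary to decide whether closure is in reach.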

sync <change>

sync does one thing: it moves accepted-spec updates from:

crabyard/changes/<slug>/specs/

to:

crabyard/specs/

The behavior is intentionally conservative:

  • the change must already pass crabyard verify <change>
  • files staged under the change are copied or overwritten into accepted specs
  • files absent from the change are left untouched in accepted specs
  • file order is deterministic

archive <change>

archive is not just a rename. It only closes a change when the repo is in a coherent state.

It fails unless:

  • verify passes
  • staged spec sync is coherent

The intended closure sequence is:

  1. crabyard verify <change>
  2. crabyard sync <change> if needed
  3. crabyard verify <change>
  4. crabyard archive <change>

Built-In Skills

Crabyard installs a small set of repo-local skills under .agents/skills/. Any agent tool that supports repo-local Skills can use them. That is deliberate. You should be able to clone a repo, run init, and hand the agent the same small toolkit every time instead of depending on someone's global setup.

  • crabyard-research
  • crabyard-explore
  • crabyard-plan
  • crabyard-apply
  • crabyard-review
  • crabyard-archive
  • crabyard-debug
  • crabyard-learn
  • crabyard-refresh

These skills live only inside the repo. Knowledge retrieval is treated as part of the workflow, not as an afterthought.

  • crabyard-research searches crabyard/knowledge/index.md, crabyard/knowledge/, and relevant specs for the strongest prior learnings
  • crabyard-explore, crabyard-plan, and crabyard-review now begin with an explicit retrieval pass
  • retrieved knowledge informs decisions, but does not override accepted truth in crabyard/specs/
  • crabyard-review can run both before apply to stress-test the plan and after apply to review the implementation

The reusable review layer lives in crabyard-review and looks at:

  • code
  • proposal
  • design
  • tasks
  • execution plan
  • relevant specs

It reports prioritized findings as P1 / P2 / P3 and can write crabyard/changes/<slug>/review.md.

How Knowledge Stays Useful

Crabyard keeps implementation and debugging notes in crabyard/knowledge/, but the goal is not note-taking for its own sake. The goal is to make the next piece of work easier than the last one.

  • crabyard-research returns the strongest 1-3 prior learnings before planning, review, or debugging
  • crabyard-learn checks overlap before creating a note and updates knowledge/index.md
  • crabyard-refresh supports targeted refresh, consolidation, replacement, and stale marking
  • optional note frontmatter can add kind, tags, paths, related_specs, related_changes, supersedes, and last_verified_at
  • knowledge/index.md stays retrieval-friendly and canonical

About

Workflow skills and CLI for AI coding agents, with spec-driven changes, review-first flows, and execution graphs
