evo

Get started with autoresearch on any codebase — two commands.

Try it · Install · How it works · Dashboard · Upgrading

A plugin for your agentic framework that optimizes code through experiments

You give it a codebase. It discovers metrics to optimize, sets up the evaluation, and starts running experiments in a loop -- trying things, keeping what improves the score, throwing away what doesn't.

Inspired by Karpathy's autoresearch -- where an LLM runs training experiments autonomously to beat its own best score. Autoresearch is a pure hill climb: try something, keep or revert, repeat on a single branch. Evo adds structure on top of that idea:

Tree search over greedy hill climb. Multiple directions can fork from any committed node, so exploration doesn't collapse to one path.
Parallel semi-autonomous agents. Spawn multiple subagents and run them simultaneously, each in its own git worktree. Each subagent reads traces, formulates hypotheses, and can run multiple iterations within its branch.
Shared state. Failure traces, annotations, and discarded hypotheses are accessible to every agent before it decides what to try next.
Gating. Regression tests or safety checks can be wired up as a gate. Experiments that don't pass get discarded.
Observability. A dashboard to monitor your experiments.
Benchmark discovery. The discover skill explores the repo, figures out what to measure, and instruments the evaluation.

Runs on Claude Code, Codex, OpenClaw, Hermes, Opencode, or Pi. Experiments run locally or on remote sandboxes — Modal, E2B, Daytona, AWS, Azure, SSH.

Try it

Two commands:

/evo:discover     # one-time code discovery: figures out benchmarks and creates gates against unintended changes
/evo:optimize     # run the loop

discover asks what to optimize, the benchmark command, and the metric direction. Skip the questions by seeding the answer:

/evo:discover make the JSON parser at src/parser.py faster

Pass optimize parameters as key=value after the skill name:

/evo:optimize subagents=3 budget=10 stall=3

Parameter	Default	Description
`subagents`	5	Parallel subagents per round
`budget`	5	Max iterations each subagent can run within its branch
`stall`	5	Consecutive rounds with no improvement before auto-stopping

Invocation syntax is host-specific: /evo: on Claude Code, $evo on Codex, natural language on Hermes, Opencode, OpenClaw, and Pi.

Install

# 1. evo CLI
uv tool install evo-hq-cli

# 2. Host CLI (if you don't already have it)
npm install -g @anthropic-ai/claude-code     # or @openai/codex, openclaw, @earendil-works/pi-coding-agent

# 3. Plugin + host hooks
evo install <host>     # claude-code | codex | hermes | opencode | openclaw | pi

evo install <host> installs the plugin into the host's marketplace and stages the hooks evo needs to talk to in-flight subagents. Verify with evo doctor <host>.

For remote backends, install with the matching provider extra: uv tool install 'evo-hq-cli[modal]' (or [e2b], [daytona], [aws], [azure], [all]).

Codex hook trust

Codex requires manual approval for plugin hooks. After install, run /hooks inside codex to trust evo's hooks — or pass --trust-hooks to evo install codex to skip the prompt.

How it works

Parallel

The orchestrator dispatches subagents in parallel. Each runs in its own isolated workspace, picks up shared state (failure traces, annotations, discarded hypotheses), forms a hypothesis, edits, and runs the benchmark. A subagent with iteration budget remaining continues on its branch within the same round when its prior edit warrants a follow-up.

Frontier strategy

After each round, the orchestrator selects which committed branch to extend next. Available strategies:

argmax — extend the highest-scoring branch
top_k — round-robin among the K best
epsilon_greedy — best most of the time, random sometimes
softmax — sample weighted by score
pareto_per_task — keep specialists the aggregate hides, inspired by GEPA

Configure in the dashboard's Frontier tab, which lists each strategy's parameters.

Cross-cutting scans

Between rounds, RLM-inspired scan subagents read trace batches in parallel and surface compound failure patterns: gate-failure intersections, shared root causes across traces. Findings land in shared state, which the next round's subagents read at startup.

Gates

evo introduces gates: pass/fail checks that run on every experiment. An experiment that fails a gate is discarded even if its score beats the current best. Without gates, the search will find ways to return a constant, skip work, or trade correctness for speed.

Any command that exits zero on pass and non-zero on fail qualifies as a gate: a test suite, an invariant script, a score floor on a held-out slice of the benchmark. Gates inherit down the experiment tree: a gate registered at the root runs on every descendant. Narrower gates can be attached to specific branches.

When discover builds a benchmark from scratch, it attaches a held-out-slice score-floor gate automatically. When the benchmark already exists in the repo, gates are opt-in.

Where experiments run

Backend	Where	Install
worktree (default)	local git worktree per experiment	included
pool	reuse a fixed set of local workspaces	included
ssh	your own SSH host	included
modal	Modal serverless cloud	`uv tool install 'evo-hq-cli[modal]'`
e2b	E2B cloud sandboxes	`uv tool install 'evo-hq-cli[e2b]'`
daytona	Daytona cloud workspaces	`uv tool install 'evo-hq-cli[daytona]'`
aws	AWS EC2 sandboxes	`uv tool install 'evo-hq-cli[aws]'`
azure	Azure VMs	`uv tool install 'evo-hq-cli[azure]'`

Pick and configure in the dashboard's Backend tab.

Dashboard

The dashboard starts automatically with /evo:discover (or evo init) and prints the URL in chat:

Dashboard live: http://127.0.0.1:8080 (pid 12345)

If 8080 is in use, evo increments to the next free port (8081, 8082, …) and prints it. Subsequent runs reuse the chosen port. Start it manually with:

uv run --project /path/to/evo/plugins/evo evo dashboard --port 8080

Upgrading

evo update <host>                    # host: claude-code | codex | hermes | opencode | openclaw | pi
evo update <host> --version 0.4.1    # pin to a release

See evo update --help for --force, --scope, and additional flags.

Migrating from v0.4.0 or earlier

Reinstall the CLI and refresh the host plugin cache:

uv tool install --force evo-hq-cli && evo update --force

--force wipes the host plugin cache and reinstalls, working around anthropics/claude-code#14061: /plugin update returns success but does not replace cached plugin files.

Testing a pre-release (alpha)

uv and pip skip pre-releases by default. To install an alpha, pin both the CLI version and the host plugin tag:

uv tool install --force 'evo-hq-cli==0.4.1a2' && \
  evo update --version 0.4.1-alpha.2 --force

Substitute the target alpha version. The CLI uses PEP 440 form (0.4.1a2); the marketplace tag uses the dash form (v0.4.1-alpha.2).

Dev install

For development on evo:

git clone https://github.com/evo-hq/evo
cd evo
uv tool install --editable plugins/evo

License

Apache-2.0. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 218 Commits
.agents/plugins		.agents/plugins
.claude-plugin		.claude-plugin
.github/workflows		.github/workflows
assets		assets
plugins/evo		plugins/evo
scripts		scripts
sdk		sdk
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

evo

Try it

Install

Codex hook trust

How it works

Parallel

Frontier strategy

Cross-cutting scans

Gates

Where experiments run

Dashboard

Upgrading

Migrating from v0.4.0 or earlier

Testing a pre-release (alpha)

Dev install

License

About

Uh oh!

Releases 22

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

evo

Try it

Install

Codex hook trust

How it works

Parallel

Frontier strategy

Cross-cutting scans

Gates

Where experiments run

Dashboard

Upgrading

Migrating from v0.4.0 or earlier

Testing a pre-release (alpha)

Dev install

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 22

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages