Specialist AI agents with cost guardrails and sessions that don't break at hour 4.
Plug-and-play domain experts. Adaptive compaction. Real spending limits.
- The Problem
- Three Pillars
- Pillar 1 — Cost as a Control Plane
- Pillar 2 — Zero-Config Specialist Agents
- Pillar 3 — Sessions That Don't Break at Hour 4
- Quick Start
- How It Works
- Power Users — Roles, Workflows, Layers
- Embedders — ACP, WebSocket, Daemon
- Installation
- Configuration
- Architecture
- Contributing
- License
## The Problem

You want an AI that knows your domain, costs what you expected, and remembers what it decided an hour ago. In 2026 you get:
- 45 minutes of setup per domain — MCP servers, system prompts, tool configs, credentials, all wired by hand, every time.
- Surprise bills. Cursor users posting $7K daily overages. Claude Code rate-limiting mid-task. No per-task budget, no simulator, no kill switch.
- Context rot at hour 2-4. Naive truncation drops the decisions you need. Quality collapses. You restart.
- One generic assistant for every task — Rust debugging, blood-test interpretation, contract review — same prompt, same tools.
Every coding agent in 2026 swaps models and calls it a feature. Octomind solves the three things they don't.
## Three Pillars

| Pillar | What it gives you | Built on |
|---|---|---|
| Cost as a control plane | Per-request and per-session spending limits, real-time cost tracking, cache-aware accounting. | `src/config/roles.rs`, spending threshold enforcement |
| Zero-config specialist agents | `octomind run aws-debug` → prompts + MCP servers + tools, all pre-wired. Agents auto-enable new MCP servers mid-session and spawn sub-agents. | Tap registry, `mcp` and `agent` built-in tools, runtime self-extension |
| Sessions that don't break at hour 4 | SOTA adaptive compaction: cache-aware, structurally preserving. Smaller context = faster responses + lower cost. | `src/mcp/core/plan/compression.rs` |
## Pillar 1 — Cost as a Control Plane

Cost is treated as infrastructure, not an afterthought.
```toml
# Hard spending limits — enforced, not advisory
max_request_spending_threshold = 0.50   # USD per request
max_session_spending_threshold = 5.00   # USD per session

# Per-role model selection — pay Opus only where it's worth it
[[roles]]
name = "researcher"
model = "google:gemini-2.5-flash"   # cheap broad context

[[roles]]
name = "reviewer"
model = "anthropic:claude-opus-4"   # precision where it counts
```

What's already shipped:
- Real-time cost tracking per request and per session.
- Cache-aware token accounting (`cache_read_tokens` and `cache_write_tokens` separated from input/output).
- Hard spending thresholds with enforcement.
- Per-role and per-layer model selection — different roles can run on different providers.
What's in flight:
- Cost simulator (`octomind cost simulate <agent> --runs N`).
- Provider arbitrage — dynamic routing on cost-per-quality-point.
- Per-provider budgets and alerts.
Cursor users get $7,000 surprise bills. Octomind agents trip a budget and stop, fall back, or warn — before the bill, not after.
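The "stop, fall back, or warn" behavior can be sketched as a pre-flight check before each request. This is a minimal illustration, not Octomind's actual enforcement logic; the `Action` variants and the 80% warning margin are assumptions:

```rust
// Illustrative pre-flight budget check. The Action variants and the 0.8
// warning factor are hypothetical; real enforcement lives in Octomind's
// config/session layer.
#[derive(Debug, PartialEq)]
enum Action {
    Proceed,  // under budget
    Warn,     // approaching the session cap
    Fallback, // request too expensive: retry on a cheaper model
    Stop,     // hard session limit hit, no request is sent
}

fn enforce(session_spent: f64, est_request: f64, req_max: f64, session_max: f64) -> Action {
    if session_spent >= session_max {
        Action::Stop
    } else if est_request > req_max {
        Action::Fallback
    } else if session_spent + est_request > 0.8 * session_max {
        Action::Warn
    } else {
        Action::Proceed
    }
}
```

The point is the ordering: the hard session cap is checked first, so a runaway loop stops even when each individual request is cheap.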
## Pillar 2 — Zero-Config Specialist Agents

Most agent tools force you to assemble a system prompt, choose MCP servers, set up role config, and manage credentials — for every domain, on every machine, every time. Octomind ships specialists ready to run.
```bash
octomind run developer          # general dev, language skills auto-activate
octomind run doctor:blood       # blood-test interpretation specialist
octomind run doctor:nutrition   # nutrition specialist
```

```text
→ Fetches the agent manifest from the tap registry
→ Installs required binaries automatically (skips if already present)
→ Prompts once for any credentials, saves permanently
→ Spins up the right MCP servers for this domain
→ Loads specialist model config, system prompt, tool permissions
→ Ready in ~5 seconds, not 45 minutes
```
This is packaged expertise — not a prompt file, not a skill injection. The full stack, configured by the community, ready to run.
Every agent has two built-in power tools that let it acquire new capabilities and spawn sub-agents mid-session, without restart:
| Tool | What it does |
|---|---|
| `mcp` | Enable or disable MCP servers on the fly. Agent picks the server it needs and registers it mid-conversation. |
| `agent` | Spawn a specialist sub-agent for a sub-task. Sub-agent runs, returns, parent continues. |
```text
User: "Cross-reference our Postgres metrics with the deployment log"

Agent:
→ mcp.enable(postgres-mcp)    # auto-detected need, no user prompt
→ agent.spawn(log_reader)     # delegates log parsing
→ results merge mid-session
→ mcp.disable(postgres-mcp)   # cleans up
→ presents the analysis
```
OpenClaw pre-loads 5,700 skills into every agent. Octomind starts focused for the domain and grows only when needed. Smaller context, lower cost, faster responses, no surprise tools.
```bash
octomind tap yourteam/tap               # clones github.com/yourteam/octomind-tap
octomind tap yourteam/internal ~/path   # local tap for private agents
octomind run finance:analyst            # available immediately
octomind run security:owasp
```

Each tap is a Git repo. Each agent is one TOML file. Pull requests are contributions.

Want to publish your expertise? A `doctor:medications`, a `lawyer:us`, a `devops:terraform`. One file, and everyone with that problem gets a specialist instantly. How to write a tap agent →
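As a sketch of what one such agent file might contain (the field names below are illustrative assumptions, not the actual tap schema; only `server_refs` and `allowed_tools` appear in Octomind's documented role config):

```toml
# Hypothetical tap agent manifest — field names are illustrative, not the real schema.
name = "doctor:blood"
description = "Blood-test interpretation specialist"
model = "anthropic:claude-opus-4"
system_prompt = "You interpret blood panels conservatively and cite reference ranges."

[mcp]
server_refs = ["octomem"]                  # persistent memory for patient context
allowed_tools = ["remember", "knowledge"]
```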
## Pillar 3 — Sessions That Don't Break at Hour 4

Every coding agent degrades after a few hours. Context fills. Decisions get truncated. The agent forgets why it started.
Octomind's adaptive compaction engine runs automatically:
- Cache-aware — calculates if compaction is worth it before paying for it. Never breaks the prompt-cache hit by accident.
- Pressure-tiered — compacts more aggressively as context grows.
- Structurally preserving — keeps decisions, file references, errors, dependencies; drops noise.
- Plan-aware and free-form-aware — works whether you use the `plan` tool or a free-form chat.
- Fully automatic — you never think about it.
The second-order benefit: smaller context means fewer tokens, faster responses, lower cost every turn after compaction fires. The three pillars compound.
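The cache-aware and pressure-tiered decisions can be sketched as follows. This is a minimal illustration under assumed per-token prices, tier thresholds, and keep ratios, not the actual logic in `src/mcp/core/plan/compression.rs`:

```rust
// Illustrative only: prices, tier thresholds, and keep ratios are assumptions.
struct Pricing {
    cache_read: f64,  // USD per token served from the prompt cache
    cache_write: f64, // USD per token written when the cache is rebuilt
}

/// Pressure-tiered target: keep less of the context as the window fills.
fn target_tokens(context_tokens: u64, window: u64) -> u64 {
    let pressure = context_tokens as f64 / window as f64;
    let keep = if pressure > 0.9 {
        0.3 // heavy pressure: aggressive compaction
    } else if pressure > 0.7 {
        0.5 // moderate pressure
    } else {
        0.8 // light touch
    };
    (context_tokens as f64 * keep) as u64
}

/// Cache-aware check: compacting invalidates the prompt cache, so it only
/// pays off when per-turn savings over the expected remaining turns beat
/// the one-time cost of rewriting the compacted prefix into the cache.
fn worth_compacting(context: u64, compacted: u64, expected_turns: u64, p: &Pricing) -> bool {
    let saved_per_turn = (context - compacted) as f64 * p.cache_read;
    let rewrite_cost = compacted as f64 * p.cache_write;
    saved_per_turn * expected_turns as f64 > rewrite_cost
}
```

Structural preservation (keeping decisions, errors, file references) then decides *what* fills the target budget; the sketch above only covers *when* and *how much*.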
Work on a hard problem for 4 hours. The agent still knows what it decided in hour one.
## Quick Start

```bash
# Install (macOS & Linux)
curl -fsSL https://raw.githubusercontent.com/muvon/octomind/master/install.sh | bash

# One API key gets you many providers
export OPENROUTER_API_KEY="your_key"

# Start with a specialist — no setup required
octomind run developer
```

That's it. You're in an interactive session with a specialist that can read your code, run commands, edit files, and grow capabilities as needed.
## How It Works

| Tool | Purpose |
|---|---|
| `plan` | Structured multi-step task tracking |
| `mcp` | Enable/disable MCP servers at runtime |
| `agent` | Spawn specialist sub-agents mid-session |
| `schedule` | Inject messages at future times |
| `skill` | Inject reusable instruction packs from taps |
**Filesystem tools (via `octofs`)** — `view`, `text_editor`, `batch_edit`, `shell`, `semantic_search`, `structural_search`, `workdir`. File operations come from the companion `octofs` MCP server, included by default in tap formulas that need them.
**Memory tools (via `octomem`)** — `remember`, `memorize`, `knowledge`, `graphrag`. Long-term memory and knowledge indexing, included by default in taps that need persistent context.
| Provider | Notes |
|---|---|
| OpenRouter | Many frontier models, one API key |
| OpenAI | GPT-4o, o3, Codex |
| Anthropic | Claude Opus, Sonnet, Haiku |
| Google Gemini | Gemini 2.5 Pro / Flash |
| Amazon Bedrock | Claude + Titan on AWS |
| Cloudflare | Workers AI |
| DeepSeek | V3, R1 |
Switch providers mid-session with `/model deepseek:v3`. Mix providers across roles — cheap model for research, best model for execution. Cost tracked separately per provider.
## Power Users — Roles, Workflows, Layers

For most users, taps are enough. For teams and power users, the configuration system is deep — all TOML, no code.
```toml
# Per-role: independent model, temperature, MCP servers, tools, system prompt
[[roles]]
name = "senior-reviewer"
model = "anthropic:claude-opus-4"
temperature = 0.2

[roles.mcp]
server_refs = ["filesystem", "github"]
allowed_tools = ["view", "ast_grep", "create_pr"]

# Workflows — multi-step, each step its own model and toolset
[[workflows]]
name = "deep_review"

[[workflows.steps]]
name = "analyze"
layer = "context_researcher"   # gemini-flash, broad context

[[workflows.steps]]
name = "critique"
layer = "senior_reviewer"      # claude-opus, precision

# Sandbox — lock all writes to current directory
sandbox = true
```

- Roles — model, temperature, system prompt, MCP servers, tool permissions per role.
- Layers — chained AI sub-agents that run after each response.
- Pipelines — deterministic script-driven pre-processing.
- Workflows — multi-step orchestrated task runners with validation loops.
See Configuration Reference for everything.
## Embedders — ACP, WebSocket, Daemon

Octomind isn't just a CLI. It runs in every context an agent needs to live in:
| Mode | Use for |
|---|---|
| Interactive CLI | Daily work, any domain |
| `--format jsonl` pipe | CI/CD pipelines, shell scripts, automation |
| `--daemon` + `send` | Background agents, continuous monitoring, long-running tasks |
| WebSocket server | IDE plugins, web dashboards, external integrations |
| ACP protocol | Multi-agent orchestration, being called by other agents |
```bash
# ACP — drop into any multi-agent system as a sub-agent
octomind acp developer:general

# Non-interactive — single message, plain output
octomind run developer "Explain the auth module" --format plain

# Structured JSON output for pipelines
octomind run developer "List TODO items" --schema todos.json --format jsonl
```

One binary. Every workflow.
## Installation

```bash
curl -fsSL https://raw.githubusercontent.com/muvon/octomind/master/install.sh | bash
```

Detects OS and architecture, installs to `~/.local/bin/`. macOS and Linux supported. Single Rust binary, no runtime dependencies.
```bash
cargo install octomind
```

Requires Rust 1.82+. See Building from Source.
```bash
git clone https://github.com/muvon/octomind.git
cd octomind
cargo build --release
```

```bash
# OpenRouter — recommended, access to many providers with one key
export OPENROUTER_API_KEY="your_key"

# Or any specific provider
export OPENAI_API_KEY="your_key"
export ANTHROPIC_API_KEY="your_key"
export DEEPSEEK_API_KEY="your_key"
```

Add to `~/.bashrc` or `~/.zshrc` for persistence.
```bash
octomind --version
octomind config    # generate default config
octomind run       # start your first session
```

Config lives at `~/.local/share/octomind/config/config.toml`.
## Configuration

```bash
octomind config --show       # view current config
octomind config --validate   # validate config
```

Key areas:
- Roles — model, temperature, system prompt, MCP servers, tool permissions
- Workflows — multi-step AI processing with validation loops
- Pipelines — deterministic script-driven pre-processing
- MCP Servers — external tools and capabilities
- Spending Limits — per-request and per-session thresholds
Full reference: Configuration Reference.
| Command | Description |
|---|---|
| `/help` | Show all commands |
| `/info` | Token usage and costs |
| `/model <provider:model>` | Switch model mid-session |
| `/role <name>` | Switch role mid-session |
| `/save` | Save current session |
| `/exit` | Exit session |
Full list: Session Commands.
## Architecture

```text
CLI / WebSocket / ACP / Daemon
        |
        v
   ChatSession                   <- src/session/chat/session/
        |
        +-- Roles                <- src/config/roles.rs
        +-- Layers               <- src/session/layers/
        +-- Pipelines            <- src/session/pipelines/
        +-- Workflows            <- src/session/workflows/
        +-- Learning             <- src/learning/
        +-- Adaptive compaction  <- src/mcp/core/plan/compression.rs
        |
        +-- MCP servers          <- src/mcp/
              +-- core/   plan, mcp, agent, schedule, skill
              +-- (filesystem via external octofs)
              +-- (memory via external octomem)
              +-- agent/  agent_* tools route tasks to layers
```
Config is the single source of truth. All defaults live in config-templates/default.toml. The resolved config drives everything: which model, which MCP servers, which layers, which role. No hardcoded values in code.
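The resolution order can be illustrated as a simple layered merge (a sketch of the idea only; the real loader reads `config-templates/default.toml`, then user config, then role overrides, and the key and value names here are hypothetical):

```rust
use std::collections::HashMap;

// Sketch: later layers override earlier ones, so the resolved map becomes
// the single source of truth the session reads from.
fn resolve(layers: &[HashMap<&str, &str>]) -> HashMap<String, String> {
    let mut resolved = HashMap::new();
    for layer in layers {
        for (key, value) in layer {
            resolved.insert((*key).to_string(), (*value).to_string());
        }
    }
    resolved
}
```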
See Architecture for internals.
## Contributing

The most impactful contribution isn't code — it's specialist agents.

Every domain expert who publishes a specialist makes Octomind useful for an entirely new audience. A cardiologist publishing `doctor:medications`. A tax attorney publishing `lawyer:us`. A security researcher publishing `security:owasp`. One TOML file — and everyone with that problem gets a specialist-grade AI instantly.
- Installation & Setup
- Quickstart
- Configuration
- Providers & Models
- Sessions & Compression
- Roles
- MCP Tools
- Workflows
- Pipelines
- Learning
- WebSocket & ACP
- CLI Reference
- Config Reference
Full index: doc/README.md.
## License

Apache License 2.0 — see LICENSE.
Octomind by Muvon | Website | Documentation