See where your AI tokens actually go.
tokmon is a transparent local proxy between coding agents (Claude Code, Cursor, aider) and LLM APIs (Anthropic, OpenAI). It logs token usage and estimated cost, detects anomaly patterns (cache breaks, retry loops, token spikes), and shows a real-time terminal dashboard.
Zero config. Local-first. No prompt or response content stored.
Developers are hitting token limits and cost spikes without visibility into where usage comes from. The HN thread that triggered this project: "Claude Code users hitting usage limits way faster than expected". Real pain points from that discussion: silent cache-break inflation, retry loops burning quota, and "token anxiety" from opaque usage behavior.
Install with Homebrew, `go install`, or a release binary:

```shell
brew install tokmon
# or
go install github.com/evgenybalyakin/tokmon@latest
```
```shell
# Start proxy
tokmon

# Run agent through tokmon
ANTHROPIC_BASE_URL=http://localhost:4100/anthropic claude

# See live telemetry
tokmon dash
```

The tokmon dashboard updates every 500 ms with:

- totals for requests, tokens, cache, cost, and error rate
- the last 10 requests with cache ratio and per-request cost
- cache-break, retry-loop, and token-spike warnings
- Transparent local proxy for Anthropic/OpenAI with streaming support.
- OpenAI streaming compatibility patch (`stream_options.include_usage=true` injected only when missing).
- SQLite telemetry in WAL mode with an async, bounded write queue.
- Session-aware request fingerprinting with prompt text stripped.
- Real-time terminal dashboard built with Bubble Tea.
- `stats` command with plain-text and JSON output.
- CI guardrails via `stats --assert-*` (budget, error-rate, and retry-loop checks).
- Telemetry export to `json`, `jsonl`, or `csv`.
- Retention pruning with dry-run by default.
- Setup helper for shell env var bootstrap.
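The "fingerprinting with prompt text stripped" idea can be sketched as hashing only the *shape* of a request. This is a hypothetical illustration of the concept, not tokmon's exact scheme; the field choice and 16-hex-char truncation are assumptions:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint derives a stable id from request shape only: session,
// model, endpoint, message count, and total prompt length. The prompt
// text itself never enters the hash.
func fingerprint(session, model, path string, msgCount, promptLen int) string {
	h := sha256.New()
	fmt.Fprintf(h, "%s|%s|%s|%d|%d", session, model, path, msgCount, promptLen)
	return hex.EncodeToString(h.Sum(nil))[:16]
}

func main() {
	a := fingerprint("s1", "claude-sonnet", "/anthropic/v1/messages", 4, 2048)
	b := fingerprint("s1", "claude-sonnet", "/anthropic/v1/messages", 4, 2048)
	fmt.Println(a == b) // true: identical shape, identical fingerprint
}
```

Identical shapes hashing to the same id is what makes retry loops detectable without ever storing prompt content.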
```
┌─────────────┐     ┌─────────┐     ┌───────────────────┐
│ Claude Code │────▶│ tokmon  │────▶│ api.anthropic.com │
│ / Cursor /  │◀────│ (proxy) │◀────│ / api.openai.com  │
│ aider / etc │     └────┬────┘     └───────────────────┘
└─────────────┘          │
                    ┌────▼────┐
                    │ SQLite  │
                    │ (logs)  │
                    └─────────┘
```
tokmon listens on `localhost:<port>` and routes:

- `/anthropic/*` -> `https://api.anthropic.com/*`
- `/openai/*` -> `https://api.openai.com/*`
| Variable | Meaning | Default |
|---|---|---|
| `TOKMON_DB` | SQLite database file | `~/.tokmon/tokmon.db` |
| `TOKMON_PORT` | proxy listen port | `4100` |
| `TOKMON_SESSION` | explicit session id | auto-generated |
| `TOKMON_TZ` | timezone for stats day/week windows | local timezone |
| Flag | Meaning | Default |
|---|---|---|
| `--port` | proxy listen port | `4100` |
| `--budget` | budget limit in USD | `0` (disabled) |
| `--budget-scope` | `session` or `day` | `session` |
| `--budget-action` | `warn`, `pause`, or `stop` | `warn` |
| `--db` | SQLite path override | env/default |
```shell
# Fail the pipeline if estimated cost exceeds $5
tokmon stats --json --assert-budget 5

# Fail if the error rate exceeds 1.0%
tokmon stats --assert-error-rate 1.0

# Fail if retry loops are detected
tokmon stats --assert-no-retry-loops
```

**Does my API key pass through the proxy?** Yes, because tokmon forwards headers to the upstream provider. It does not persist API keys and does not log prompt/response bodies.
**How much latency does it add?** It forwards streams chunk-by-chunk (tee pattern) and writes telemetry asynchronously, so proxying overhead is minimal.
**Does it work on a subscription plan rather than API billing?** Yes. Usage logging is independent of your billing method, and cost estimates remain useful as a normalized comparison metric.
**Does it support OpenAI?** Supported. Set `OPENAI_BASE_URL=http://localhost:4100/openai`.
**Why does tokmon modify OpenAI streaming requests?** OpenAI omits usage data in streamed responses unless `stream_options.include_usage=true` is set. tokmon injects that field only when `stream: true` and `include_usage` is missing.
**Can I use it in CI?** Yes. Start the proxy in the background, run your agent with the `*_BASE_URL` override, then collect `tokmon stats --json`.
Contributions are welcome. Please open an issue first for major design changes.
```shell
go test ./...
```

MIT licensed.
