
balyakin/tokmon


tokmon

See where your AI tokens actually go.

License: MIT

tokmon is a transparent local proxy between coding agents (Claude Code, Cursor, aider) and LLM APIs (Anthropic, OpenAI). It logs token usage and estimated cost, detects anomaly patterns (cache breaks, retry loops, token spikes), and shows a real-time terminal dashboard.

Zero config. Local-first. No prompt or response content stored.


The Problem

Developers are hitting token limits and cost spikes without visibility into where usage comes from. The HN thread that triggered this project: "Claude Code users hitting usage limits way faster than expected". Real pain points from that discussion: silent cache-break inflation, retry loops burning quota, and "token anxiety" from opaque usage behavior.

Install

  • Homebrew: brew install tokmon
  • Go: go install github.com/evgenybalyakin/tokmon@latest
  • Or download a release binary.

Quickstart

# Start proxy
tokmon

# Run agent through tokmon
ANTHROPIC_BASE_URL=http://localhost:4100/anthropic claude

# See live telemetry
tokmon dash

What You'll See

The tokmon dash view refreshes every 500 ms and shows:

  • totals for requests, tokens, cache usage, cost, and error rate
  • the last 10 requests, each with its cache ratio and per-request cost
  • warnings for cache breaks, retry loops, and token spikes
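The dashboard's exact formulas are not documented here. As a minimal sketch, assuming "cache ratio" means cache-read tokens divided by all prompt tokens and that cost is priced separately per token class, the per-request numbers could be derived as below. `Usage`, `Price`, `CacheRatio`, `EstimateCost`, and `IsCacheBreak` are hypothetical names, the price fields are placeholders rather than real rates, and the cache-break thresholds are invented for illustration.

```go
package main

// Usage holds per-request token counts. The field names are illustrative;
// Anthropic reports input_tokens, output_tokens,
// cache_creation_input_tokens and cache_read_input_tokens.
type Usage struct {
	InputTokens         int
	OutputTokens        int
	CacheCreationTokens int
	CacheReadTokens     int
}

// Price holds USD rates per million tokens. Values are placeholders:
// real rates depend on the model and change over time.
type Price struct {
	InputPerM      float64
	OutputPerM     float64
	CacheWritePerM float64
	CacheReadPerM  float64
}

// CacheRatio is the share of prompt tokens served from cache, where
// prompt tokens = uncached input + cache writes + cache reads.
func CacheRatio(u Usage) float64 {
	prompt := u.InputTokens + u.CacheCreationTokens + u.CacheReadTokens
	if prompt == 0 {
		return 0
	}
	return float64(u.CacheReadTokens) / float64(prompt)
}

// EstimateCost prices each token class at its own rate and sums the parts.
func EstimateCost(u Usage, p Price) float64 {
	const perM = 1_000_000.0
	return (float64(u.InputTokens)*p.InputPerM +
		float64(u.OutputTokens)*p.OutputPerM +
		float64(u.CacheCreationTokens)*p.CacheWritePerM +
		float64(u.CacheReadTokens)*p.CacheReadPerM) / perM
}

// IsCacheBreak flags a request whose cache ratio collapses right after a
// mostly cached request in the same session. The 0.5 and 0.1 thresholds
// are hypothetical.
func IsCacheBreak(prev, cur Usage) bool {
	return CacheRatio(prev) >= 0.5 && CacheRatio(cur) < 0.1
}
```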

Features

  • Transparent local proxy for Anthropic/OpenAI with streaming support.
  • OpenAI streaming compatibility patch (stream_options.include_usage=true only when missing).
  • SQLite telemetry in WAL mode with async, bounded write queue.
  • Session-aware request fingerprinting with prompt text stripped.
  • Real-time terminal dashboard built with Bubble Tea.
  • stats command with plain text and JSON outputs.
  • CI guardrails via stats --assert-* (budget, error rate, retry-loop checks).
  • Export telemetry to json, jsonl, or csv.
  • Retention pruning with dry-run by default.
  • Setup helper that bootstraps the shell environment variables.

How It Works

┌─────────────┐     ┌─────────┐     ┌───────────────────┐
│ Claude Code │────▶│ tokmon  │────▶│ api.anthropic.com │
│ / Cursor /  │◀────│ (proxy) │◀────│ / api.openai.com  │
│ aider / etc │     └────┬────┘     └───────────────────┘
└─────────────┘          │
                    ┌────▼────┐
                    │ SQLite  │
                    │ (logs)  │
                    └─────────┘

tokmon listens on localhost:<port> (4100 by default) and routes requests by path prefix:

  • /anthropic/* -> https://api.anthropic.com/*
  • /openai/* -> https://api.openai.com/*
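A minimal sketch of this prefix routing, assuming the proxy only swaps the prefix for the provider origin and keeps the rest of the path and the query string unchanged. `UpstreamURL` is a hypothetical helper; in a full proxy its result would typically feed something like `net/http/httputil.ReverseProxy`.

```go
package main

import (
	"net/url"
	"strings"
)

// routes maps a proxy path prefix to the upstream API origin.
var routes = map[string]string{
	"/anthropic": "https://api.anthropic.com",
	"/openai":    "https://api.openai.com",
}

// UpstreamURL rewrites e.g. /anthropic/v1/messages to
// https://api.anthropic.com/v1/messages. ok is false when no route matches.
func UpstreamURL(path, rawQuery string) (target string, ok bool) {
	for prefix, origin := range routes {
		rest, found := strings.CutPrefix(path, prefix)
		// Require a segment boundary so /anthropicx does not match /anthropic.
		if !found || (rest != "" && !strings.HasPrefix(rest, "/")) {
			continue
		}
		u, err := url.Parse(origin)
		if err != nil {
			return "", false
		}
		u.Path = rest
		u.RawQuery = rawQuery
		return u.String(), true
	}
	return "", false
}
```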

Configuration

Environment variables

Variable        Meaning                               Default
TOKMON_DB       SQLite database file                  ~/.tokmon/tokmon.db
TOKMON_PORT     Proxy listen port                     4100
TOKMON_SESSION  Explicit session ID                   auto-generated
TOKMON_TZ       Timezone for stats day/week windows   local timezone

Proxy flags

Flag             Meaning                                      Default
--port           Proxy listen port                            4100
--budget         Budget limit in USD                          0 (disabled)
--budget-scope   Budget window: session or day                session
--budget-action  Action once over budget: warn, pause, stop   warn
--db             SQLite path (overrides TOKMON_DB)            TOKMON_DB, else ~/.tokmon/tokmon.db

Automation Guardrails

# Fail pipeline if known estimated cost exceeds $5
tokmon stats --json --assert-budget 5

# Fail if error rate > 1.0%
tokmon stats --assert-error-rate 1.0

# Fail if retry loops are detected
tokmon stats --assert-no-retry-loops

FAQ

Does tokmon see my API key?

Yes, because it forwards headers to the upstream provider. tokmon does not persist API keys and does not log prompt/response bodies.

Does tokmon slow down requests?

It forwards streams chunk-by-chunk (tee pattern) and writes telemetry asynchronously, so proxying overhead is minimal.

Does tokmon work with Claude Pro/Max subscriptions?

Yes. Usage logging is independent of your billing method. Cost estimates are still useful as a normalized comparison metric.

What about OpenAI / GPT?

Supported. Set OPENAI_BASE_URL=http://localhost:4100/openai.

Why does tokmon modify OpenAI streaming requests?

OpenAI omits usage in streams unless stream_options.include_usage=true. tokmon injects that field only when stream: true and include_usage is missing.
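A minimal sketch of that rule, assuming the request body is JSON small enough to decode in full. `PatchStreamOptions` is a hypothetical name. Decoding into `map[string]json.RawMessage` leaves unrelated fields as raw bytes, so an explicit `include_usage: false` is respected and nothing else is re-typed.

```go
package main

import "encoding/json"

// PatchStreamOptions adds stream_options.include_usage=true to an OpenAI
// request body only when "stream" is true and include_usage is absent.
// It returns the body to forward and whether it was changed.
func PatchStreamOptions(body []byte) ([]byte, bool, error) {
	var req map[string]json.RawMessage
	if err := json.Unmarshal(body, &req); err != nil {
		return body, false, err
	}
	var stream bool
	if raw, ok := req["stream"]; !ok || json.Unmarshal(raw, &stream) != nil || !stream {
		return body, false, nil // not a streaming request: forward untouched
	}
	var opts map[string]json.RawMessage
	if raw, ok := req["stream_options"]; ok {
		if err := json.Unmarshal(raw, &opts); err != nil {
			return body, false, err
		}
	}
	if opts == nil { // absent, or "stream_options": null
		opts = map[string]json.RawMessage{}
	}
	if _, ok := opts["include_usage"]; ok {
		return body, false, nil // client chose explicitly; leave it alone
	}
	opts["include_usage"] = json.RawMessage("true")
	patched, err := json.Marshal(opts)
	if err != nil {
		return body, false, err
	}
	req["stream_options"] = patched
	out, err := json.Marshal(req)
	if err != nil {
		return body, false, err
	}
	return out, true, nil
}
```

One trade-off of re-encoding: whitespace is compacted and top-level keys come out in sorted order. JSON semantics ignore both, but the forwarded bytes differ from what the client sent.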

Can I run this in CI/CD?

Yes. Start the proxy in the background, run the agent with the matching *_BASE_URL override, then collect results with tokmon stats --json.

Contributing

Contributions are welcome. Please open an issue first for major design changes.

go test ./...

MIT licensed.
