- Drop-in in 2 lines
- Why we built this
- Why not nginx / Kong / Envoy?
- Installation
- Quick start
- LLM token budget in 30 seconds
- How a request flows
- Enforcement capabilities
- Performance
- Deployment
- CLI
- SaaS control plane (optional)
- Project layout
- License
```python
# Before
client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

# After — no other code changes
client = OpenAI(base_url="https://your-fairvisor/openai/v1", api_key="sk-...")
```

Token limits, cost budgets, loop detection, and circuit breakers apply transparently. Works with OpenAI-compatible endpoints: OpenAI, Azure OpenAI, Anthropic-compatible gateways, Gemini-compatible gateways, vLLM, Ollama, LiteLLM.
When multiple tenants, agents, or services share an API, one misbehaving caller can exhaust the budget for everyone — whether that's LLM tokens, API credits, or request quotas. Fairvisor is a lightweight enforcement engine that gives each tenant isolated limits at the edge: token budgets, cost caps, rate limits, and kill switches — keyed on JWT claims, API keys, or IP. One container, one JSON policy file, no Redis.
API gateways count requests. LLM providers bill by the token.
When you serve multiple tenants — customers, teams, or agentic pipelines — that gap becomes a real problem. One runaway agent can consume a month's token budget overnight. Your gateway sees one request per second; your invoice shows 3 million tokens.
We needed something that:
- Understood token budgets, not just request counts
- Could key limits on JWT claims (`org_id`, `plan`, `user_id`), not just IPs
- Kept every request fast — no Redis round-trip, no extra network call in the hot path
- Could plug into nginx or Envoy or run standalone as a transparent LLM proxy
- Plugged in without rewriting application code: change the base URL in the client — enforcement applies transparently to existing LLM calls
We couldn't find it, so we built Fairvisor.
If you have an existing gateway, the question is whether Fairvisor adds anything you can't get from the plugin ecosystem already installed. Here is the honest comparison:
| Concern | nginx limit_req | Kong rate-limiting | Envoy global rate limit | Fairvisor Edge |
|---|---|---|---|---|
| Per-tenant limits (JWT claim) | No — IP/zone only | Partial — custom plugin | Yes, via descriptors | Yes — jwt:org_id, jwt:plan, any claim |
| LLM token budgets (TPM/TPD) | No | No | No | Yes — pre-request reservation + post-response refund |
| Cost budgets (cumulative $) | No | No | No | Yes |
| Distributed state requirement | No (per-process) | Redis or Postgres | Separate rate limit service | No — in-process ngx.shared.dict |
| Network round-trip in hot path | No | Yes (to Redis) | Yes (to rate limit service) | No |
| Policy as versioned JSON | No | No (Admin API state) | Partial (Envoy config) | Yes — commit, diff, roll back |
| Kill switches (instant, no restart) | No | No | No | Yes |
| Loop detection for agents | No | No | No | Yes |
If nginx limit_req is enough for you, use it. It has zero overhead and is the right tool for simple per-IP global throttling. Fairvisor becomes relevant when you need per-tenant awareness, JWT-claim-based bucketing, or cost/token tracking that limit_req has no model for.
If you are already running Kong, the built-in rate limiting plugin stores counters in Redis or Postgres — every decision is a network call. Fairvisor can run alongside Kong as an auth_request decision service with no external state. See Kong / Traefik integration →
If you are running Envoy, the global rate limit service requires deploying a separate Redis-backed service with its own config language. Fairvisor is one container, one JSON file, and integrates via ext_authz in the same position. See Envoy ext_authz integration →
If you are on Cloudflare or Akamai, per-JWT-claim limits, LLM token budgets, and cost caps are not in the platform's model. If your limits are tenant-aware or cost-aware, you need something that runs in your own stack.
Fairvisor can run alongside Kong, nginx, and Envoy — or as a standalone reverse proxy if you don't need a separate gateway. See nginx auth_request → · Envoy ext_authz → · Kong / Traefik → for integration patterns.
```shell
docker run -d \
  -p 8080:8080 \
  -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \
  -v "$PWD/policy.json:/etc/fairvisor/policy.json:ro" \
  ghcr.io/fairvisor/fairvisor-edge:latest
```

- Add the Fairvisor repository:
```shell
curl -sS https://fairvisor.github.io/apt/pubkey.gpg | sudo gpg --dearmor -o /usr/share/keyrings/fairvisor-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/fairvisor-archive-keyring.gpg] https://fairvisor.github.io/apt $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/fairvisor.list
```

- Install:
```shell
sudo apt update
sudo apt install fairvisor
```

- Verify:
```shell
fairvisor version
systemctl status fairvisor
```

Which mode is right for you?
- Reverse proxy — you have an LLM or upstream service → Fairvisor sits in front, enforces budgets, and proxies the request. Set `FAIRVISOR_LLM_FORMAT` to enable token counting and streaming cutoff. Fastest to try.
- Decision service — you already run nginx, Envoy, or Kong → call `POST /v1/decision` from `auth_request` / `ext_authz`.
```shell
git clone https://github.com/fairvisor/edge.git
cd edge/examples/quickstart
docker compose up -d
```

Run your first enforce/reject test in under a minute — full walkthrough in examples/quickstart/README.md.
Recipes: examples/recipes/ — team budgets, runaway agent guard, circuit-breaker.
Sample artifacts: fixtures/ — canonical enforce/reject fixtures (OpenAI, Anthropic, Gemini).
Expand — manual setup with a single docker run
1. Create a policy
```shell
mkdir fairvisor-demo && cd fairvisor-demo
```

policy.json:
```json
{
  "bundle_version": 1,
  "issued_at": "2026-01-01T00:00:00Z",
  "policies": [
    {
      "id": "demo-rate-limit",
      "spec": {
        "selector": { "pathPrefix": "/", "methods": ["GET", "POST"] },
        "mode": "enforce",
        "rules": [
          {
            "name": "global-rps",
            "limit_keys": ["ip:address"],
            "algorithm": "token_bucket",
            "algorithm_config": { "tokens_per_second": 5, "burst": 10 }
          }
        ]
      }
    }
  ],
  "kill_switches": []
}
```

2. Run the edge
```shell
docker run -d \
  --name fairvisor \
  -p 8080:8080 \
  -v "$(pwd)/policy.json:/etc/fairvisor/policy.json:ro" \
  -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \
  -e FAIRVISOR_MODE=decision_service \
  ghcr.io/fairvisor/fairvisor-edge:latest
```

3. Verify
```shell
curl -sf http://localhost:8080/readyz
# {"status":"ok"}

# Allowed request → HTTP 200
curl -s -o /dev/null -w "HTTP %{http_code}\n" \
  -H "X-Original-Method: GET" \
  -H "X-Original-URI: /api/data" \
  -H "X-Forwarded-For: 10.0.0.1" \
  http://localhost:8080/v1/decision

# Rejected request — exhaust the burst (>10 requests)
for i in $(seq 1 12); do
  curl -s -o /dev/null -w "HTTP %{http_code}\n" \
    -H "X-Original-Method: GET" \
    -H "X-Original-URI: /api/data" \
    -H "X-Forwarded-For: 10.0.0.1" \
    http://localhost:8080/v1/decision
done
# last requests → HTTP 429, X-Fairvisor-Reason: rate_limit_exceeded
```

Full walkthrough: docs.fairvisor.com/docs/quickstart
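The rejection above is the token-bucket algorithm at work. A minimal Python sketch (illustrative only — Fairvisor's actual implementation runs in LuaJIT) shows why an instantaneous burst of 12 requests against `tokens_per_second: 5, burst: 10` lets exactly the first 10 through:

```python
import time

class TokenBucket:
    """Illustrative token bucket, mirroring the demo policy:
    tokens_per_second=5, burst=10. Not Fairvisor's implementation."""

    def __init__(self, tokens_per_second: float, burst: int):
        self.rate = tokens_per_second
        self.capacity = burst
        self.tokens = float(burst)   # bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True              # would be HTTP 200
        return False                 # would be HTTP 429

bucket = TokenBucket(tokens_per_second=5, burst=10)
decisions = [bucket.allow() for _ in range(12)]
# In an instantaneous burst, the first 10 drain the bucket; the rest are rejected.
```

The refill-on-read design means no background timer is needed: each decision recomputes the balance from the elapsed time since the last request.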
Fairvisor proxies traffic to your LLM provider and enforces token budgets. Set FAIRVISOR_MODE=reverse_proxy, point FAIRVISOR_BACKEND_URL at your LLM, and set FAIRVISOR_LLM_FORMAT so Fairvisor knows how to parse streaming responses.
0. Point your client at Fairvisor (the only app-level change):

```python
client = OpenAI(
    base_url="https://your-fairvisor-host/openai/v1",
    api_key="sk-proj-..."
)
```

1. Run the edge — pointing at OpenAI:
```shell
docker run -d \
  -e FAIRVISOR_MODE=reverse_proxy \
  -e FAIRVISOR_BACKEND_URL=https://api.openai.com \
  -e FAIRVISOR_LLM_FORMAT=openai \
  -e FAIRVISOR_STRIP_REQUEST_HEADERS="Authorization" \
  -e FAIRVISOR_UPSTREAM_HEADER_Authorization="Bearer sk-proj-..." \
  -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \
  -v "$PWD/policy.json:/etc/fairvisor/policy.json:ro" \
  -p 8080:8080 \
  ghcr.io/fairvisor/fairvisor-edge:latest
```

Header handling — two env var families control what reaches the upstream:
- `FAIRVISOR_STRIP_REQUEST_HEADERS` (comma-separated) — headers to remove from the client request before proxying. Stripping happens after rate-limit keying, so Fairvisor can still read and key on the header value (e.g. `jwt:org_id` from the client's `Authorization` JWT) before it is removed.
- `FAIRVISOR_UPSTREAM_HEADER_<Name>` — headers to inject when forwarding. The suffix becomes the header name with `_` → `-` (e.g. `FAIRVISOR_UPSTREAM_HEADER_Authorization` → `Authorization`, `FAIRVISOR_UPSTREAM_HEADER_X_Api_Key` → `X-Api-Key`). Injection happens after stripping.
The order of operations on every request:
1. Read client headers → enforce rate limits / token budgets
2. Strip headers listed in `FAIRVISOR_STRIP_REQUEST_HEADERS`
3. Inject headers from `FAIRVISOR_UPSTREAM_HEADER_*`
4. Forward to upstream
This means the client's JWT is used for keying but never reaches the upstream, and the upstream key is never visible to the client.
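That pipeline can be sketched in Python (assumed behavior inferred from the description above, not Fairvisor source), including the `_` → `-` suffix mapping for `FAIRVISOR_UPSTREAM_HEADER_*` variables:

```python
UPSTREAM_PREFIX = "FAIRVISOR_UPSTREAM_HEADER_"

def upstream_header_name(env_var: str) -> str:
    """FAIRVISOR_UPSTREAM_HEADER_X_Api_Key -> X-Api-Key."""
    return env_var[len(UPSTREAM_PREFIX):].replace("_", "-")

def forward(client_headers: dict, strip: list, inject_env: dict):
    """Assumed order of operations: key -> strip -> inject -> forward."""
    # 1. Rate-limit keying reads the untouched client headers.
    keying_value = client_headers.get("Authorization")
    # 2. Strip the headers listed in FAIRVISOR_STRIP_REQUEST_HEADERS.
    upstream = {k: v for k, v in client_headers.items() if k not in strip}
    # 3. Inject the FAIRVISOR_UPSTREAM_HEADER_* values.
    for env_var, value in inject_env.items():
        upstream[upstream_header_name(env_var)] = value
    # 4. `upstream` is what the backend receives.
    return keying_value, upstream

key, headers = forward(
    {"Authorization": "Bearer CLIENT_JWT", "Content-Type": "application/json"},
    strip=["Authorization"],
    inject_env={"FAIRVISOR_UPSTREAM_HEADER_Authorization": "Bearer sk-proj-..."},
)
# key is the client JWT (available for jwt:org_id keying);
# headers["Authorization"] is the upstream key — the client JWT is never forwarded.
```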
2. Policy — one rule, per-org TPM + daily cap:
```json
{
  "id": "llm-budget",
  "spec": {
    "selector": { "pathPrefix": "/" },
    "mode": "enforce",
    "rules": [
      {
        "name": "per-org-tpm",
        "limit_keys": ["jwt:org_id"],
        "algorithm": "token_bucket_llm",
        "algorithm_config": {
          "tokens_per_minute": 60000,
          "tokens_per_day": 1200000,
          "default_max_completion": 800
        }
      }
    ]
  }
}
```

3. Call the API — client sends only their JWT:
```shell
curl https://your-fairvisor-host/v1/chat/completions \
  -H "Authorization: Bearer eyJhbGc..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
```

Fairvisor parses the JWT claims (no signature validation — the JWT is trusted as-is), extracts `org_id`, charges tokens against the budget, injects the upstream headers configured via `FAIRVISOR_UPSTREAM_HEADER_*`, and forwards the request. The upstream key is never visible to the client.
When the budget is exhausted:
```http
HTTP/1.1 429 Too Many Requests
X-Fairvisor-Reason: tpm_exceeded
Retry-After: 12
RateLimit-Limit: 60000
RateLimit-Remaining: 0
```

Each organization gets its own independent 60k TPM / 1.2M TPD budget. Set `FAIRVISOR_LLM_FORMAT=anthropic` or `FAIRVISOR_LLM_FORMAT=gemini` for those providers.
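On the client side, `Retry-After` and `X-Fairvisor-Reason` are enough to build polite retry logic. A hedged sketch (the `send` callable is a stand-in for whatever HTTP client you use):

```python
import time

def call_with_retry(send, max_attempts: int = 3):
    """Retry on 429, honoring Retry-After.
    `send` returns (status, headers, body)."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # X-Fairvisor-Reason (tpm_exceeded, rate_limit_exceeded, ...) can be
        # logged or used to decide whether retrying is worthwhile at all.
        wait = float(headers.get("Retry-After", 1))
        if attempt < max_attempts - 1:
            time.sleep(wait)
    return status, body

# Simulated transport: two 429s, then success.
responses = iter([
    (429, {"Retry-After": "0", "X-Fairvisor-Reason": "tpm_exceeded"}, ""),
    (429, {"Retry-After": "0", "X-Fairvisor-Reason": "tpm_exceeded"}, ""),
    (200, {}, "ok"),
])
status, body = call_with_retry(lambda: next(responses))
# status == 200 after two backoffs
```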
Decision service mode: if you already have a gateway, use `selector: { "pathPrefix": "/v1/chat" }` and call `POST /v1/decision` from your existing `auth_request` or `ext_authz` hook instead.
Decision service mode — Fairvisor runs as a sidecar. Your existing gateway calls /v1/decision via auth_request (nginx) or ext_authz (Envoy) and handles forwarding itself.
Reverse proxy mode — Fairvisor sits inline. Traffic arrives at Fairvisor directly, gets evaluated, and is proxied to the upstream if allowed. Set FAIRVISOR_LLM_FORMAT to enable LLM-aware token counting and streaming cutoff for OpenAI, Anthropic, or Gemini upstreams.
Both modes use the same policy bundle and return the same rejection headers.
When a request is rejected:
```http
HTTP/1.1 429 Too Many Requests
X-Fairvisor-Reason: tpm_exceeded
Retry-After: 12
RateLimit: "llm-default";r=0;t=12
RateLimit-Limit: 120000
RateLimit-Remaining: 0
RateLimit-Reset: 12
```

Headers follow the IETF RateLimit header fields draft (draft-ietf-httpapi-ratelimit-headers). X-Fairvisor-Reason gives clients a machine-readable code for retry logic and observability.
Decision service mode — sidecar: your gateway calls /v1/decision, handles forwarding itself.
```mermaid
sequenceDiagram
    participant C as Client
    participant G as Your Gateway<br/>(nginx / Envoy / Kong)
    participant F as Fairvisor Edge<br/>decision_service
    participant U as Upstream service
    C->>G: Request
    G->>F: POST /v1/decision<br/>(auth_request / ext_authz)
    alt allow
        F-->>G: 204 No Content
        G->>U: Forward request
        U-->>G: Response
        G-->>C: Response
    else reject
        F-->>G: 429 + RateLimit headers
        G-->>C: 429 Too Many Requests
    end
```
Reverse proxy mode — inline: Fairvisor handles both enforcement and proxying.
```mermaid
sequenceDiagram
    participant C as Client
    participant F as Fairvisor Edge<br/>reverse_proxy
    participant U as Upstream service
    C->>F: Request
    alt allow
        F->>U: Forward request
        U-->>F: Response
        F-->>C: Response
    else reject
        F-->>C: 429 + RateLimit headers
    end
```
Reverse proxy mode with LLM format — inline LLM proxy with token budget enforcement.
```mermaid
sequenceDiagram
    participant C as Client
    participant F as Fairvisor Edge<br/>reverse_proxy + FAIRVISOR_LLM_FORMAT
    participant U as Upstream LLM<br/>(OpenAI / Anthropic / Gemini)
    C->>F: POST /v1/chat/completions<br/>Authorization: Bearer CLIENT_JWT
    F->>F: 1. Parse JWT claims (org_id, user_id)
    F->>F: 2. Enforce TPM / TPD / cost budget
    alt budget ok
        F->>F: 3. Strip FAIRVISOR_STRIP_REQUEST_HEADERS · inject FAIRVISOR_UPSTREAM_HEADER_*
        F->>U: POST /v1/chat/completions<br/>Authorization: Bearer UPSTREAM_KEY
        U-->>F: 200 OK + token usage
        F->>F: 4. Count tokens · refund unused reservation
        F-->>C: 200 OK
    else budget exceeded
        F-->>C: 429 X-Fairvisor-Reason: tpm_exceeded
    end
```
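The "reserve, then refund" step in the LLM diagram can be sketched like this (assumed mechanics inferred from the description above: `default_max_completion` bounds the pre-request reservation, and actual usage settles it afterwards):

```python
class LlmTokenBudget:
    """Illustrative sketch of pre-request reservation + post-response refund.
    Single fixed TPM window; Fairvisor's real accounting is more elaborate."""

    def __init__(self, tokens_per_minute: int):
        self.remaining = tokens_per_minute

    def reserve(self, prompt_tokens: int, max_completion: int):
        """Charge the worst case up front so concurrent requests can't overdraw."""
        estimate = prompt_tokens + max_completion
        if estimate > self.remaining:
            return None  # -> 429 tpm_exceeded before the upstream is ever called
        self.remaining -= estimate
        return estimate

    def settle(self, reserved: int, actual_total: int):
        """Refund the unused part of the reservation once usage is known."""
        self.remaining += max(0, reserved - actual_total)

budget = LlmTokenBudget(tokens_per_minute=60000)
reserved = budget.reserve(prompt_tokens=1200, max_completion=800)  # charges 2000
budget.settle(reserved, actual_total=1450)                         # refunds 550
# budget.remaining == 58550 — only real usage counts against the window.
```

Reserving before the upstream call is what makes the budget safe under concurrency: two simultaneous requests cannot both spend the same remaining tokens.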
| If you need to… | Algorithm | Typical identity keys | Reject reason |
|---|---|---|---|
| Cap request frequency | `token_bucket` | `jwt:user_id`, `header:x-api-key`, `ip:address` | `rate_limit_exceeded` |
| Cap cumulative spend | `cost_based` | `jwt:org_id`, `jwt:plan` | `budget_exhausted` |
| Cap LLM tokens (TPM/TPD) | `token_bucket_llm` | `jwt:org_id`, `jwt:user_id` | `tpm_exceeded`, `tpd_exceeded` |
| Instantly block a segment | kill switch | any descriptor | `kill_switch_active` |
| Dry-run before enforcing | shadow mode | any descriptor | allow + `would_reject` telemetry |
| Stop runaway agent loops | loop detection | request fingerprint | `loop_detected` |
| Clamp spend spikes | circuit breaker | global or policy scope | `circuit_breaker_open` |
Identity keys can be JWT claims (`jwt:org_id`, `jwt:plan`), HTTP headers (`header:x-api-key`), or IP attributes (`ip:address`, `ip:country`). Combine multiple keys per rule for compound matching.
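For example, a rule keyed on both `jwt:org_id` and `jwt:user_id` gives each user an independent bucket inside each organization — a sketch in the policy format shown earlier, with illustrative limits:

```json
{
  "name": "per-user-within-org",
  "limit_keys": ["jwt:org_id", "jwt:user_id"],
  "algorithm": "token_bucket",
  "algorithm_config": { "tokens_per_second": 2, "burst": 5 }
}
```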
| Percentile | Decision service | Reverse proxy | Raw nginx (baseline) |
|---|---|---|---|
| p50 | 304 μs | 302 μs | 235 μs |
| p90 | 543 μs | 593 μs | 409 μs |
| p99 | 2.00 ms | 1.79 ms | 1.95 ms |
| p99.9 | 4.00 ms | 5.12 ms | 3.62 ms |
Enforcement overhead over raw nginx baseline: p50 +69 µs / p90 +134 µs.
| Configuration | Max RPS |
|---|---|
| Simple rate limit (1 rule) | 195,000 |
| Complex policy (5 rules, JWT parsing, loop detection) | 195,000 |
Reproduce: see fairvisor/benchmark — the canonical benchmark source of truth for Fairvisor Edge performance numbers.
| Target | Guide |
|---|---|
| Docker (local/VM) | docs/guides/docker |
| Kubernetes (Helm) | docs/guides/helm |
| LiteLLM integration | docs/guides/litellm |
| nginx auth_request | docs/gateway/nginx |
| Envoy ext_authz | docs/gateway/envoy |
| Kong / Traefik | docs/gateway |
Fairvisor works alongside Kong, nginx, Envoy, and Traefik — or runs standalone as a reverse proxy when you don't need a separate gateway.
```shell
fairvisor init --template=api   # scaffold a policy bundle
fairvisor validate policy.json  # validate before deploying
fairvisor test --dry-run        # shadow-mode replay
fairvisor status                # edge health and loaded bundle info
fairvisor logs                  # tail rejection events
```

The edge is open source and runs standalone. The SaaS adds:
- Policy editor with validation and diff view
- Fleet management and policy push
- Analytics: top limited routes, tenants, abusive sources
- Audit log exports for SOC 2 workflows
- Alerts (Datadog, Sentry, PagerDuty, Prometheus)
- RBAC and SSO (Enterprise)
If the SaaS is unreachable, the edge keeps enforcing with the last-known policy bundle. No degradation.
```
src/fairvisor/        runtime modules (OpenResty/LuaJIT)
cli/                  command-line tooling
spec/                 unit and integration tests (busted)
tests/e2e/            Docker-based E2E tests (pytest)
examples/quickstart/  runnable quickstart (docker compose up -d)
examples/recipes/     deployable policy recipes (team budgets, agent guard, circuit breaker)
fixtures/             canonical request/response sample artifacts
helm/                 Helm chart
docker/               Docker artifacts
docs/                 reference documentation
```
Docs: docs.fairvisor.com · Website: fairvisor.com · Quickstart: 5 minutes to enforcement