- Drop-in in 2 lines
- Why we built this
- Why not nginx / Kong / Envoy?
- Installation
- Quick start
- LLM token budget in 30 seconds
- How a request flows
- Enforcement capabilities
- Performance
- Deployment
- CLI
- SaaS control plane (optional)
- Project layout
- License
```python
# Before
client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

# After — no other code changes
client = OpenAI(base_url="https://your-fairvisor/openai/v1", api_key="sk-...")
```

Token limits, cost budgets, loop detection, and circuit breakers apply transparently. Works with OpenAI-compatible endpoints: OpenAI, Azure OpenAI, Anthropic-compatible gateways, Gemini-compatible gateways, vLLM, Ollama, LiteLLM.
When multiple tenants, agents, or services share an API, one misbehaving caller can exhaust the budget for everyone — whether that's LLM tokens, API credits, or request quotas. Fairvisor is a lightweight enforcement engine that gives each tenant isolated limits at the edge: token budgets, cost caps, rate limits, and kill switches — keyed on JWT claims, API keys, or IP. One container, one JSON policy file, no Redis.
API gateways count requests. LLM providers bill by the token.
When you serve multiple tenants — customers, teams, or agentic pipelines — that gap becomes a real problem. One runaway agent can consume a month's token budget overnight. Your gateway sees one request per second; your invoice shows 3 million tokens.
We needed something that:
- Understood token budgets, not just request counts
- Could key limits on JWT claims (`org_id`, `plan`, `user_id`), not just IPs
- Kept every request fast — no Redis round-trip, no extra network call in the hot path
- Could plug into nginx or Envoy or run standalone as a transparent LLM proxy
- Plugged in without rewriting application code: change the base URL in the client — enforcement applies transparently to existing LLM calls
We couldn't find it, so we built Fairvisor.
If you have an existing gateway, the question is whether Fairvisor adds anything you can't get from the plugin ecosystem already installed. Here is the honest comparison:
| Concern | nginx limit_req | Kong rate-limiting | Envoy global rate limit | Fairvisor Edge |
|---|---|---|---|---|
| Per-tenant limits (JWT claim) | No — IP/zone only | Partial — custom plugin | Yes, via descriptors | Yes — jwt:org_id, jwt:plan, any claim |
| LLM token budgets (TPM/TPD) | No | No | No | Yes — pre-request reservation + post-response refund |
| Cost budgets (cumulative $) | No | No | No | Yes |
| Distributed state requirement | No (per-process) | Redis or Postgres | Separate rate limit service | No — in-process ngx.shared.dict |
| Network round-trip in hot path | No | Yes (to Redis) | Yes (to rate limit service) | No |
| Policy as versioned JSON | No | No (Admin API state) | Partial (Envoy config) | Yes — commit, diff, roll back |
| Kill switches (instant, no restart) | No | No | No | Yes |
| Loop detection for agents | No | No | No | Yes |
If nginx limit_req is enough for you, use it. It has zero overhead and is the right tool for simple per-IP global throttling. Fairvisor becomes relevant when you need per-tenant awareness, JWT-claim-based bucketing, or cost/token tracking that limit_req has no model for.
If you are already running Kong, the built-in rate limiting plugin stores counters in Redis or Postgres — every decision is a network call. Fairvisor can run alongside Kong as an auth_request decision service with no external state. See Kong / Traefik integration →
If you are running Envoy, the global rate limit service requires deploying a separate Redis-backed service with its own config language. Fairvisor is one container, one JSON file, and integrates via ext_authz in the same position. See Envoy ext_authz integration →
If you are on Cloudflare or Akamai, per-JWT-claim limits, LLM token budgets, and cost caps are not in the platform's model. If your limits are tenant-aware or cost-aware, you need something that runs in your own stack.
Fairvisor can run alongside Kong, nginx, and Envoy — or as a standalone reverse proxy if you don't need a separate gateway. See nginx auth_request → · Envoy ext_authz → · Kong / Traefik → for integration patterns.
```shell
docker run -d \
  -p 8080:8080 \
  -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \
  -v "$PWD/policy.json:/etc/fairvisor/policy.json:ro" \
  ghcr.io/fairvisor/fairvisor-edge:latest
```

- Add the Fairvisor repository:
```shell
curl -sS https://fairvisor.github.io/apt/pubkey.gpg | sudo gpg --dearmor -o /usr/share/keyrings/fairvisor-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/fairvisor-archive-keyring.gpg] https://fairvisor.github.io/apt $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/fairvisor.list
```

- Install:
```shell
sudo apt update
sudo apt install fairvisor
```

- Verify:
```shell
fairvisor version
systemctl status fairvisor
```

Which mode is right for you?
- Reverse proxy — you have an LLM or upstream service → Fairvisor sits in front, enforces budgets, and proxies the request. Set `FAIRVISOR_LLM_FORMAT` to enable token counting and streaming cutoff. Fastest to try.
- Decision service — you already run nginx, Envoy, or Kong → call `POST /v1/decision` from `auth_request` / `ext_authz`.
```shell
git clone https://github.com/fairvisor/edge.git
cd edge/examples/quickstart
docker compose up -d
```

Run your first enforce/reject test in under a minute — full walkthrough in examples/quickstart/README.md.
Recipes: examples/recipes/ — team budgets, runaway agent guard, circuit-breaker.
Sample artifacts: fixtures/ — canonical enforce/reject fixtures (OpenAI, Anthropic, Gemini).
Expand — manual setup with a single docker run
1. Create a policy
```shell
mkdir fairvisor-demo && cd fairvisor-demo
```

policy.json:
```json
{
  "bundle_version": 1,
  "issued_at": "2026-01-01T00:00:00Z",
  "policies": [
    {
      "id": "demo-rate-limit",
      "spec": {
        "selector": { "pathPrefix": "/", "methods": ["GET", "POST"] },
        "mode": "enforce",
        "rules": [
          {
            "name": "global-rps",
            "limit_keys": ["ip:address"],
            "algorithm": "token_bucket",
            "algorithm_config": { "tokens_per_second": 5, "burst": 10 }
          }
        ]
      }
    }
  ],
  "kill_switches": []
}
```

2. Run the edge
```shell
docker run -d \
  --name fairvisor \
  -p 8080:8080 \
  -v "$(pwd)/policy.json:/etc/fairvisor/policy.json:ro" \
  -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \
  -e FAIRVISOR_MODE=decision_service \
  ghcr.io/fairvisor/fairvisor-edge:latest
```

3. Verify
```shell
curl -sf http://localhost:8080/readyz
# {"status":"ok"}

# Allowed request → HTTP 200
curl -s -o /dev/null -w "HTTP %{http_code}\n" \
  -H "X-Original-Method: GET" \
  -H "X-Original-URI: /api/data" \
  -H "X-Forwarded-For: 10.0.0.1" \
  http://localhost:8080/v1/decision

# Rejected request — exhaust the burst (>10 requests)
for i in $(seq 1 12); do
  curl -s -o /dev/null -w "HTTP %{http_code}\n" \
    -H "X-Original-Method: GET" \
    -H "X-Original-URI: /api/data" \
    -H "X-Forwarded-For: 10.0.0.1" \
    http://localhost:8080/v1/decision
done
# last requests → HTTP 429, X-Fairvisor-Reason: rate_limit_exceeded
```

Full walkthrough: docs.fairvisor.com/docs/quickstart
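The rejection above is the token-bucket algorithm at work. A minimal Python sketch (illustrative only — Fairvisor's actual implementation runs in LuaJIT) shows why an instantaneous burst of 12 requests against `tokens_per_second: 5, burst: 10` lets exactly the first 10 through:

```python
import time

class TokenBucket:
    """Illustrative token bucket, mirroring the demo policy:
    tokens_per_second=5, burst=10. Not Fairvisor's implementation."""

    def __init__(self, tokens_per_second: float, burst: int):
        self.rate = tokens_per_second
        self.capacity = burst
        self.tokens = float(burst)   # bucket starts full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True              # would be HTTP 200
        return False                 # would be HTTP 429

bucket = TokenBucket(tokens_per_second=5, burst=10)
decisions = [bucket.allow() for _ in range(12)]
# In an instantaneous burst, the first 10 drain the bucket; the rest are rejected.
```

The refill-on-read design means no background timer is needed: each decision recomputes the balance from the elapsed time since the last request.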
Fairvisor proxies traffic to your LLM provider and enforces token budgets. Set FAIRVISOR_MODE=reverse_proxy, point FAIRVISOR_BACKEND_URL at your LLM, and set FAIRVISOR_LLM_FORMAT so Fairvisor knows how to parse streaming responses.
0. Point your client at Fairvisor (the only app-level change):

```python
client = OpenAI(
    base_url="https://your-fairvisor-host/openai/v1",
    api_key="sk-proj-..."
)
```

1. Run the edge — pointing at OpenAI:
```shell
docker run -d \
  -e FAIRVISOR_MODE=reverse_proxy \
  -e FAIRVISOR_BACKEND_URL=https://api.openai.com \
  -e FAIRVISOR_LLM_FORMAT=openai \
  -e FAIRVISOR_STRIP_REQUEST_HEADERS="Authorization" \
  -e FAIRVISOR_UPSTREAM_HEADER_Authorization="Bearer sk-proj-..." \
  -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \
  -v "$PWD/policy.json:/etc/fairvisor/policy.json:ro" \
  -p 8080:8080 \
  ghcr.io/fairvisor/fairvisor-edge:latest
```

Header handling — two env var families control what reaches the upstream:
- `FAIRVISOR_STRIP_REQUEST_HEADERS` (comma-separated) — headers to remove from the client request before proxying. Stripping happens after rate-limit keying, so Fairvisor can still read and key on the header value (e.g. `jwt:org_id` from the client's `Authorization` JWT) before it is removed.
- `FAIRVISOR_UPSTREAM_HEADER_<Name>` — headers to inject when forwarding. The suffix becomes the header name with `_` → `-` (e.g. `FAIRVISOR_UPSTREAM_HEADER_Authorization` → `Authorization`, `FAIRVISOR_UPSTREAM_HEADER_X_Api_Key` → `X-Api-Key`). Injection happens after stripping.
The order of operations on every request:
1. Read client headers → enforce rate limits / token budgets
2. Strip headers listed in `FAIRVISOR_STRIP_REQUEST_HEADERS`
3. Inject headers from `FAIRVISOR_UPSTREAM_HEADER_*`
4. Forward to upstream
This means the client's JWT is used for keying but never reaches the upstream, and the upstream key is never visible to the client.
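That pipeline can be sketched in Python (assumed behavior inferred from the description above, not Fairvisor source), including the `_` → `-` suffix mapping for `FAIRVISOR_UPSTREAM_HEADER_*` variables:

```python
UPSTREAM_PREFIX = "FAIRVISOR_UPSTREAM_HEADER_"

def upstream_header_name(env_var: str) -> str:
    """FAIRVISOR_UPSTREAM_HEADER_X_Api_Key -> X-Api-Key."""
    return env_var[len(UPSTREAM_PREFIX):].replace("_", "-")

def forward(client_headers: dict, strip: list, inject_env: dict):
    """Assumed order of operations: key -> strip -> inject -> forward."""
    # 1. Rate-limit keying reads the untouched client headers.
    keying_value = client_headers.get("Authorization")
    # 2. Strip the headers listed in FAIRVISOR_STRIP_REQUEST_HEADERS.
    upstream = {k: v for k, v in client_headers.items() if k not in strip}
    # 3. Inject the FAIRVISOR_UPSTREAM_HEADER_* values.
    for env_var, value in inject_env.items():
        upstream[upstream_header_name(env_var)] = value
    # 4. `upstream` is what the backend receives.
    return keying_value, upstream

key, headers = forward(
    {"Authorization": "Bearer CLIENT_JWT", "Content-Type": "application/json"},
    strip=["Authorization"],
    inject_env={"FAIRVISOR_UPSTREAM_HEADER_Authorization": "Bearer sk-proj-..."},
)
# key is the client JWT (available for jwt:org_id keying);
# headers["Authorization"] is the upstream key — the client JWT is never forwarded.
```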
2. Policy — one rule, per-org TPM + daily cap:
```json
{
  "id": "llm-budget",
  "spec": {
    "selector": { "pathPrefix": "/" },
    "mode": "enforce",
    "rules": [
      {
        "name": "per-org-tpm",
        "limit_keys": ["jwt:org_id"],
        "algorithm": "token_bucket_llm",
        "algorithm_config": {
          "tokens_per_minute": 60000,
          "tokens_per_day": 1200000,
          "default_max_completion": 800
        }
      }
    ]
  }
}
```

3. Call the API — client sends only their JWT:
```shell
curl https://your-fairvisor-host/v1/chat/completions \
  -H "Authorization: Bearer eyJhbGc..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
```

Fairvisor parses the JWT claims (no signature validation — the JWT is trusted as-is), extracts `org_id`, charges tokens against the budget, injects the upstream headers configured via `FAIRVISOR_UPSTREAM_HEADER_*`, and forwards the request. The upstream key is never visible to the client.
When the budget is exhausted:
```http
HTTP/1.1 429 Too Many Requests
X-Fairvisor-Reason: tpm_exceeded
Retry-After: 12
RateLimit-Limit: 60000
RateLimit-Remaining: 0
```

Each organization gets its own independent 60k TPM / 1.2M TPD budget. Set `FAIRVISOR_LLM_FORMAT=anthropic` or `FAIRVISOR_LLM_FORMAT=gemini` for those providers.
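On the client side, `Retry-After` and `X-Fairvisor-Reason` are enough to build polite retry logic. A hedged sketch (the `send` callable is a stand-in for whatever HTTP client you use):

```python
import time

def call_with_retry(send, max_attempts: int = 3):
    """Retry on 429, honoring Retry-After.
    `send` returns (status, headers, body)."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # X-Fairvisor-Reason (tpm_exceeded, rate_limit_exceeded, ...) can be
        # logged or used to decide whether retrying is worthwhile at all.
        wait = float(headers.get("Retry-After", 1))
        if attempt < max_attempts - 1:
            time.sleep(wait)
    return status, body

# Simulated transport: two 429s, then success.
responses = iter([
    (429, {"Retry-After": "0", "X-Fairvisor-Reason": "tpm_exceeded"}, ""),
    (429, {"Retry-After": "0", "X-Fairvisor-Reason": "tpm_exceeded"}, ""),
    (200, {}, "ok"),
])
status, body = call_with_retry(lambda: next(responses))
# status == 200 after two backoffs
```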
Decision service mode: if you already have a gateway, use `selector: { "pathPrefix": "/v1/chat" }` and call `POST /v1/decision` from your existing `auth_request` or `ext_authz` hook instead.
Decision service mode — Fairvisor runs as a sidecar. Your existing gateway calls /v1/decision via auth_request (nginx) or ext_authz (Envoy) and handles forwarding itself.
Reverse proxy mode — Fairvisor sits inline. Traffic arrives at Fairvisor directly, gets evaluated, and is proxied to the upstream if allowed. Set FAIRVISOR_LLM_FORMAT to enable LLM-aware token counting and streaming cutoff for OpenAI, Anthropic, or Gemini upstreams.
Both modes use the same policy bundle and return the same rejection headers.
When a request is rejected:
```http
HTTP/1.1 429 Too Many Requests
X-Fairvisor-Reason: tpm_exceeded
Retry-After: 12
RateLimit: "llm-default";r=0;t=12
RateLimit-Limit: 120000
RateLimit-Remaining: 0
RateLimit-Reset: 12
```

Headers follow the IETF RateLimit header fields draft (draft-ietf-httpapi-ratelimit-headers). X-Fairvisor-Reason gives clients a machine-readable code for retry logic and observability.
Decision service mode — sidecar: your gateway calls /v1/decision, handles forwarding itself.
```mermaid
sequenceDiagram
    participant C as Client
    participant G as Your Gateway<br/>(nginx / Envoy / Kong)
    participant F as Fairvisor Edge<br/>decision_service
    participant U as Upstream service
    C->>G: Request
    G->>F: POST /v1/decision<br/>(auth_request / ext_authz)
    alt allow
        F-->>G: 204 No Content
        G->>U: Forward request
        U-->>G: Response
        G-->>C: Response
    else reject
        F-->>G: 429 + RateLimit headers
        G-->>C: 429 Too Many Requests
    end
```
Reverse proxy mode — inline: Fairvisor handles both enforcement and proxying.
```mermaid
sequenceDiagram
    participant C as Client
    participant F as Fairvisor Edge<br/>reverse_proxy
    participant U as Upstream service
    C->>F: Request
    alt allow
        F->>U: Forward request
        U-->>F: Response
        F-->>C: Response
    else reject
        F-->>C: 429 + RateLimit headers
    end
```
Reverse proxy mode with LLM format — inline LLM proxy with token budget enforcement.
```mermaid
sequenceDiagram
    participant C as Client
    participant F as Fairvisor Edge<br/>reverse_proxy + FAIRVISOR_LLM_FORMAT
    participant U as Upstream LLM<br/>(OpenAI / Anthropic / Gemini)
    C->>F: POST /v1/chat/completions<br/>Authorization: Bearer CLIENT_JWT
    F->>F: 1. Parse JWT claims (org_id, user_id)
    F->>F: 2. Enforce TPM / TPD / cost budget
    alt budget ok
        F->>F: 3. Strip FAIRVISOR_STRIP_REQUEST_HEADERS · inject FAIRVISOR_UPSTREAM_HEADER_*
        F->>U: POST /v1/chat/completions<br/>Authorization: Bearer UPSTREAM_KEY
        U-->>F: 200 OK + token usage
        F->>F: 4. Count tokens · refund unused reservation
        F-->>C: 200 OK
    else budget exceeded
        F-->>C: 429 X-Fairvisor-Reason: tpm_exceeded
    end
```
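The "reserve, then refund" step in the LLM diagram can be sketched like this (assumed mechanics inferred from the description above: `default_max_completion` bounds the pre-request reservation, and actual usage settles it afterwards):

```python
class LlmTokenBudget:
    """Illustrative sketch of pre-request reservation + post-response refund.
    Single fixed TPM window; Fairvisor's real accounting is more elaborate."""

    def __init__(self, tokens_per_minute: int):
        self.remaining = tokens_per_minute

    def reserve(self, prompt_tokens: int, max_completion: int):
        """Charge the worst case up front so concurrent requests can't overdraw."""
        estimate = prompt_tokens + max_completion
        if estimate > self.remaining:
            return None  # -> 429 tpm_exceeded before the upstream is ever called
        self.remaining -= estimate
        return estimate

    def settle(self, reserved: int, actual_total: int):
        """Refund the unused part of the reservation once usage is known."""
        self.remaining += max(0, reserved - actual_total)

budget = LlmTokenBudget(tokens_per_minute=60000)
reserved = budget.reserve(prompt_tokens=1200, max_completion=800)  # charges 2000
budget.settle(reserved, actual_total=1450)                         # refunds 550
# budget.remaining == 58550 — only real usage counts against the window.
```

Reserving before the upstream call is what makes the budget safe under concurrency: two simultaneous requests cannot both spend the same remaining tokens.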
| If you need to… | Algorithm | Typical identity keys | Reject reason |
|---|---|---|---|
| Cap request frequency | `token_bucket` | `jwt:user_id`, `header:x-api-key`, `ip:address` | `rate_limit_exceeded` |
| Cap cumulative spend | `cost_based` | `jwt:org_id`, `jwt:plan` | `budget_exhausted` |
| Cap LLM tokens (TPM/TPD) | `token_bucket_llm` | `jwt:org_id`, `jwt:user_id` | `tpm_exceeded`, `tpd_exceeded` |
| Instantly block a segment | kill switch | any descriptor | `kill_switch_active` |
| Dry-run before enforcing | shadow mode | any descriptor | allow + `would_reject` telemetry |
| Stop runaway agent loops | loop detection | request fingerprint | `loop_detected` |
| Clamp spend spikes | circuit breaker | global or policy scope | `circuit_breaker_open` |
Identity keys can be JWT claims (`jwt:org_id`, `jwt:plan`), HTTP headers (`header:x-api-key`), or IP attributes (`ip:address`, `ip:country`). Combine multiple keys per rule for compound matching.
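For example, a rule keyed on both `jwt:org_id` and `jwt:user_id` gives each user an independent bucket inside each organization — a sketch in the policy format shown earlier, with illustrative limits:

```json
{
  "name": "per-user-within-org",
  "limit_keys": ["jwt:org_id", "jwt:user_id"],
  "algorithm": "token_bucket",
  "algorithm_config": { "tokens_per_second": 2, "burst": 5 }
}
```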
| Percentile | Decision service | Reverse proxy | Raw nginx (baseline) |
|---|---|---|---|
| p50 | 304 μs | 302 μs | 235 μs |
| p90 | 543 μs | 593 μs | 409 μs |
| p99 | 2.00 ms | 1.79 ms | 1.95 ms |
| p99.9 | 4.00 ms | 5.12 ms | 3.62 ms |
Enforcement overhead over raw nginx baseline: p50 +69 µs / p90 +134 µs.
| Configuration | Max RPS |
|---|---|
| Simple rate limit (1 rule) | 195,000 |
| Complex policy (5 rules, JWT parsing, loop detection) | 195,000 |
Reproduce: see fairvisor/benchmark — the canonical benchmark source of truth for Fairvisor Edge performance numbers.
| Target | Guide |
|---|---|
| Docker (local/VM) | docs/guides/docker |
| Kubernetes (Helm) | docs/guides/helm |
| LiteLLM integration | docs/guides/litellm |
| nginx auth_request | docs/gateway/nginx |
| Envoy ext_authz | docs/gateway/envoy |
| Kong / Traefik | docs/gateway |
Fairvisor works alongside Kong, nginx, Envoy, and Traefik — or runs standalone as a reverse proxy when you don't need a separate gateway.
```shell
fairvisor init --template=api   # scaffold a policy bundle
fairvisor validate policy.json  # validate before deploying
fairvisor test --dry-run        # shadow-mode replay
fairvisor status                # edge health and loaded bundle info
fairvisor logs                  # tail rejection events
```

The edge is open source and runs standalone. The SaaS adds:
- Policy editor with validation and diff view
- Fleet management and policy push
- Analytics: top limited routes, tenants, abusive sources
- Audit log exports for SOC 2 workflows
- Alerts (Datadog, Sentry, PagerDuty, Prometheus)
- RBAC and SSO (Enterprise)
If the SaaS is unreachable, the edge keeps enforcing with the last-known policy bundle. No degradation.
```
src/fairvisor/        runtime modules (OpenResty/LuaJIT)
cli/                  command-line tooling
spec/                 unit and integration tests (busted)
tests/e2e/            Docker-based E2E tests (pytest)
examples/quickstart/  runnable quickstart (docker compose up -d)
examples/recipes/     deployable policy recipes (team budgets, agent guard, circuit breaker)
fixtures/             canonical request/response sample artifacts
helm/                 Helm chart
docker/               Docker artifacts
docs/                 reference documentation
```
Docs: docs.fairvisor.com · Website: fairvisor.com · Quickstart: 5 minutes to enforcement