
Fairvisor

Policy and spend control at the edge for LLMs

License: MPL-2.0 · Platforms: linux/amd64 · linux/arm64



Drop-in in 2 lines

# Before
client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

# After — no other code changes
client = OpenAI(base_url="https://your-fairvisor/openai/v1", api_key="sk-...")

Token limits, cost budgets, loop detection, and circuit breakers apply transparently. Works with OpenAI-compatible endpoints: OpenAI, Azure OpenAI, Anthropic-compatible gateways, Gemini-compatible gateways, vLLM, Ollama, LiteLLM.


When multiple tenants, agents, or services share an API, one misbehaving caller can exhaust the budget for everyone — whether that's LLM tokens, API credits, or request quotas. Fairvisor is a lightweight enforcement engine that gives each tenant isolated limits at the edge: token budgets, cost caps, rate limits, and kill switches — keyed on JWT claims, API keys, or IP. One container, one JSON policy file, no Redis.

Why we built this

API gateways count requests. LLM providers bill by the token.

When you serve multiple tenants — customers, teams, or agentic pipelines — that gap becomes a real problem. One runaway agent can consume a month's token budget overnight. Your gateway sees one request per second; your invoice shows 3 million tokens.

We needed something that:

  • Understood token budgets, not just request counts
  • Could key limits on JWT claims (org_id, plan, user_id), not just IPs
  • Kept every request fast — no Redis round-trip, no extra network call in the hot path
  • Could plug into nginx or Envoy or run standalone as a transparent LLM proxy
  • Plugged in without rewriting application code: change the base URL in the client — enforcement applies transparently to existing LLM calls

We couldn't find it, so we built Fairvisor.

Why not nginx / Kong / Envoy?

If you have an existing gateway, the question is whether Fairvisor adds anything you can't get from the plugin ecosystem already installed. Here is the honest comparison:

| Concern | nginx limit_req | Kong rate-limiting | Envoy global rate limit | Fairvisor Edge |
| --- | --- | --- | --- | --- |
| Per-tenant limits (JWT claim) | No — IP/zone only | Partial — custom plugin | Yes, via descriptors | Yes — jwt:org_id, jwt:plan, any claim |
| LLM token budgets (TPM/TPD) | No | No | No | Yes — pre-request reservation + post-response refund |
| Cost budgets (cumulative $) | No | No | No | Yes |
| Distributed state requirement | No (per-process) | Redis or Postgres | Separate rate limit service | No — in-process ngx.shared.dict |
| Network round-trip in hot path | No | Yes (to Redis) | Yes (to rate limit service) | No |
| Policy as versioned JSON | No | No (Admin API state) | Partial (Envoy config) | Yes — commit, diff, roll back |
| Kill switches (instant, no restart) | No | No | No | Yes |
| Loop detection for agents | No | No | No | Yes |

If nginx limit_req is enough for you, use it. It has zero overhead and is the right tool for simple per-IP global throttling. Fairvisor becomes relevant when you need per-tenant awareness, JWT-claim-based bucketing, or cost/token tracking that limit_req has no model for.

If you are already running Kong, the built-in rate limiting plugin stores counters in Redis or Postgres — every decision is a network call. Fairvisor can run alongside Kong as an auth_request decision service with no external state. See Kong / Traefik integration →

If you are running Envoy, the global rate limit service requires deploying a separate Redis-backed service with its own config language. Fairvisor is one container, one JSON file, and integrates via ext_authz in the same position. See Envoy ext_authz integration →

If you are on Cloudflare or Akamai, per-JWT-claim limits, LLM token budgets, and cost caps are not in the platform's model. If your limits are tenant-aware or cost-aware, you need something that runs in your own stack.

Fairvisor can run alongside Kong, nginx, and Envoy — or as a standalone reverse proxy if you don't need a separate gateway. See nginx auth_request → · Envoy ext_authz → · Kong / Traefik → for integration patterns.

Installation

Docker

docker run -d \
  -p 8080:8080 \
  -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \
  -v "$PWD/policy.json:/etc/fairvisor/policy.json:ro" \
  ghcr.io/fairvisor/fairvisor-edge:latest

apt (Ubuntu / Debian)

  1. Add the Fairvisor repository:
curl -sS https://fairvisor.github.io/apt/pubkey.gpg | sudo gpg --dearmor -o /usr/share/keyrings/fairvisor-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/fairvisor-archive-keyring.gpg] https://fairvisor.github.io/apt $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/fairvisor.list
  2. Install:
sudo apt update
sudo apt install fairvisor
  3. Verify:
fairvisor version
systemctl status fairvisor

Quick start

Which mode is right for you?

  • Reverse proxy — you have an LLM or upstream service → Fairvisor sits in front, enforces budgets, and proxies the request. Set FAIRVISOR_LLM_FORMAT to enable token counting and streaming cutoff. Fastest to try.
  • Decision service — you already run nginx, Envoy, or Kong → call POST /v1/decision from auth_request / ext_authz.

Fastest path

git clone https://github.com/fairvisor/edge.git
cd edge/examples/quickstart
docker compose up -d

Run your first enforce/reject test in under a minute — full walkthrough in examples/quickstart/README.md.

Recipes: examples/recipes/ — team budgets, runaway agent guard, circuit-breaker.

Sample artifacts: fixtures/ — canonical enforce/reject fixtures (OpenAI, Anthropic, Gemini).

Minimal decision_service example

Manual setup with a single docker run:

1. Create a policy

mkdir fairvisor-demo && cd fairvisor-demo

policy.json:

{
  "bundle_version": 1,
  "issued_at": "2026-01-01T00:00:00Z",
  "policies": [
    {
      "id": "demo-rate-limit",
      "spec": {
        "selector": { "pathPrefix": "/", "methods": ["GET", "POST"] },
        "mode": "enforce",
        "rules": [
          {
            "name": "global-rps",
            "limit_keys": ["ip:address"],
            "algorithm": "token_bucket",
            "algorithm_config": { "tokens_per_second": 5, "burst": 10 }
          }
        ]
      }
    }
  ],
  "kill_switches": []
}

2. Run the edge

docker run -d \
  --name fairvisor \
  -p 8080:8080 \
  -v "$(pwd)/policy.json:/etc/fairvisor/policy.json:ro" \
  -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \
  -e FAIRVISOR_MODE=decision_service \
  ghcr.io/fairvisor/fairvisor-edge:latest

3. Verify

curl -sf http://localhost:8080/readyz
# {"status":"ok"}

# Allowed request → HTTP 200
curl -s -o /dev/null -w "HTTP %{http_code}\n" \
  -H "X-Original-Method: GET" \
  -H "X-Original-URI: /api/data" \
  -H "X-Forwarded-For: 10.0.0.1" \
  http://localhost:8080/v1/decision

# Rejected request — exhaust the burst (>10 requests)
for i in $(seq 1 12); do
  curl -s -o /dev/null -w "HTTP %{http_code}\n" \
    -H "X-Original-Method: GET" \
    -H "X-Original-URI: /api/data" \
    -H "X-Forwarded-For: 10.0.0.1" \
    http://localhost:8080/v1/decision
done
# last requests → HTTP 429  X-Fairvisor-Reason: rate_limit_exceeded
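The behavior just demonstrated (5 tokens per second, burst 10, so the 11th and 12th back-to-back requests are rejected) follows from the standard token-bucket algorithm. A minimal Python sketch of it — my own illustration, not Fairvisor's Lua implementation, and TokenBucket is a hypothetical name:

```python
import time

class TokenBucket:
    """Minimal token-bucket sketch: refill at tokens_per_second, cap at burst."""
    def __init__(self, tokens_per_second, burst, now=time.monotonic):
        self.rate = tokens_per_second
        self.burst = burst
        self.tokens = float(burst)   # bucket starts full
        self.now = now
        self.last = now()

    def allow(self, cost=1):
        t = self.now()
        # Refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# tokens_per_second=5, burst=10, as in the policy above:
# 12 back-to-back requests → first 10 allowed, then rejected.
clock = [0.0]
bucket = TokenBucket(5, 10, now=lambda: clock[0])
results = [bucket.allow() for _ in range(12)]
```

Because the bucket refills continuously, a short pause restores capacity: with these numbers, each 200 ms of waiting earns back one token.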

Full walkthrough: docs.fairvisor.com/docs/quickstart

LLM token budget in 30 seconds

Fairvisor proxies traffic to your LLM provider and enforces token budgets. Set FAIRVISOR_MODE=reverse_proxy, point FAIRVISOR_BACKEND_URL at your LLM, and set FAIRVISOR_LLM_FORMAT so Fairvisor knows how to parse streaming responses.

0. Point your client at Fairvisor (the only app-level change):

client = OpenAI(
    base_url="https://your-fairvisor-host/openai/v1",
    api_key="sk-proj-..."
)

1. Run the edge — pointing at OpenAI:

docker run -d \
  -e FAIRVISOR_MODE=reverse_proxy \
  -e FAIRVISOR_BACKEND_URL=https://api.openai.com \
  -e FAIRVISOR_LLM_FORMAT=openai \
  -e FAIRVISOR_STRIP_REQUEST_HEADERS="Authorization" \
  -e FAIRVISOR_UPSTREAM_HEADER_Authorization="Bearer sk-proj-..." \
  -e FAIRVISOR_CONFIG_FILE=/etc/fairvisor/policy.json \
  -v "$PWD/policy.json:/etc/fairvisor/policy.json:ro" \
  -p 8080:8080 \
  ghcr.io/fairvisor/fairvisor-edge:latest

Header handling — two env var families control what reaches the upstream:

  • FAIRVISOR_STRIP_REQUEST_HEADERS (comma-separated) — headers to remove from the client request before proxying. Stripping happens after rate-limit keying, so Fairvisor can still read and key on the header value (e.g. jwt:org_id from the client's Authorization JWT) before it is removed.
  • FAIRVISOR_UPSTREAM_HEADER_<Name> — headers to inject when forwarding. The suffix becomes the header name, with underscores mapped to hyphens (e.g. FAIRVISOR_UPSTREAM_HEADER_Authorization → Authorization, FAIRVISOR_UPSTREAM_HEADER_X_Api_Key → X-Api-Key). Injection happens after stripping.

The order of operations on every request:

  1. Read client headers → enforce rate limits / token budgets
  2. Strip headers listed in FAIRVISOR_STRIP_REQUEST_HEADERS
  3. Inject headers from FAIRVISOR_UPSTREAM_HEADER_*
  4. Forward to upstream

This means the client's JWT is used for keying but never reaches the upstream, and the upstream key is never visible to the client.
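The strip-then-inject step can be pictured with a small sketch (forward_headers is a hypothetical helper of mine, not part of Fairvisor):

```python
def forward_headers(client_headers, strip, inject):
    """Sketch of the documented order: limits are keyed on the original
    client headers first (outside this function), then headers are
    stripped, then upstream headers are injected."""
    strip_lower = {h.lower() for h in strip}   # header names are case-insensitive
    upstream = {k: v for k, v in client_headers.items()
                if k.lower() not in strip_lower}
    upstream.update(inject)                    # injected values win
    return upstream

client = {"Authorization": "Bearer CLIENT_JWT",
          "Content-Type": "application/json"}
upstream = forward_headers(
    client,
    strip=["Authorization"],                          # FAIRVISOR_STRIP_REQUEST_HEADERS
    inject={"Authorization": "Bearer UPSTREAM_KEY"},  # FAIRVISOR_UPSTREAM_HEADER_Authorization
)
# upstream now carries the upstream key; the client JWT goes no further.
```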

2. Policy — one rule, per-org TPM + daily cap:

{
  "id": "llm-budget",
  "spec": {
    "selector": { "pathPrefix": "/" },
    "mode": "enforce",
    "rules": [
      {
        "name": "per-org-tpm",
        "limit_keys": ["jwt:org_id"],
        "algorithm": "token_bucket_llm",
        "algorithm_config": {
          "tokens_per_minute": 60000,
          "tokens_per_day": 1200000,
          "default_max_completion": 800
        }
      }
    ]
  }
}

3. Call the API — client sends only their JWT:

curl https://your-fairvisor-host/v1/chat/completions \
  -H "Authorization: Bearer eyJhbGc..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'

Fairvisor parses the JWT claims (no signature validation — the JWT is trusted as-is), extracts org_id, charges tokens against the budget, injects the upstream headers configured via FAIRVISOR_UPSTREAM_HEADER_*, and forwards the request. The upstream key is never visible to the client.
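Since the signature is not verified, claim extraction amounts to base64url-decoding the JWT's payload segment. A Python sketch of what that involves (jwt_claims is my name, not a Fairvisor API); because claims are trusted as-is, something in front of Fairvisor should validate the token:

```python
import base64, json

def jwt_claims(token):
    """Decode a JWT payload WITHOUT verifying the signature — mirrors
    the 'trusted as-is' claim extraction described above (sketch)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Build an unsigned demo token to show the extraction:
claims = {"org_id": "acme", "plan": "pro"}
payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
token = f"eyJhbGciOiJub25lIn0.{payload}."
org = jwt_claims(token)["org_id"]   # "acme" → used as the limit key
```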

When the budget is exhausted:

HTTP/1.1 429 Too Many Requests
X-Fairvisor-Reason: tpm_exceeded
Retry-After: 12
RateLimit-Limit: 60000
RateLimit-Remaining: 0

Each organization gets its own independent 60k TPM / 1.2M TPD budget. Set FAIRVISOR_LLM_FORMAT=anthropic or FAIRVISOR_LLM_FORMAT=gemini for those providers.
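The pre-request reservation and post-response refund mentioned in the comparison table can be sketched as follows — an illustration of the accounting model only (TokenBudget and its methods are hypothetical; the real logic lives in the Lua runtime):

```python
class TokenBudget:
    """Per-org minute budget with reserve-then-refund accounting (sketch)."""
    def __init__(self, tokens_per_minute):
        self.remaining = tokens_per_minute

    def reserve(self, prompt_tokens, max_completion):
        # Before forwarding: charge the worst case (prompt + completion cap).
        cost = prompt_tokens + max_completion
        if cost > self.remaining:
            return None                # would surface as 429 tpm_exceeded
        self.remaining -= cost
        return cost

    def settle(self, reserved, actual_total):
        # After the response: refund whatever the upstream didn't use.
        self.remaining += max(0, reserved - actual_total)

budget = TokenBudget(tokens_per_minute=60000)
reserved = budget.reserve(prompt_tokens=1200, max_completion=800)  # charges 2000
budget.settle(reserved, actual_total=1450)                         # refunds 550
```

Reserving the worst case up front means concurrent requests cannot collectively overshoot the budget; the refund keeps short completions from being over-billed.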

Decision service mode: if you already have a gateway, use selector: { "pathPrefix": "/v1/chat" } and call POST /v1/decision from your existing auth_request or ext_authz hook instead.

How a request flows

Decision service mode — Fairvisor runs as a sidecar. Your existing gateway calls /v1/decision via auth_request (nginx) or ext_authz (Envoy) and handles forwarding itself.

Reverse proxy mode — Fairvisor sits inline. Traffic arrives at Fairvisor directly, gets evaluated, and is proxied to the upstream if allowed. Set FAIRVISOR_LLM_FORMAT to enable LLM-aware token counting and streaming cutoff for OpenAI, Anthropic, or Gemini upstreams.

Both modes use the same policy bundle and return the same rejection headers.

When a request is rejected:

HTTP/1.1 429 Too Many Requests
X-Fairvisor-Reason: tpm_exceeded
Retry-After: 12
RateLimit: "llm-default";r=0;t=12
RateLimit-Limit: 120000
RateLimit-Remaining: 0
RateLimit-Reset: 12

Headers follow the IETF RateLimit header fields specification. X-Fairvisor-Reason gives clients a machine-readable code for retry logic and observability.
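On the client side, these headers are enough to drive retry behavior. A minimal sketch (limit_info and call_with_backoff are hypothetical helpers, not part of any SDK):

```python
import time
import urllib.request, urllib.error

def limit_info(headers):
    """Extract the rejection reason and wait time from 429 response headers."""
    return (headers.get("X-Fairvisor-Reason", "unknown"),
            int(headers.get("Retry-After", "1")))

def call_with_backoff(url, data, headers, max_attempts=3):
    """Retry a request on 429, sleeping for the server-suggested interval."""
    for _ in range(max_attempts):
        try:
            req = urllib.request.Request(url, data=data, headers=headers)
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as e:
            if e.code != 429:
                raise
            reason, wait = limit_info(e.headers)
            time.sleep(wait)
    raise RuntimeError("still rate-limited after retries")

# The rejection shown above would parse as:
reason, wait = limit_info({"X-Fairvisor-Reason": "tpm_exceeded",
                           "Retry-After": "12"})
```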

Architecture

Decision service mode — sidecar: your gateway calls /v1/decision, handles forwarding itself.

sequenceDiagram
    participant C as Client
    participant G as Your Gateway<br/>(nginx / Envoy / Kong)
    participant F as Fairvisor Edge<br/>decision_service
    participant U as Upstream service

    C->>G: Request
    G->>F: POST /v1/decision<br/>(auth_request / ext_authz)
    alt allow
        F-->>G: 204 No Content
        G->>U: Forward request
        U-->>G: Response
        G-->>C: Response
    else reject
        F-->>G: 429 + RateLimit headers
        G-->>C: 429 Too Many Requests
    end

Reverse proxy mode — inline: Fairvisor handles both enforcement and proxying.

sequenceDiagram
    participant C as Client
    participant F as Fairvisor Edge<br/>reverse_proxy
    participant U as Upstream service

    C->>F: Request
    alt allow
        F->>U: Forward request
        U-->>F: Response
        F-->>C: Response
    else reject
        F-->>C: 429 + RateLimit headers
    end

Reverse proxy mode with LLM format — inline LLM proxy with token budget enforcement.

sequenceDiagram
    participant C as Client
    participant F as Fairvisor Edge<br/>reverse_proxy + FAIRVISOR_LLM_FORMAT
    participant U as Upstream LLM<br/>(OpenAI / Anthropic / Gemini)

    C->>F: POST /v1/chat/completions<br/>Authorization: Bearer CLIENT_JWT
    F->>F: 1. Parse JWT claims (org_id, user_id)
    F->>F: 2. Enforce TPM / TPD / cost budget
    alt budget ok
        F->>F: 3. Strip FAIRVISOR_STRIP_REQUEST_HEADERS · inject FAIRVISOR_UPSTREAM_HEADER_*
        F->>U: POST /v1/chat/completions<br/>Authorization: Bearer UPSTREAM_KEY
        U-->>F: 200 OK + token usage
        F->>F: 4. Count tokens · refund unused reservation
        F-->>C: 200 OK
    else budget exceeded
        F-->>C: 429 X-Fairvisor-Reason: tpm_exceeded
    end


Enforcement capabilities

| If you need to… | Algorithm | Typical identity keys | Reject reason |
| --- | --- | --- | --- |
| Cap request frequency | token_bucket | jwt:user_id, header:x-api-key, ip:address | rate_limit_exceeded |
| Cap cumulative spend | cost_based | jwt:org_id, jwt:plan | budget_exhausted |
| Cap LLM tokens (TPM/TPD) | token_bucket_llm | jwt:org_id, jwt:user_id | tpm_exceeded, tpd_exceeded |
| Instantly block a segment | kill switch | any descriptor | kill_switch_active |
| Dry-run before enforcing | shadow mode | any descriptor | allow + would_reject telemetry |
| Stop runaway agent loops | loop detection | request fingerprint | loop_detected |
| Clamp spend spikes | circuit breaker | global or policy scope | circuit_breaker_open |

Identity keys can be JWT claims (jwt:org_id, jwt:plan), HTTP headers (header:x-api-key), or IP attributes (ip:address, ip:country). Combine multiple keys per rule for compound matching.
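Conceptually, combining keys means each distinct combination of resolved values gets its own counter. A rough sketch of how a compound bucket key could be derived (limit_key and the request shape are my own illustration, not Fairvisor internals):

```python
def limit_key(rule_keys, request):
    """Resolve each descriptor against the request and join the results
    into one bucket key — one counter per distinct combination (sketch)."""
    parts = []
    for key in rule_keys:
        source, _, name = key.partition(":")
        if source == "jwt":
            parts.append(f"{key}={request['claims'].get(name, '-')}")
        elif source == "header":
            parts.append(f"{key}={request['headers'].get(name, '-')}")
        elif source == "ip":
            parts.append(f"{key}={request['ip']}")
    return "|".join(parts)

req = {"claims": {"org_id": "acme", "plan": "pro"},
       "headers": {"x-api-key": "k-123"},
       "ip": "10.0.0.1"}
key = limit_key(["jwt:org_id", "header:x-api-key"], req)
# One independent bucket per (org, API key) pair
```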

Performance

Latest measured latency @ 10,000 RPS

| Percentile | Decision service | Reverse proxy | Raw nginx (baseline) |
| --- | --- | --- | --- |
| p50 | 304 μs | 302 μs | 235 μs |
| p90 | 543 μs | 593 μs | 409 μs |
| p99 | 2.00 ms | 1.79 ms | 1.95 ms |
| p99.9 | 4.00 ms | 5.12 ms | 3.62 ms |

Enforcement overhead over the raw nginx baseline: p50 +69 μs / p90 +134 μs.

Latest max sustained throughput (single instance)

| Configuration | Max RPS |
| --- | --- |
| Simple rate limit (1 rule) | 195,000 |
| Complex policy (5 rules, JWT parsing, loop detection) | 195,000 |

Reproduce: see fairvisor/benchmark — the canonical benchmark source of truth for Fairvisor Edge performance numbers.

Deployment

| Target | Guide |
| --- | --- |
| Docker (local/VM) | docs/guides/docker |
| Kubernetes (Helm) | docs/guides/helm |
| LiteLLM integration | docs/guides/litellm |
| nginx auth_request | docs/gateway/nginx |
| Envoy ext_authz | docs/gateway/envoy |
| Kong / Traefik | docs/gateway |

Fairvisor works alongside Kong, nginx, Envoy, and Traefik — or runs standalone as a reverse proxy when you don't need a separate gateway.

CLI

fairvisor init --template=api    # scaffold a policy bundle
fairvisor validate policy.json   # validate before deploying
fairvisor test --dry-run         # shadow-mode replay
fairvisor status                 # edge health and loaded bundle info
fairvisor logs                   # tail rejection events

SaaS control plane (optional)

The edge is open source and runs standalone. The SaaS adds:

  • Policy editor with validation and diff view
  • Fleet management and policy push
  • Analytics: top limited routes, tenants, abusive sources
  • Audit log exports for SOC 2 workflows
  • Alerts (Datadog, Sentry, PagerDuty, Prometheus)
  • RBAC and SSO (Enterprise)

If the SaaS is unreachable, the edge keeps enforcing with the last-known policy bundle. No degradation.

fairvisor.com/pricing

Project layout

src/fairvisor/           runtime modules (OpenResty/LuaJIT)
cli/                     command-line tooling
spec/                    unit and integration tests (busted)
tests/e2e/               Docker-based E2E tests (pytest)
examples/quickstart/     runnable quickstart (docker compose up -d)
examples/recipes/        deployable policy recipes (team budgets, agent guard, circuit breaker)
fixtures/                canonical request/response sample artifacts
helm/                    Helm chart
docker/                  Docker artifacts
docs/                    reference documentation

License

Mozilla Public License 2.0


Docs: docs.fairvisor.com · Website: fairvisor.com · Quickstart: 5 minutes to enforcement
