All notable changes to the Claude Max OpenAI-HTTP-Proxy are documented in this file.
- Optional API key authentication — set the `CLAUDE_PROXY_API_KEY` env var to require all requests to authenticate. OpenAI clients use `Authorization: Bearer <key>`; Anthropic clients use `x-api-key: <key>`. Uses `hmac.compare_digest()` for timing-safe comparison. The `/health` and `/` endpoints are exempt. When unset, the proxy behaves as before (no auth required).
- `.env` file support — the proxy now loads environment variables from a `.env` file on startup via `python-dotenv`. No more `export` needed. A `.env.example` template is included. The `.env` file is gitignored.
- Interactive installer — `install.sh` now prompts for port, host, API key (generate/enter/skip), CORS origins, log level, and max concurrent requests. Generates `.env` from the answers. Preserves an existing `.env` on reinstall. All prompts have defaults for quick install.
- API key generation — installer option to auto-generate secure `sk-claude-...` keys using Python's `secrets.token_urlsafe(32)`.
- Systemd `EnvironmentFile` — the generated service file loads `.env` directly, so env var changes take effect on `systemctl restart` without regenerating the service.
- Added `python-dotenv>=1.0.0,<2.0.0` dependency.
Major hardening release with 40+ improvements across security, reliability, performance, API conformance, and operational tooling. Server grew from 1062 to 1494 lines.
- Tool field sanitization — `_sanitize_tool_name()` strips tool names to `[a-zA-Z0-9_.-]` (max 128 chars); `_sanitize_tool_text()` collapses newlines in descriptions to prevent prompt injection via tool definitions
- Error message sanitization — `_sanitize_cli_error()` strips filesystem paths and environment variables from error responses before returning them to clients
- Request body size limit — configurable max body size (default 10 MB) checked via the Content-Length header in middleware; env var `CLAUDE_PROXY_MAX_BODY_BYTES`
- JSON request body validation — all 4 endpoints wrap `request.json()` in try/except, returning proper 400 errors instead of 500 on malformed JSON
- 413 error format detection — middleware returns Anthropic-format errors for `/anthropic/` paths and OpenAI-format errors for all others
- Type-safe tool parsing — an `isinstance(tc, dict)` check prevents an `AttributeError` crash if Claude outputs `{"tool_call": "invalid"}`
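The tool-field sanitization above can be sketched as follows; the exact rules of the real `_sanitize_tool_name()` / `_sanitize_tool_text()` may differ, so treat this as an illustration of the described behavior:

```python
import re

# Allow-list: anything outside [a-zA-Z0-9_.-] is dropped.
_TOOL_NAME_RE = re.compile(r"[^a-zA-Z0-9_.-]")


def sanitize_tool_name(name: str, max_len: int = 128) -> str:
    # Strip disallowed characters, then truncate to the max length.
    return _TOOL_NAME_RE.sub("", name)[:max_len]


def sanitize_tool_text(text: str) -> str:
    # Collapse all runs of whitespace (including newlines) to single
    # spaces, so a tool description cannot smuggle extra prompt lines
    # into the injected tool definition.
    return " ".join(text.split())
```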
- Subprocess zombie prevention — `call_claude()` and `call_claude_streaming()` wrapped in try/finally with `proc.kill()` + `await proc.wait()` cleanup
- stderr deadlock fix — streaming subprocess uses `stderr=asyncio.subprocess.DEVNULL` to prevent pipe buffer deadlock
- Silent model fallback removed — unknown models now return 404 instead of silently mapping to `claude-sonnet-4-6`
- Multi-line tool JSON parsing — replaced line-by-line `startswith` matching with a brace-counting `_extract_tool_json()` that handles multi-line JSON objects
- Duplicate Anthropic tool parsing eliminated — tool uses are parsed once and reused instead of calling `anthropic_parse_tool_use()` twice
- Streaming tool call detection — `_StreamToolBuffer` withholds lines starting with `{"tool_call"` / `{"tool_use"` from streamed content, preventing raw JSON from appearing as text to clients
- Anthropic deferred text fallback — unparseable tool-like text is emitted as content instead of being silently dropped
- Incomplete deferred block recovery — `_StreamToolBuffer.flush()` moves incomplete multi-line tool blocks (unbalanced braces) back to content output
- Streaming 429 pre-check — semaphore availability is checked before returning a `StreamingResponse`, so clients receive a proper HTTP 429 instead of HTTP 200 with embedded error text
- Health check subprocess tracked — the `claude --version` process is added to `_active_processes` with try/finally cleanup to prevent orphans on shutdown
- OpenAI streaming tool_calls in delta — tool calls are now included in the final streaming chunk's delta (not just `finish_reason`)
- Tool call index field — `openai_parse_tool_calls()` includes the required `"index"` field per the OpenAI streaming spec
- 429 error code corrected — changed from the non-standard `"server_busy"` to the OpenAI-spec `"rate_limit_exceeded"`
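The brace-counting extraction could look roughly like this sketch. The real `_extract_tool_json()` may differ; in particular, the string/escape handling here is an assumption about what "brace counting" must account for:

```python
import json


def extract_tool_json(text: str):
    """Find and parse the first balanced {...} object in text.

    Counts opening/closing braces while skipping braces that appear
    inside JSON string literals, so multi-line tool-call objects are
    handled, unlike line-by-line startswith matching.
    Returns the parsed dict, or None if no balanced object parses.
    """
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False          # character after a backslash
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(text[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None  # unbalanced braces: incomplete block
```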
- Backpressure for streaming — `_buffered_claude_stream()` places an `asyncio.Queue(maxsize=64)` between the subprocess reader and the SSE generator; slow clients cause natural backpressure instead of unbounded memory growth
- String accumulation — replaced O(n^2) `accumulated += text` with the list + `"".join()` pattern in all streaming generators
- Subprocess timeouts — configurable `CLI_TIMEOUT_SECONDS` (default 300 s) for non-streaming, `CLI_LINE_TIMEOUT_SECONDS` (120 s per line) for streaming
- Concurrency limiter — `asyncio.Semaphore` with configurable `--max-concurrent` (default 10) and a 30-second acquire timeout returning 429
- Pre-built model responses — `/models` endpoint responses built once at startup (`_OPENAI_MODELS_RESPONSE`, `_ANTHROPIC_MODELS_RESPONSE`)
- `_aggregate_input_tokens()` helper — centralized cache token calculation eliminates 3x code duplication
- `_sse()` helper — centralized SSE event builder replaces 9 inline format strings in Anthropic streaming
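The bounded-queue backpressure pattern can be sketched as follows; `buffered_stream` and `fake_reader` are illustrative names, not the proxy's API:

```python
import asyncio


async def buffered_stream(reader, queue_size: int = 64):
    """Decouple a fast producer (subprocess reader) from a slow SSE consumer.

    A bounded queue sits between them: when the consumer falls behind,
    queue.put() blocks the producer instead of letting buffered output
    grow without limit.
    """
    queue: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    _DONE = object()  # sentinel marking end of stream

    async def produce():
        async for line in reader:
            await queue.put(line)  # blocks when the queue is full
        await queue.put(_DONE)

    task = asyncio.create_task(produce())
    try:
        while (item := await queue.get()) is not _DONE:
            yield item
    finally:
        task.cancel()  # stop the producer if the consumer goes away


async def demo():
    async def fake_reader():
        for i in range(5):
            yield f"chunk-{i}"
    return [chunk async for chunk in buffered_stream(fake_reader())]
```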
- Anthropic `created_at` field — added to both the non-streaming message response and the streaming `message_start` event (ISO 8601 UTC)
- Anthropic `max_tokens` validation — rejects non-positive integers with a 400 error
- Anthropic streaming event sequence — correct order: `message_start` -> `content_block_start` -> `ping` -> deltas -> `content_block_stop` -> tool blocks -> `message_delta` -> `message_stop`
- OpenAI `X-Accel-Buffering` header — added to legacy `/completions` streaming (was missing; present on other endpoints)
- OpenAI message edge cases — `.get()` for block text (prevents `KeyError`), legacy `function` role support, type validation for content and tool_calls
- UTF-8 robustness — all `decode()` calls use `errors='replace'` to handle corrupted CLI output without crashing
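An `_sse()`-style helper and the event sequence above can be sketched like this; the payload shapes are simplified assumptions, not the proxy's exact frames:

```python
import json


def sse(event: str, data: dict) -> str:
    # Single place to build SSE frames, replacing inline format strings.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_events(text_deltas):
    """Yield SSE frames in the Anthropic streaming order listed above."""
    yield sse("message_start", {"type": "message_start"})
    yield sse("content_block_start", {"type": "content_block_start", "index": 0})
    yield sse("ping", {"type": "ping"})
    for delta in text_deltas:
        yield sse("content_block_delta",
                  {"type": "content_block_delta", "index": 0,
                   "delta": {"type": "text_delta", "text": delta}})
    yield sse("content_block_stop", {"type": "content_block_stop", "index": 0})
    # (tool-use blocks, if any, would be emitted here, before message_delta)
    yield sse("message_delta", {"type": "message_delta",
                                "delta": {"stop_reason": "end_turn"}})
    yield sse("message_stop", {"type": "message_stop"})
```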
- Multi-worker support — `--workers N` CLI arg; uses `uvicorn.run("server:app", workers=N)` with env var config propagation via the lifespan handler
- Graceful shutdown — lifespan handler with a configurable drain period (`CLAUDE_PROXY_DRAIN_SECONDS`, default 5 s), then SIGTERM -> 5 s wait -> SIGKILL
- Graceful process termination — streaming subprocess cleanup uses SIGTERM -> 3 s wait -> SIGKILL instead of immediate SIGKILL
- Client disconnect detection — streaming generators check `request.is_disconnected()` every 10 chunks; the subprocess is killed on disconnect
- Request ID tracking — an `X-Request-ID` header is generated per request (or forwarded from the client) and included in all log messages and response headers
- Health check with CLI verification — the `/health` endpoint runs `claude --version` (cached 30 s) and reports active/max concurrent requests, CLI version, worker PID, and degraded status
- Configurable CORS — `CLAUDE_PROXY_CORS_ORIGINS` env var (comma-separated, default `*`)
- Configurable log level — `CLAUDE_PROXY_LOG_LEVEL` env var + `--log-level` CLI flag
- Configurable access log — `--no-access-log` flag + `CLAUDE_PROXY_ACCESS_LOG` env var
- Request logging middleware — logs method, path, status, and latency; streaming-aware (logs "streaming" instead of blocking for elapsed time)
- Port validation — rejects ports outside 1-65535
- Non-JSON fallback logging — warns when the CLI returns non-JSON output
- Dynamic model timestamps — `_STARTUP_TIME` replaces the hardcoded epoch in model `created` fields
- WorkingDirectory in service files — both the `install.sh` template and the static `claude-proxy.service` set `WorkingDirectory` for multi-worker import resolution
- Version pinning — `fastapi>=0.100.0,<1.0.0` and `uvicorn>=0.20.0,<1.0.0` (was unbounded `>=`)
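The SIGTERM-then-SIGKILL escalation used for subprocess cleanup is a generic pattern and can be sketched as (not the proxy's exact code):

```python
import asyncio


async def terminate_gracefully(proc: asyncio.subprocess.Process,
                               grace_seconds: float = 3.0) -> None:
    """SIGTERM first; escalate to SIGKILL if the process outlives the grace period."""
    if proc.returncode is not None:
        return  # already exited
    proc.terminate()  # SIGTERM: let the CLI flush and exit cleanly
    try:
        await asyncio.wait_for(proc.wait(), timeout=grace_seconds)
    except asyncio.TimeoutError:
        proc.kill()        # SIGKILL: last resort
        await proc.wait()  # reap to avoid a zombie


async def demo() -> int:
    # Spawn a long sleep, then tear it down with a 1 s grace period.
    proc = await asyncio.create_subprocess_exec("sleep", "30")
    await terminate_gracefully(proc, grace_seconds=1.0)
    return proc.returncode
```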
| Argument | Default | Description |
|---|---|---|
| `--timeout` | `300` | CLI timeout in seconds |
| `--max-concurrent` | `10` | Max concurrent CLI requests per worker |
| `--log-level` | `INFO` | Log level (DEBUG/INFO/WARNING/ERROR) |
| `--no-access-log` | `false` | Disable uvicorn access log |
| `--workers` | `1` | Number of uvicorn workers |
| Variable | Default | Description |
|---|---|---|
| `CLAUDE_PROXY_LOG_LEVEL` | `INFO` | Log level |
| `CLAUDE_PROXY_CORS_ORIGINS` | `*` | Comma-separated CORS origins |
| `CLAUDE_PROXY_MAX_BODY_BYTES` | `10485760` | Max request body size (bytes) |
| `CLAUDE_PROXY_ACCESS_LOG` | `true` | Enable/disable access log |
| `CLAUDE_PROXY_DRAIN_SECONDS` | `5` | Graceful shutdown drain period (seconds) |
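A `.env` combining these variables might look like this (values are illustrative):

```
CLAUDE_PROXY_LOG_LEVEL=INFO
CLAUDE_PROXY_CORS_ORIGINS=https://app.example.com
CLAUDE_PROXY_MAX_BODY_BYTES=10485760
CLAUDE_PROXY_ACCESS_LOG=true
CLAUDE_PROXY_DRAIN_SECONDS=5
# Uncomment to require authentication on all requests:
# CLAUDE_PROXY_API_KEY=sk-claude-...
```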
- Dual API proxy: OpenAI + Anthropic API surfaces
- Anthropic Messages API (`/anthropic/v1/messages`) with streaming support
- Anthropic models and token counting endpoints
- OpenAI function calling via prompt injection
- Anthropic tool use via prompt injection
- Model alias mapping (GPT -> Claude)
- Added CLAUDE.md, install.sh, comprehensive README
- systemd user service support
- Automated installation script
- Initial release: OpenAI-compatible API proxy for Claude Max subscription
- Chat completions, legacy completions, model listing
- Streaming support via SSE