All notable changes to the Claude Max OpenAI-HTTP-Proxy are documented in this file.
- Optional API key authentication — set the `CLAUDE_PROXY_API_KEY` env var to require all requests to authenticate. OpenAI clients use `Authorization: Bearer <key>`; Anthropic clients use `x-api-key: <key>`. Uses `hmac.compare_digest()` for timing-safe comparison. The `/health` and `/` endpoints are exempt. When unset, the proxy behaves as before (no auth required).
- `.env` file support — the proxy now loads environment variables from a `.env` file on startup via `python-dotenv`. No more `export` needed. A `.env.example` template is included. The `.env` file is gitignored.
- Interactive installer — `install.sh` now prompts for port, host, API key (generate/enter/skip), CORS origins, log level, and max concurrent requests. Generates `.env` from the answers. Preserves an existing `.env` on reinstall. All prompts have defaults for quick install.
- API key generation — installer option to auto-generate secure `sk-claude-...` keys using Python's `secrets.token_urlsafe(32)`.
- Systemd `EnvironmentFile` — the generated service file loads `.env` directly, so env var changes take effect on `systemctl restart` without regenerating the service.
- Added `python-dotenv>=1.0.0,<2.0.0` dependency.
Major hardening release with 40+ improvements across security, reliability, performance, API conformance, and operational tooling. Server grew from 1062 to 1494 lines.
- Tool field sanitization — `_sanitize_tool_name()` strips tool names to `[a-zA-Z0-9_.-]` (max 128 chars); `_sanitize_tool_text()` collapses newlines in descriptions to prevent prompt injection via tool definitions
- Error message sanitization — `_sanitize_cli_error()` strips filesystem paths and environment variables from error responses before returning them to clients
- Request body size limit — configurable max body size (default 10 MB) checked via the Content-Length header in middleware; env var `CLAUDE_PROXY_MAX_BODY_BYTES`
- JSON request body validation — all 4 endpoints wrap `request.json()` in try/except, returning proper 400 errors instead of 500 on malformed JSON
- 413 error format detection — middleware returns Anthropic-format errors for `/anthropic/` paths and OpenAI-format errors for all others
- Type-safe tool parsing — an `isinstance(tc, dict)` check prevents an `AttributeError` crash if Claude outputs `{"tool_call": "invalid"}`
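The tool-field sanitization above can be sketched as follows; the exact rules of the real `_sanitize_tool_name()` / `_sanitize_tool_text()` may differ, so treat this as an illustration of the described behavior:

```python
import re

# Allow-list: anything outside [a-zA-Z0-9_.-] is dropped.
_TOOL_NAME_RE = re.compile(r"[^a-zA-Z0-9_.-]")


def sanitize_tool_name(name: str, max_len: int = 128) -> str:
    # Strip disallowed characters, then truncate to the max length.
    return _TOOL_NAME_RE.sub("", name)[:max_len]


def sanitize_tool_text(text: str) -> str:
    # Collapse all runs of whitespace (including newlines) to single
    # spaces, so a tool description cannot smuggle extra prompt lines
    # into the injected tool definition.
    return " ".join(text.split())
```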
- Subprocess zombie prevention — `call_claude()` and `call_claude_streaming()` wrapped in try/finally with `proc.kill()` + `await proc.wait()` cleanup
- stderr deadlock fix — streaming subprocess uses `stderr=asyncio.subprocess.DEVNULL` to prevent pipe buffer deadlock
- Silent model fallback removed — unknown models now return 404 instead of silently mapping to `claude-sonnet-4-6`
- Multi-line tool JSON parsing — replaced line-by-line `startswith` matching with a brace-counting `_extract_tool_json()` that handles multi-line JSON objects
- Duplicate Anthropic tool parsing eliminated — tool uses are parsed once and reused instead of calling `anthropic_parse_tool_use()` twice
- Streaming tool call detection — `_StreamToolBuffer` withholds lines starting with `{"tool_call"` / `{"tool_use"` from streamed content, preventing raw JSON from appearing as text to clients
- Anthropic deferred text fallback — unparseable tool-like text is emitted as content instead of being silently dropped
- Incomplete deferred block recovery — `_StreamToolBuffer.flush()` moves incomplete multi-line tool blocks (unbalanced braces) back to content output
- Streaming 429 pre-check — semaphore availability is checked before returning a `StreamingResponse`, so clients receive a proper HTTP 429 instead of HTTP 200 with embedded error text
- Health check subprocess tracked — the `claude --version` process is added to `_active_processes` with try/finally cleanup to prevent orphans on shutdown
- OpenAI streaming tool_calls in delta — tool calls are now included in the final streaming chunk's delta (not just `finish_reason`)
- Tool call index field — `openai_parse_tool_calls()` includes the required `"index"` field per the OpenAI streaming spec
- 429 error code corrected — changed from the non-standard `"server_busy"` to the OpenAI-spec `"rate_limit_exceeded"`
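The brace-counting extraction could look roughly like this sketch. The real `_extract_tool_json()` may differ; in particular, the string/escape handling here is an assumption about what "brace counting" must account for:

```python
import json


def extract_tool_json(text: str):
    """Find and parse the first balanced {...} object in text.

    Counts opening/closing braces while skipping braces that appear
    inside JSON string literals, so multi-line tool-call objects are
    handled, unlike line-by-line startswith matching.
    Returns the parsed dict, or None if no balanced object parses.
    """
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False          # character after a backslash
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(text[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None  # unbalanced braces: incomplete block
```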
- Backpressure for streaming — `_buffered_claude_stream()` places an `asyncio.Queue(maxsize=64)` between the subprocess reader and the SSE generator; slow clients cause natural backpressure instead of unbounded memory growth
- String accumulation — replaced O(n^2) `accumulated += text` with the list + `"".join()` pattern in all streaming generators
- Subprocess timeouts — configurable `CLI_TIMEOUT_SECONDS` (default 300 s) for non-streaming, `CLI_LINE_TIMEOUT_SECONDS` (120 s per line) for streaming
- Concurrency limiter — `asyncio.Semaphore` with configurable `--max-concurrent` (default 10) and a 30-second acquire timeout returning 429
- Pre-built model responses — `/models` endpoint responses built once at startup (`_OPENAI_MODELS_RESPONSE`, `_ANTHROPIC_MODELS_RESPONSE`)
- `_aggregate_input_tokens()` helper — centralized cache token calculation eliminates 3x code duplication
- `_sse()` helper — centralized SSE event builder replaces 9 inline format strings in Anthropic streaming
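The bounded-queue backpressure pattern can be sketched as follows; `buffered_stream` and `fake_reader` are illustrative names, not the proxy's API:

```python
import asyncio


async def buffered_stream(reader, queue_size: int = 64):
    """Decouple a fast producer (subprocess reader) from a slow SSE consumer.

    A bounded queue sits between them: when the consumer falls behind,
    queue.put() blocks the producer instead of letting buffered output
    grow without limit.
    """
    queue: asyncio.Queue = asyncio.Queue(maxsize=queue_size)
    _DONE = object()  # sentinel marking end of stream

    async def produce():
        async for line in reader:
            await queue.put(line)  # blocks when the queue is full
        await queue.put(_DONE)

    task = asyncio.create_task(produce())
    try:
        while (item := await queue.get()) is not _DONE:
            yield item
    finally:
        task.cancel()  # stop the producer if the consumer goes away


async def demo():
    async def fake_reader():
        for i in range(5):
            yield f"chunk-{i}"
    return [chunk async for chunk in buffered_stream(fake_reader())]
```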
- Anthropic `created_at` field — added to both the non-streaming message response and the streaming `message_start` event (ISO 8601 UTC)
- Anthropic `max_tokens` validation — rejects non-positive integers with a 400 error
- Anthropic streaming event sequence — correct order: `message_start` -> `content_block_start` -> `ping` -> deltas -> `content_block_stop` -> tool blocks -> `message_delta` -> `message_stop`
- OpenAI `X-Accel-Buffering` header — added to legacy `/completions` streaming (was missing; present on other endpoints)
- OpenAI message edge cases — `.get()` for block text (prevents `KeyError`), legacy `function` role support, type validation for content and tool_calls
- UTF-8 robustness — all `decode()` calls use `errors='replace'` to handle corrupted CLI output without crashing
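An `_sse()`-style helper and the event sequence above can be sketched like this; the payload shapes are simplified assumptions, not the proxy's exact frames:

```python
import json


def sse(event: str, data: dict) -> str:
    # Single place to build SSE frames, replacing inline format strings.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_events(text_deltas):
    """Yield SSE frames in the Anthropic streaming order listed above."""
    yield sse("message_start", {"type": "message_start"})
    yield sse("content_block_start", {"type": "content_block_start", "index": 0})
    yield sse("ping", {"type": "ping"})
    for delta in text_deltas:
        yield sse("content_block_delta",
                  {"type": "content_block_delta", "index": 0,
                   "delta": {"type": "text_delta", "text": delta}})
    yield sse("content_block_stop", {"type": "content_block_stop", "index": 0})
    # (tool-use blocks, if any, would be emitted here, before message_delta)
    yield sse("message_delta", {"type": "message_delta",
                                "delta": {"stop_reason": "end_turn"}})
    yield sse("message_stop", {"type": "message_stop"})
```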
- Multi-worker support — `--workers N` CLI arg; uses `uvicorn.run("server:app", workers=N)` with env var config propagation via the lifespan handler
- Graceful shutdown — lifespan handler with a configurable drain period (`CLAUDE_PROXY_DRAIN_SECONDS`, default 5 s), then SIGTERM -> 5 s wait -> SIGKILL
- Graceful process termination — streaming subprocess cleanup uses SIGTERM -> 3 s wait -> SIGKILL instead of immediate SIGKILL
- Client disconnect detection — streaming generators check `request.is_disconnected()` every 10 chunks; the subprocess is killed on disconnect
- Request ID tracking — an `X-Request-ID` header is generated per request (or forwarded from the client) and included in all log messages and response headers
- Health check with CLI verification — the `/health` endpoint runs `claude --version` (cached 30 s) and reports active/max concurrent requests, CLI version, worker PID, and degraded status
- Configurable CORS — `CLAUDE_PROXY_CORS_ORIGINS` env var (comma-separated, default `*`)
- Configurable log level — `CLAUDE_PROXY_LOG_LEVEL` env var + `--log-level` CLI flag
- Configurable access log — `--no-access-log` flag + `CLAUDE_PROXY_ACCESS_LOG` env var
- Request logging middleware — logs method, path, status, and latency; streaming-aware (logs "streaming" instead of blocking for elapsed time)
- Port validation — rejects ports outside 1-65535
- Non-JSON fallback logging — warns when the CLI returns non-JSON output
- Dynamic model timestamps — `_STARTUP_TIME` replaces the hardcoded epoch in model `created` fields
- WorkingDirectory in service files — both the `install.sh` template and the static `claude-proxy.service` set `WorkingDirectory` for multi-worker import resolution
- Version pinning — `fastapi>=0.100.0,<1.0.0` and `uvicorn>=0.20.0,<1.0.0` (was unbounded `>=`)
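The SIGTERM-then-SIGKILL escalation used for subprocess cleanup is a generic pattern and can be sketched as (not the proxy's exact code):

```python
import asyncio


async def terminate_gracefully(proc: asyncio.subprocess.Process,
                               grace_seconds: float = 3.0) -> None:
    """SIGTERM first; escalate to SIGKILL if the process outlives the grace period."""
    if proc.returncode is not None:
        return  # already exited
    proc.terminate()  # SIGTERM: let the CLI flush and exit cleanly
    try:
        await asyncio.wait_for(proc.wait(), timeout=grace_seconds)
    except asyncio.TimeoutError:
        proc.kill()        # SIGKILL: last resort
        await proc.wait()  # reap to avoid a zombie


async def demo() -> int:
    # Spawn a long sleep, then tear it down with a 1 s grace period.
    proc = await asyncio.create_subprocess_exec("sleep", "30")
    await terminate_gracefully(proc, grace_seconds=1.0)
    return proc.returncode
```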
| Argument | Default | Description |
|---|---|---|
| `--timeout` | `300` | CLI timeout in seconds |
| `--max-concurrent` | `10` | Max concurrent CLI requests per worker |
| `--log-level` | `INFO` | Log level (DEBUG/INFO/WARNING/ERROR) |
| `--no-access-log` | `false` | Disable uvicorn access log |
| `--workers` | `1` | Number of uvicorn workers |
| Variable | Default | Description |
|---|---|---|
| `CLAUDE_PROXY_LOG_LEVEL` | `INFO` | Log level |
| `CLAUDE_PROXY_CORS_ORIGINS` | `*` | Comma-separated CORS origins |
| `CLAUDE_PROXY_MAX_BODY_BYTES` | `10485760` | Max request body size (bytes) |
| `CLAUDE_PROXY_ACCESS_LOG` | `true` | Enable/disable access log |
| `CLAUDE_PROXY_DRAIN_SECONDS` | `5` | Graceful shutdown drain period (seconds) |
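A `.env` combining these variables might look like this (values are illustrative):

```
CLAUDE_PROXY_LOG_LEVEL=INFO
CLAUDE_PROXY_CORS_ORIGINS=https://app.example.com
CLAUDE_PROXY_MAX_BODY_BYTES=10485760
CLAUDE_PROXY_ACCESS_LOG=true
CLAUDE_PROXY_DRAIN_SECONDS=5
# Uncomment to require authentication on all requests:
# CLAUDE_PROXY_API_KEY=sk-claude-...
```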
- Dual API proxy: OpenAI + Anthropic API surfaces
- Anthropic Messages API (`/anthropic/v1/messages`) with streaming support
- Anthropic models and token counting endpoints
- OpenAI function calling via prompt injection
- Anthropic tool use via prompt injection
- Model alias mapping (GPT -> Claude)
- Added CLAUDE.md, install.sh, comprehensive README
- systemd user service support
- Automated installation script
- Initial release: OpenAI-compatible API proxy for Claude Max subscription
- Chat completions, legacy completions, model listing
- Streaming support via SSE