Changelog

All notable changes to the Claude Max OpenAI-HTTP-Proxy are documented in this file.

[3.2.0] - 2026-04-08

Security

  • Optional API key authentication — set CLAUDE_PROXY_API_KEY env var to require all requests to authenticate. OpenAI clients use Authorization: Bearer <key>, Anthropic clients use x-api-key: <key>. Uses hmac.compare_digest() for timing-safe comparison. /health and / endpoints are exempt. When unset, proxy behaves as before (no auth required).

Added

  • .env file support — the proxy now loads environment variables from a .env file on startup via python-dotenv. No more export needed. A .env.example template is included. The .env file is gitignored.
  • Interactive installer — install.sh now prompts for port, host, API key (generate/enter/skip), CORS origins, log level, and max concurrent requests. Generates .env from answers. Preserves existing .env on reinstall. All prompts have defaults for quick install.
  • API key generation — installer option to auto-generate secure sk-claude-... keys using Python's secrets.token_urlsafe(32).
  • Systemd EnvironmentFile — generated service file loads .env directly, so env var changes take effect on systemctl restart without regenerating the service.
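Generating a key of the sk-claude-... form takes only a couple of lines (the function name here is hypothetical):

```python
import secrets


def generate_api_key(prefix: str = "sk-claude-") -> str:
    # secrets.token_urlsafe(32) yields ~43 URL-safe characters drawn
    # from 32 bytes of CSPRNG entropy.
    return prefix + secrets.token_urlsafe(32)
```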

Dependencies

  • Added python-dotenv>=1.0.0,<2.0.0

[3.1.0] - 2026-04-08

Major hardening release with 40+ improvements across security, reliability, performance, API conformance, and operational tooling. Server grew from 1062 to 1494 lines.

Security

  • Tool field sanitization — _sanitize_tool_name() strips tool names to [a-zA-Z0-9_.-] (max 128 chars); _sanitize_tool_text() collapses newlines in descriptions to prevent prompt injection via tool definitions
  • Error message sanitization — _sanitize_cli_error() strips filesystem paths and environment variables from error responses before returning to clients
  • Request body size limit — configurable max body size (default 10MB) checked via Content-Length header in middleware; env var CLAUDE_PROXY_MAX_BODY_BYTES
  • JSON request body validation — all 4 endpoints wrap request.json() in try/except, returning proper 400 errors instead of 500 on malformed JSON
  • 413 error format detection — middleware returns Anthropic-format errors for /anthropic/ paths and OpenAI-format for all others
  • Type-safe tool parsing — isinstance(tc, dict) check prevents an AttributeError crash if Claude outputs {"tool_call": "invalid"}

Bug Fixes

  • Subprocess zombie prevention — call_claude() and call_claude_streaming() wrapped in try/finally with proc.kill() + await proc.wait() cleanup
  • stderr deadlock fix — streaming subprocess uses stderr=asyncio.subprocess.DEVNULL to prevent pipe buffer deadlock
  • Silent model fallback removed — unknown models now return 404 instead of silently mapping to claude-sonnet-4-6
  • Multi-line tool JSON parsing — replaced line-by-line startswith matching with brace-counting _extract_tool_json() that handles multi-line JSON objects
  • Duplicate Anthropic tool parsing eliminated — tool uses parsed once and reused instead of calling anthropic_parse_tool_use() twice
  • Streaming tool call detection — _StreamToolBuffer withholds lines starting with {"tool_call" / {"tool_use" from streamed content, preventing raw JSON from appearing as text to clients
  • Anthropic deferred text fallback — unparseable tool-like text is emitted as content instead of being silently dropped
  • Incomplete deferred block recovery — _StreamToolBuffer.flush() moves incomplete multi-line tool blocks (unbalanced braces) back to content output
  • Streaming 429 pre-check — semaphore availability checked before returning StreamingResponse, so clients receive proper HTTP 429 instead of HTTP 200 with embedded error text
  • Health check subprocess tracked — claude --version process added to _active_processes with try/finally cleanup to prevent orphans on shutdown
  • OpenAI streaming tool_calls in delta — tool calls now included in the final streaming chunk's delta (not just finish_reason)
  • Tool call index field — openai_parse_tool_calls() includes the required "index" field per the OpenAI streaming spec
  • 429 error code corrected — changed from non-standard "server_busy" to OpenAI-spec "rate_limit_exceeded"

Performance

  • Backpressure for streaming — _buffered_claude_stream() places an asyncio.Queue(maxsize=64) between subprocess reader and SSE generator; slow clients cause natural backpressure instead of unbounded memory growth
  • String accumulation — replaced O(n^2) accumulated += text with list + "".join() pattern in all streaming generators
  • Subprocess timeouts — configurable CLI_TIMEOUT_SECONDS (default 300s) for non-streaming, CLI_LINE_TIMEOUT_SECONDS (120s per-line) for streaming
  • Concurrency limiter — asyncio.Semaphore with configurable --max-concurrent (default 10) and a 30-second acquire timeout returning 429
  • Pre-built model responses — /models endpoint responses built once at startup (_OPENAI_MODELS_RESPONSE, _ANTHROPIC_MODELS_RESPONSE)
  • _aggregate_input_tokens() helper — centralized cache token calculation eliminates 3x code duplication
  • _sse() helper — centralized SSE event builder replaces 9 inline format strings in Anthropic streaming
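The bounded-queue backpressure pattern looks roughly like this (an illustrative sketch, not the proxy's _buffered_claude_stream()):

```python
import asyncio


async def buffered_stream(produce, maxsize: int = 64):
    """Bounded queue between a fast producer and a slow consumer: when
    the consumer lags, queue.put() blocks the producer at `maxsize`
    items instead of letting buffered chunks grow without bound."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    done = object()  # sentinel marking end of stream

    async def reader():
        async for chunk in produce():
            await queue.put(chunk)  # blocks while the queue is full
        await queue.put(done)

    task = asyncio.create_task(reader())
    try:
        while (item := await queue.get()) is not done:
            yield item
    finally:
        task.cancel()  # stop the producer if the consumer goes away
```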

API Conformance

  • Anthropic created_at field — added to both non-streaming message response and streaming message_start event (ISO 8601 UTC)
  • Anthropic max_tokens validation — rejects non-positive integers with 400 error
  • Anthropic streaming event sequence — correct order: message_start -> content_block_start -> ping -> deltas -> content_block_stop -> tool blocks -> message_delta -> message_stop
  • OpenAI X-Accel-Buffering header — added to legacy /completions streaming (was missing, present on other endpoints)
  • OpenAI message edge cases — .get() for block text (prevents KeyError), legacy function role support, type validation for content and tool_calls
  • UTF-8 robustness — all decode() calls use errors='replace' to handle corrupted CLI output without crashing
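The errors='replace' decoding mentioned above, as a one-function sketch (hypothetical helper name):

```python
def decode_cli_line(raw: bytes) -> str:
    # errors='replace' maps undecodable bytes to U+FFFD instead of
    # raising UnicodeDecodeError, so a corrupted chunk of CLI output
    # cannot kill the stream mid-response.
    return raw.decode("utf-8", errors="replace")
```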

Operational Improvements

  • Multi-worker support — --workers N CLI arg; uses uvicorn.run("server:app", workers=N) with env var config propagation via lifespan handler
  • Graceful shutdown — lifespan handler with configurable drain period (CLAUDE_PROXY_DRAIN_SECONDS, default 5s), then SIGTERM -> 5s wait -> SIGKILL
  • Graceful process termination — streaming subprocess cleanup uses SIGTERM -> 3s wait -> SIGKILL instead of immediate SIGKILL
  • Client disconnect detection — streaming generators check request.is_disconnected() every 10 chunks; subprocess killed on disconnect
  • Request ID tracking — X-Request-ID header generated per request (or forwarded from the client), included in all log messages and response headers
  • Health check with CLI verification — /health endpoint runs claude --version (cached 30s), reports active/max concurrent requests, CLI version, worker PID, degraded status
  • Configurable CORS — CLAUDE_PROXY_CORS_ORIGINS env var (comma-separated, default *)
  • Configurable log level — CLAUDE_PROXY_LOG_LEVEL env var + --log-level CLI flag
  • Configurable access log — --no-access-log flag + CLAUDE_PROXY_ACCESS_LOG env var
  • Request logging middleware — logs method, path, status, latency; streaming-aware (logs "streaming" instead of blocking for elapsed time)
  • Port validation — rejects ports outside 1-65535
  • Non-JSON fallback logging — warns when CLI returns non-JSON output
  • Dynamic model timestamps — _STARTUP_TIME replaces the hardcoded epoch in model created fields
  • WorkingDirectory in service files — both install.sh template and static claude-proxy.service set WorkingDirectory for multi-worker import resolution
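The SIGTERM -> wait -> SIGKILL escalation used for subprocess cleanup can be sketched as follows (hypothetical helper name; mirrors the behavior described above):

```python
import asyncio


async def terminate_gracefully(proc: asyncio.subprocess.Process,
                               grace: float = 3.0) -> None:
    # SIGTERM first so the process can clean up, then SIGKILL if it
    # has not exited within the grace period.
    if proc.returncode is not None:
        return  # already exited
    proc.terminate()  # SIGTERM
    try:
        await asyncio.wait_for(proc.wait(), timeout=grace)
    except asyncio.TimeoutError:
        proc.kill()        # SIGKILL: force exit
        await proc.wait()  # reap so no zombie is left behind
```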

Dependencies

  • Version pinning — fastapi>=0.100.0,<1.0.0 and uvicorn>=0.20.0,<1.0.0 (was unbounded >=)

CLI Arguments (New)

| Argument | Default | Description |
| --- | --- | --- |
| --timeout | 300 | CLI timeout in seconds |
| --max-concurrent | 10 | Max concurrent CLI requests per worker |
| --log-level | INFO | Log level (DEBUG/INFO/WARNING/ERROR) |
| --no-access-log | false | Disable uvicorn access log |
| --workers | 1 | Number of uvicorn workers |

Environment Variables (New)

| Variable | Default | Description |
| --- | --- | --- |
| CLAUDE_PROXY_LOG_LEVEL | INFO | Log level |
| CLAUDE_PROXY_CORS_ORIGINS | * | Comma-separated CORS origins |
| CLAUDE_PROXY_MAX_BODY_BYTES | 10485760 | Max request body size (bytes) |
| CLAUDE_PROXY_ACCESS_LOG | true | Enable/disable access log |
| CLAUDE_PROXY_DRAIN_SECONDS | 5 | Graceful shutdown drain period (seconds) |

[3.0.0] - 2026-04-07

  • Dual API proxy: OpenAI + Anthropic API surfaces
  • Anthropic Messages API (/anthropic/v1/messages) with streaming support
  • Anthropic models and token counting endpoints
  • OpenAI function calling via prompt injection
  • Anthropic tool use via prompt injection
  • Model alias mapping (GPT -> Claude)

[2.0.0] - 2026-04-06

  • Added CLAUDE.md, install.sh, comprehensive README
  • systemd user service support
  • Automated installation script

[1.0.0] - 2026-04-06

  • Initial release: OpenAI-compatible API proxy for Claude Max subscription
  • Chat completions, legacy completions, model listing
  • Streaming support via SSE