Shadow cleanup + Claude OAuth isolation on shadow#1
Merged
…e model namespacing
…oding, normalizeClaudeBudget max_tokens
1. Always include the interleaved-thinking-2025-05-14 beta header so that thinking blocks are returned correctly for all Claude models.
2. Remove the status-code guard in the AMP reverse proxy's ModifyResponse so that error responses (4xx/5xx) with hidden gzip encoding are decoded properly; this prevents garbled error messages from reaching the client.
3. In normalizeClaudeBudget, when the adjusted budget falls below the model minimum, set max_tokens = budgetTokens+1 instead of leaving the request unchanged (which causes a 400 from the API).
When adjustedBudget < minBudget, the previous fix blindly set max_tokens = budgetTokens+1 which could exceed MaxCompletionTokens. Now: cap max_tokens at MaxCompletionTokens, recalculate budget, and disable thinking entirely if constraints are unsatisfiable. Add unit tests covering raise, clamp, disable, and no-op scenarios.
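The raise / clamp / disable / no-op decision can be sketched as a pure function. This is an illustration under stated assumptions, not the real normalizeClaudeBudget: `normalizeBudget` and its parameter names are hypothetical, it assumes the thinking budget must stay strictly below max_tokens, and 1024 in the examples is only a sample minimum.

```go
package main

import "fmt"

// normalizeBudget is a hypothetical stand-in for normalizeClaudeBudget. It
// returns the max_tokens and thinking budget to send, and whether thinking
// stays enabled. Assumption: the budget must be strictly below max_tokens.
func normalizeBudget(maxTokens, budget, minBudget, maxCompletion int) (int, int, bool) {
	adjusted := budget
	if adjusted >= maxTokens {
		adjusted = maxTokens - 1 // keep the budget below max_tokens
	}
	if adjusted >= minBudget {
		return maxTokens, adjusted, true // valid without touching max_tokens
	}
	// Raise max_tokens so the requested budget fits, capped at the model limit.
	newMax := budget + 1
	if newMax > maxCompletion {
		newMax = maxCompletion
	}
	newBudget := newMax - 1 // recalculate the budget under the capped max_tokens
	if newBudget < minBudget {
		// Unsatisfiable: disable thinking instead of sending a request that 400s.
		return maxTokens, 0, false
	}
	return newMax, newBudget, true
}

func main() {
	fmt.Println(normalizeBudget(500, 2048, 1024, 64000))  // raise
	fmt.Println(normalizeBudget(500, 2048, 1024, 1500))   // clamp
	fmt.Println(normalizeBudget(500, 2048, 1024, 1000))   // disable
	fmt.Println(normalizeBudget(4096, 2048, 1024, 64000)) // no-op
}
```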
Add Manager.ReconcileRegistryModelStates to clear stale per-model runtime failures for models currently registered in the global model registry. The method finds models supported for an auth, resets non-clean ModelState entries, updates aggregated availability, persists changes, and pushes a snapshot to the scheduler. Introduce modelStateIsClean helper to determine when a model state needs resetting. Call ReconcileRegistryModelStates from Service paths that register/refresh models (applyCoreAuthAddOrUpdate and refreshModelRegistrationForAuth) to keep the scheduler and global registry aligned after model re-registration.
Address two blocking issues from PR review:
- Auth file now named vertex-{prefix}-{project}.json so importing the
same project with different prefixes no longer overwrites credentials
- Prefix containing "/" is rejected at import time instead of being
silently ignored at runtime
- Add prefix to in-memory metadata map for consistency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
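The naming and validation rules above fit in one small helper. This sketch uses a hypothetical `vertexAuthFileName`; how the real import path names a file when no prefix is given is not stated, so this version does not special-case an empty prefix.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// vertexAuthFileName is a hypothetical helper for the rule above:
// vertex-{prefix}-{project}.json, so importing the same project under two
// prefixes produces two files. Prefixes containing "/" are rejected at import.
func vertexAuthFileName(prefix, project string) (string, error) {
	if strings.Contains(prefix, "/") {
		return "", errors.New(`vertex import: prefix must not contain "/"`)
	}
	return fmt.Sprintf("vertex-%s-%s.json", prefix, project), nil
}

func main() {
	for _, p := range []string{"team-a", "team-b", "team/a"} {
		name, err := vertexAuthFileName(p, "my-project")
		fmt.Println(name, err)
	}
}
```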
Emit signature only when non-empty in both streaming content_block_start and non-streaming thinking blocks. Avoids turning 'missing signature' into 'empty/invalid signature' which Claude clients may reject.
Add ProxyPal (https://github.com/buddingnewinsights/proxypal) to the community projects list in all three README files (EN, CN, JA). Placed after CCS, restoring its original position. ProxyPal is a cross-platform desktop app (macOS, Windows, Linux) that wraps CLIProxyAPI with a native GUI, supporting multiple AI providers, usage analytics, request monitoring, and auto-configuration for popular coding tools. Closes router-for-me#2420
delegate schema sanitization to util.CleanJSONSchemaForGemini and drop the top-level eager_input_streaming key to prevent validation errors when sending Claude tools to the Gemini API
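Based on an analysis of the Claude Code v2.1.88 source, fix several gaps that Anthropic could detect:
- Implement the message fingerprint algorithm (SHA256 salt + character indices), replacing the random buildHash
- billing header cc_version: take the version dynamically from the device profile instead of hardcoding it
- billing header cc_entrypoint: parse it from the client UA, supporting cli/vscode/local-agent
- billing header: add cc_workload support (passed in via the X-CPA-Claude-Workload header)
- Add the X-Claude-Code-Session-Id header (one UUID cached per apiKey, TTL=1h)
- Add the x-client-request-id header (api.anthropic.com only, one UUID per request)
- Add the 4 missing beta flags (structured-outputs/fast-mode/redact-thinking/token-efficient-tools)
- Align OAuth scopes with Claude Code 2.1.88 (remove org:create_api_key; add sessions/mcp/file_upload)
- Send Anthropic-Dangerous-Direct-Browser-Access only in API key mode
- Scrub gateway fingerprints from response headers (strip headers with litellm/helicone/portkey/cloudflare/kong/braintrust prefixes)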
- Add APIKeyConfig and ModelGroup config structs with YAML/JSON support
- Build in-memory atomic indexes (apiKeyConfigIndex, modelGroupIndex) for lock-free lookup on the hot request path
- AuthMiddleware injects resolved *APIKeyConfig and *ModelGroup into Gin context
- CheckModelAccess enforces per-key access rules before execution
- Model group failover: resolve group to priority tiers, attempt each tier in descending priority order, fall back on quota exhaustion (429)
- Management API endpoints: GET/PATCH/DELETE /v0/management/api-key-configs and /v0/management/model-groups with hot-reload callback
- Backward compatible: absent config entries allow all models (existing behavior)
- Introduced new logging functions for websocket requests, handshakes, errors, and responses in `logging_helpers.go`.
- Updated `CodexWebsocketsExecutor` to utilize the new logging functions for improved clarity and consistency in websocket operations.
- Modified the handling of websocket upgrade rejections to log relevant metadata.
- Changed the request body key to a timeline body key in `openai_responses_websocket.go` to better reflect its purpose.
- Enhanced tests to verify the correct logging of websocket events and responses, including disconnect events and error handling scenarios.
- Multi-stage Dockerfile: Go build + panel copied from CI
- GitHub Actions: builds frontend from Cli-Proxy-API-Management-Center, packages into single image, pushes to ghcr.io/minervacap2022/cliproxyapi
- Triggers: push to main, repository_dispatch from frontend repo, manual
- Image tags: latest + short SHA
- Layer caching via GitHub Actions cache
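API requests from the Claude executor previously used Go's standard library crypto/tls, whose JA3 fingerprint does not match real Claude Code (Bun/BoringSSL) and can be identified by Cloudflare.
- Add helps/utls_client.go, which wraps the utls Chrome fingerprint + HTTP/2 + proxy support
- Replace the 4 NewProxyAwareHTTPClient call sites in the Claude executor with NewUtlsHTTPClient
- Other executors (Gemini/Codex/iFlow, etc.) are unaffected and still use standard TLS
- Non-HTTPS requests automatically fall back to the standard transport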
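- computeFingerprint uses rune indices instead of byte indices, fixing fingerprint mismatches for multibyte characters
- The utls Chrome TLS fingerprint applies only to official Anthropic domains; a custom base_url uses the standard transport
- IPv6 addresses are joined with their port correctly via net.JoinHostPort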
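Why rune indices matter: in Go, `s[i]` indexes bytes, so on text like "héllo" position 1 yields the first byte of the two-byte "é" rather than the character. The sketch below picks characters by rune position and hashes them with a salt. `pickRunes`, `fingerprint`, the salt, the positions, and the skip-out-of-range rule are all placeholders; the real computeFingerprint's salt and indices are not given here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// pickRunes returns the characters at the given rune (not byte) positions,
// skipping positions past the end of the string.
func pickRunes(s string, positions []int) string {
	runes := []rune(s) // decode UTF-8 so each element is one character
	out := make([]rune, 0, len(positions))
	for _, p := range positions {
		if p >= 0 && p < len(runes) {
			out = append(out, runes[p])
		}
	}
	return string(out)
}

// fingerprint hashes a placeholder salt together with the picked characters.
func fingerprint(salt, message string, positions []int) string {
	sum := sha256.Sum256([]byte(salt + pickRunes(message, positions)))
	return hex.EncodeToString(sum[:])
}

func main() {
	msg := "héllo 世界"
	fmt.Printf("%q\n", pickRunes(msg, []int{1, 6, 7}))
	fmt.Println(fingerprint("placeholder-salt", msg, []int{1, 6, 7}))
}
```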
This change stops short of broader Claude Code runtime alignment and instead hardens two safe edges: builtin tool prefix handling and source-informed sentinel coverage for future drift checks.
Constraint: Must preserve existing default behavior for current users
Rejected: Implement control-plane/session alignment now | too much runtime risk for a first slice
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Treat the new fixtures as compatibility sentinels, not a full Claude Code schema contract
Tested: go test ./test/...; go test ./sdk/translator/...; go test ./internal/runtime/executor -run 'Claude|Builtin|Tool'; go test ./...
Not-tested: End-to-end Claude Code direct-connect/session runtime behavior
Line-oriented upstream executors can emit `event:` and `data:` as separate chunks, but the Responses handler had started terminating each incoming chunk as a full SSE event. That split `response.created` into an empty event plus a later data block, which broke downstream clients like OpenClaw.
This keeps the fix in the handler layer: a small stateful framer now buffers standalone `event:` lines until the matching `data:` arrives, preserves already-framed events, and ignores delimiter-only leftovers. The regression suite now covers split event/data framing, full-event passthrough, terminal errors, and the bootstrap path that forwards line-oriented openai-response streams from non-Codex executors too.
Constraint: Keep the fix localized to Responses handler framing instead of patching every executor
Rejected: Revert to v6.9.7 chunk writing | would reintroduce data-only framing regressions
Rejected: Patch each line-oriented executor separately | duplicates fragile SSE assembly logic
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Do not assume incoming Responses stream chunks are already complete SSE events; preserve handler-layer reassembly for split `event:`/`data:` inputs
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers/openai -count=1
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers -count=1
Tested: /tmp/go1.26.1/go test ./sdk/api/handlers/... -count=1
Tested: /tmp/go1.26.1/go/bin/go vet ./sdk/api/handlers/...
Tested: Temporary patched server on 127.0.0.1:18317 -> /v1/models 200, /v1/responses non-stream 200, /v1/responses stream emitted combined `event:` + `data:` frames
Not-tested: Full repository test suite outside sdk/api/handlers packages
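A minimal sketch of the stateful framing described above: buffer a standalone `event:` line, join it with the next `data:` chunk, pass already-framed events through, and drop delimiter-only chunks. `sseFramer` and `Push` are hypothetical names, and this version predates the stricter JSON-completeness rule added in the follow-up commit.

```go
package main

import (
	"bytes"
	"fmt"
)

// sseFramer is a hypothetical handler-layer framer. It holds a standalone
// "event:" line until the matching "data:" chunk arrives, then emits both as
// one SSE event terminated by a blank line.
type sseFramer struct {
	pendingEvent []byte // buffered "event:" line; nil when nothing is pending
}

// Push takes one upstream chunk and returns bytes ready for the client, or
// nil when the chunk was buffered or carried nothing.
func (f *sseFramer) Push(chunk []byte) []byte {
	body := bytes.TrimRight(chunk, "\r\n")
	if len(body) == 0 {
		return nil // delimiter-only leftover
	}
	isEventOnly := bytes.HasPrefix(body, []byte("event:")) && !bytes.Contains(body, []byte("data:"))
	switch {
	case isEventOnly:
		f.pendingEvent = append([]byte(nil), body...) // copy: upstream may reuse chunk
		return nil
	case f.pendingEvent != nil && bytes.HasPrefix(body, []byte("data:")):
		out := make([]byte, 0, len(f.pendingEvent)+len(body)+3)
		out = append(out, f.pendingEvent...)
		out = append(out, '\n')
		out = append(out, body...)
		out = append(out, "\n\n"...)
		f.pendingEvent = nil
		return out
	default:
		// Already-framed event (or data with no pending event): terminate it once.
		return append(append([]byte(nil), body...), "\n\n"...)
	}
}

func main() {
	var f sseFramer
	chunks := []string{"event: response.created\n", "data: {\"type\":\"response.created\"}\n", "\n"}
	for _, c := range chunks {
		if out := f.Push([]byte(c)); out != nil {
			fmt.Printf("%q\n", out)
		}
	}
}
```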
Follow-up review found two real framing hazards in the handler-layer framer: it could flush a partial `data:` payload before the JSON was complete, and it could inject an extra newline before chunks that already began with `\n`/`\r\n`.
This commit tightens the framer so it only emits undelimited events when the buffered `data:` payload is already valid JSON (or `[DONE]`), skips newline injection for chunks that already start with a line break, and avoids the heavier `bytes.Split` path while scanning SSE fields. The regression suite now covers split `data:` payload chunks, newline-prefixed chunks, and dropping incomplete trailing data on flush, so the original Responses fix remains intact while the review concerns are explicitly locked down.
Constraint: Keep the follow-up limited to handler-layer framing and tests
Rejected: Ignore the review and rely on current executor chunk shapes | leaves partial data payload corruption possible
Rejected: Build a fully generic SSE parser | wider change than needed for the identified risks
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Do not emit undelimited Responses SSE events unless buffered `data:` content is already complete and valid
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers/openai -count=1
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers -count=1
Tested: /tmp/go1.26.1/go/bin/go vet ./sdk/api/handlers/...
Not-tested: Full repository test suite outside sdk/api/handlers packages
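The two tightened checks can be sketched with the standard library. Both helpers are hypothetical: `bufferedDataComplete` scans lines with `bytes.IndexByte` (no `bytes.Split`) and accepts the buffered `data:` payload only if it is valid JSON or `[DONE]`; `startsWithLineBreak` decides whether the framer should skip newline injection. Joining multi-line `data:` fields with `\n` follows the SSE convention.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// bufferedDataComplete reports whether an undelimited buffered event may be
// emitted: its data: payload must be valid JSON or the literal [DONE].
func bufferedDataComplete(event []byte) bool {
	var payload []byte
	sawData := false
	rest := event
	for len(rest) > 0 {
		line := rest
		if i := bytes.IndexByte(rest, '\n'); i >= 0 {
			line, rest = rest[:i], rest[i+1:]
		} else {
			rest = nil
		}
		line = bytes.TrimRight(line, "\r")
		if v, ok := bytes.CutPrefix(line, []byte("data:")); ok {
			if sawData {
				payload = append(payload, '\n') // multi-line data fields join with \n
			}
			sawData = true
			payload = append(payload, bytes.TrimPrefix(v, []byte(" "))...)
		}
	}
	if len(payload) == 0 {
		return false
	}
	if bytes.Equal(payload, []byte("[DONE]")) {
		return true
	}
	return json.Valid(payload)
}

// startsWithLineBreak reports whether a chunk already begins with \n or \r\n,
// in which case no extra newline should be injected before it.
func startsWithLineBreak(chunk []byte) bool {
	return bytes.HasPrefix(chunk, []byte("\n")) || bytes.HasPrefix(chunk, []byte("\r\n"))
}

func main() {
	fmt.Println(bufferedDataComplete([]byte(`data: {"type":"response.output_text.delta"`)))
	fmt.Println(bufferedDataComplete([]byte(`data: {"type":"response.output_text.delta"}`)))
	fmt.Println(startsWithLineBreak([]byte("\r\ndata: [DONE]")))
}
```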
… filter
Symptom: warmup trigger log showed warmup failed provider=claude error="auth_not_found: no auth available" even when the pinned auth was clearly present.
Root cause: Scheduler called manager.Execute(providers, req, opts) with the auth ID pinned in metadata. manager.pickNextLegacy filters candidates by authSupportsRouteModel, i.e. the auth must be registered in the model registry for the requested model. Operators who restrict their Claude auth to a custom model list (e.g. only Sonnet variants) had no entry for the warmup recipe's claude-haiku-4-5, so the candidate list was empty.
Fix: Warmup fetches the provider executor via Manager.Executor(provider) and calls executor.Execute(ctx, auth, req, opts) directly. This is correct because warmup always targets a specific OAuth auth with a known-safe minimal body; we don't need selector/quota/registry filtering. For Claude OAuth specifically, any Claude model is callable at the Anthropic API regardless of local registry settings.
Interface change: scheduler.Executor now requires Executor(provider) rather than Execute(providers, req, opts). Tests updated to provide a minimal ProviderExecutor stub.
Rationale: the degenerate max_tokens=1 + "." payload successfully reaches
the provider API, but we do not have Anthropic-side confirmation that it
reliably opens the 5-hour Claude Max session window. Some session-window
systems only start counting once a non-trivial completion has actually
been generated.
Beefier payload:
- content "ping" instead of "." — reads as normal greeting traffic
- max_tokens / max_output_tokens / maxOutputTokens = 16 — gives the model
room to actually produce a reply rather than an immediate stop
- cost impact on Haiku / Flash-Lite tiers is negligible (sub-cent per
warmup round), well below the benefit of a deterministic window open
…aude
Claude Code stores thinking blocks returned from non-Claude providers (Kimi, OpenAI-compatible) that the response translator emits without signatures. When the user switches back to an Anthropic model, those unsigned blocks are replayed to the upstream API, which rejects the request with "Invalid `signature` in `thinking` block".
Lift SanitizeAmpRequestBody's core logic into internal/thinking as SanitizeMessagesThinking, have amp delegate to it, and invoke it in ClaudeExecutor.Execute / ExecuteStream right after thinking.ApplyThinking so every path targeting Anthropic gets the cleanup.
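The filtering rule can be shown on typed messages. This is a sketch, not SanitizeMessagesThinking itself: the real function presumably works on the raw request body, and `contentBlock`, `message`, and `sanitizeMessagesThinking` are hypothetical minimal shapes. It drops thinking blocks with no signature and leaves every other block, and the input slice, untouched.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// contentBlock and message are hypothetical minimal Messages API shapes.
type contentBlock struct {
	Type      string `json:"type"`
	Text      string `json:"text,omitempty"`
	Thinking  string `json:"thinking,omitempty"`
	Signature string `json:"signature,omitempty"`
}

type message struct {
	Role    string         `json:"role"`
	Content []contentBlock `json:"content"`
}

// sanitizeMessagesThinking drops unsigned thinking blocks (for example ones
// translated from a Kimi or OpenAI-compatible reply) and keeps everything else.
func sanitizeMessagesThinking(msgs []message) []message {
	out := make([]message, 0, len(msgs))
	for _, m := range msgs { // m is a copy, so the caller's slice is not mutated
		kept := make([]contentBlock, 0, len(m.Content))
		for _, b := range m.Content {
			if b.Type == "thinking" && b.Signature == "" {
				continue
			}
			kept = append(kept, b)
		}
		m.Content = kept
		out = append(out, m)
	}
	return out
}

func main() {
	history := []message{
		{Role: "user", Content: []contentBlock{{Type: "text", Text: "hi"}}},
		{Role: "assistant", Content: []contentBlock{{Type: "thinking", Thinking: "from kimi"}, {Type: "text", Text: "hello"}}},
	}
	out, _ := json.Marshal(sanitizeMessagesThinking(history))
	fmt.Println(string(out))
}
```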
Adds two httptest-based cases on ClaudeExecutor.Execute that capture the actual upstream body:
- StripsUnsignedThinkingBlocks: an unsigned thinking block (Kimi shape) is removed before reaching Anthropic; surrounding turns are preserved.
- PreservesSignedThinkingBlocks: a properly-signed thinking block is forwarded unchanged so multi-turn extended thinking keeps working.
Together with the function-level tests in internal/thinking, this guards against both regressions: losing the sanitize call site, and over-stripping legitimate signed thinking.
Single-file YAML spec describing the shadow→confirm→primary rollout, startup-param snapshotting, rehydration, and automated health/interface checks. Consumable by an agent (Claude-level): all helper logic lives in anchored bash blocks in the YAML itself, no /bin/ scripts. Used today to ship 77c2992 onto primary/shadow. Also add `!docs/deployment.yaml` to .gitignore so future edits are tracked (the rest of docs/ remains ignored as before).
- Refactored `/healthz` handler to support `HEAD` requests alongside `GET`.
- Updated tests to include validation for `HEAD` requests with expected status and empty body.
Closes: router-for-me#2929
…m-output-backfill
fix(codex): backfill streaming response output
…t-host-header
fix(util): forward custom Host header to upstream
…Anthropic
The codex→claude response translator writes Codex's `encrypted_content` (Fernet tokens, always prefixed with "gAAAAAB") into the Claude thinking block `signature`. When a client replays that history back through the proxy targeting an Anthropic model, Anthropic rejects the request with "Invalid `signature` in `thinking` block" because Fernet is not an Anthropic signature format.
Extend SanitizeMessagesThinking to drop any thinking block whose signature begins with the Fernet version marker. This covers the GPT→Sonnet failover path end-to-end (proxy forwards to Codex → returns Fernet sig → stored in client history → next Claude turn is sanitized before reaching Anthropic). Genuine Anthropic signatures, which never start with "gAAAAAB", pass through untouched so multi-turn extended thinking keeps working.
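The extended drop rule reduces to a one-line predicate. `shouldDropThinkingSignature` and `fernetPrefix` are hypothetical names; the prefix value and the two drop conditions (unsigned, Fernet-signed) come straight from the commit messages above.

```go
package main

import (
	"fmt"
	"strings"
)

// fernetPrefix is the version marker that Codex encrypted_content carries.
const fernetPrefix = "gAAAAAB"

// shouldDropThinkingSignature reports whether a thinking block must be removed
// before the request reaches Anthropic: unsigned and Fernet-signed blocks both are.
func shouldDropThinkingSignature(sig string) bool {
	return sig == "" || strings.HasPrefix(sig, fernetPrefix)
}

func main() {
	for _, sig := range []string{"", "gAAAAABexample", "anthropic-signature-placeholder"} {
		fmt.Printf("%q drop=%v\n", sig, shouldDropThinkingSignature(sig))
	}
}
```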
Two changes targeting overnight OAuth expiration:
1. ClaudeExecutor.Refresh now uses RefreshTokensWithRetry(ctx, token, 3) instead of single-shot RefreshTokens, matching the Codex executor. A transient Anthropic OAuth 5xx / network blip no longer sinks the auth straight into the 5-minute refreshFailureBackoff loop, which could accumulate misses and let the 4h refresh window slip past the 8h access_token lifetime.
2. Conductor's refresh outcomes move from log.Debugf to structured Info (success / canceled) and Warn (failure) records with provider and auth_id fields. Previously every refresh happened silently at default log level, making "why did my OAuth expire overnight" unanswerable without flipping the entire service to debug.
- Added `GPT-Image-2` as a built-in model to avoid dependency on remote updates for Codex.
- Updated model tier functions (`CodexFree`, `CodexTeam`, etc.) to include built-in models via `WithCodexBuiltins`.
- Introduced new handlers for image generation and edit operations under `OpenAIAPIHandler`.
- Extended tests to validate 503 response for unsupported image model requests.
…AI image handlers
Both persist() and the refreshAuth success path previously swallowed store.Save failures without any warning. This made it impossible to distinguish a successful token refresh from one where the rotated token was never written to disk. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ment_path
Add curl --max-time so a hung primary can't stretch the stability window into a minutes-long block. Skip the management_path probe when the field is empty so deployments without a management panel don't fail every iteration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `primary_stability` and `shadow_cleanup` phases that observe primary then auto-remove the shadow container.
- …`oauth-refresh-disabled-providers`) lets the shadow process disable Claude auto-refresh without touching shared auth files; `Auth.Runtime.RefreshLead()==nil` now unschedules the refresh loop.
- …`docs/deployment.yaml`) wired with new config knobs and phases; stability probes capped with `curl --max-time` and `management_path` is gated when empty.
- …`config.template.yaml` note clarify that shadow should disable warmup entirely (empty providers means all supported providers).
Background
Production logs showed `invalid_grant` on Claude refresh while shadow shared the auth dir read-only. Both instances racing the same refresh token + warmup is the most likely root cause. The runtime change keeps shadow useful as a rollout canary while ensuring it never mutates Claude OAuth state.
Test plan
- `go test ./internal/config -count=1`
- `go test ./internal/watcher/synthesizer -count=1`
- `go test ./sdk/cliproxy/auth -count=1`
- `go test ./internal/warmup -count=1`
- `go build ./...`
🤖 Generated with Claude Code