
Shadow cleanup + Claude OAuth isolation on shadow#1

Merged
HeimaoLST merged 221 commits into dev from shadow-cleanup-oauth-isolation
Apr 27, 2026

Conversation

@HeimaoLST

Summary

  • Shadow is now treated as temporary rollout protection: deploy gains primary_stability and shadow_cleanup phases that observe primary then auto-remove the shadow container.
  • Per-instance OAuth refresh suppression (oauth-refresh-disabled-providers) lets the shadow process disable Claude auto-refresh without touching shared auth files; Auth.Runtime.RefreshLead()==nil now unschedules the refresh loop.
  • Live runbook (docs/deployment.yaml) wired with new config knobs and phases; stability probes capped with curl --max-time and management_path is gated when empty.
  • Warmup characterization test + config.template.yaml note clarify that shadow should disable warmup entirely (empty providers means all supported providers).

Background

Production logs showed invalid_grant on Claude refresh while shadow shared the auth dir read-only. Both instances racing the same refresh token + warmup is the most likely root cause. The runtime change keeps shadow useful as a rollout canary while ensuring it never mutates Claude OAuth state.

Test plan

  • go test ./internal/config -count=1
  • go test ./internal/watcher/synthesizer -count=1
  • go test ./sdk/cliproxy/auth -count=1
  • go test ./internal/warmup -count=1
  • go build ./...
  • On next deploy: confirm shadow gets removed after the stability window and primary keeps refreshing Claude tokens cleanly.

🤖 Generated with Claude Code

dinhkarate and others added 30 commits January 29, 2026 13:32
…oding, normalizeClaudeBudget max_tokens

1. Always include interleaved-thinking-2025-05-14 beta header so that
   thinking blocks are returned correctly for all Claude models.

2. Remove status-code guard in AMP reverse proxy ModifyResponse so that
   error responses (4xx/5xx) with hidden gzip encoding are decoded
   properly — prevents garbled error messages reaching the client.

3. In normalizeClaudeBudget, when the adjusted budget falls below the
   model minimum, set max_tokens = budgetTokens+1 instead of leaving
   the request unchanged (which causes a 400 from the API).
When adjustedBudget < minBudget, the previous fix blindly set
max_tokens = budgetTokens+1 which could exceed MaxCompletionTokens.

Now: cap max_tokens at MaxCompletionTokens, recalculate budget, and
disable thinking entirely if constraints are unsatisfiable.

Add unit tests covering raise, clamp, disable, and no-op scenarios.
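The raise/clamp/disable decision above can be sketched as a single pure function. This is a minimal illustration of the described behavior, not the repo's actual code: the function and parameter names (`normalizeBudget`, `maxCompletionTokens`) are assumptions.

```go
package main

import "fmt"

// Sketch of the clamping logic described above. Returns the adjusted
// maxTokens, the adjusted thinking budget, and whether thinking
// remains enabled.
func normalizeBudget(budget, minBudget, maxTokens, maxCompletionTokens int) (int, int, bool) {
	if budget >= minBudget {
		return maxTokens, budget, true // no-op: budget already meets the model minimum
	}
	// Raise the budget to the minimum and keep max_tokens strictly above it.
	budget = minBudget
	if maxTokens <= budget {
		maxTokens = budget + 1
	}
	// Cap at the completion ceiling, then recalculate the budget.
	if maxTokens > maxCompletionTokens {
		maxTokens = maxCompletionTokens
		budget = maxTokens - 1
	}
	// Constraints unsatisfiable: disable thinking rather than send a 400-bound request.
	if budget < minBudget {
		return maxTokens, 0, false
	}
	return maxTokens, budget, true
}

func main() {
	mt, b, ok := normalizeBudget(500, 1024, 900, 1000)
	fmt.Println(mt, b, ok) // 1000 0 false — unsatisfiable, thinking disabled
}
```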
Add Manager.ReconcileRegistryModelStates to clear stale per-model runtime failures for models currently registered in the global model registry. The method finds models supported for an auth, resets non-clean ModelState entries, updates aggregated availability, persists changes, and pushes a snapshot to the scheduler.

Introduce a modelStateIsClean helper to determine when a model state needs resetting.

Call ReconcileRegistryModelStates from the Service paths that register/refresh models (applyCoreAuthAddOrUpdate and refreshModelRegistrationForAuth) to keep the scheduler and global registry aligned after model re-registration.
Address two blocking issues from PR review:
- Auth file now named vertex-{prefix}-{project}.json so importing the
  same project with different prefixes no longer overwrites credentials
- Prefix containing "/" is rejected at import time instead of being
  silently ignored at runtime
- Add prefix to in-memory metadata map for consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Emit signature only when non-empty in both streaming content_block_start
and non-streaming thinking blocks. Avoids turning 'missing signature'
into 'empty/invalid signature' which Claude clients may reject.
Add ProxyPal (https://github.com/buddingnewinsights/proxypal) to the
community projects list in all three README files (EN, CN, JA).
Placed after CCS, restoring its original position.

ProxyPal is a cross-platform desktop app (macOS, Windows, Linux) that
wraps CLIProxyAPI with a native GUI, supporting multiple AI providers,
usage analytics, request monitoring, and auto-configuration for popular
coding tools.

Closes router-for-me#2420
Delegate schema sanitization to util.CleanJSONSchemaForGemini and drop the top-level eager_input_streaming key to prevent validation errors when sending Claude tools to the Gemini API.
Based on source analysis of Claude Code v2.1.88, fix several gaps detectable by Anthropic:

- Implement the message fingerprint algorithm (SHA256 salt + character index), replacing the random buildHash
- billing header cc_version now reads the version dynamically from the device profile instead of being hardcoded
- billing header cc_entrypoint is parsed from the client UA, supporting cli/vscode/local-agent
- billing header gains cc_workload support (passed via the X-CPA-Claude-Workload header)
- Add X-Claude-Code-Session-Id header (UUID cached per apiKey, TTL=1h)
- Add x-client-request-id header (api.anthropic.com only, per-request UUID)
- Fill in 4 missing beta flags (structured-outputs/fast-mode/redact-thinking/token-efficient-tools)
- Align the OAuth scope with Claude Code 2.1.88 (remove org:create_api_key, add sessions/mcp/file_upload)
- Send Anthropic-Dangerous-Direct-Browser-Access only in API-key mode
- Scrub gateway fingerprints from response headers (strip litellm/helicone/portkey/cloudflare/kong/braintrust prefixed headers)
- Add APIKeyConfig and ModelGroup config structs with YAML/JSON support
- Build in-memory atomic indexes (apiKeyConfigIndex, modelGroupIndex) for
  lock-free lookup on the hot request path
- AuthMiddleware injects resolved *APIKeyConfig and *ModelGroup into Gin context
- CheckModelAccess enforces per-key access rules before execution
- Model group failover: resolve group to priority tiers, attempt each tier
  in descending priority order, fall back on quota exhaustion (429)
- Management API endpoints: GET/PATCH/DELETE /v0/management/api-key-configs
  and /v0/management/model-groups with hot-reload callback
- Backward compatible: absent config entries allow all models (existing behavior)
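The tiered failover described above can be sketched as a small loop. This is an illustrative shape only; the `tier` struct and `tryTiers` helper are my assumptions, not the PR's actual structures.

```go
package main

import (
	"fmt"
	"sort"
)

// Illustrative priority tier for a model group.
type tier struct {
	priority int
	models   []string
}

// tryTiers attempts models tier by tier in descending priority order,
// falling through to the next candidate only on quota exhaustion (429).
func tryTiers(tiers []tier, attempt func(model string) int) (string, bool) {
	sort.Slice(tiers, func(i, j int) bool { return tiers[i].priority > tiers[j].priority })
	for _, t := range tiers {
		for _, m := range t.models {
			switch status := attempt(m); status {
			case 200:
				return m, true
			case 429:
				continue // quota exhausted: try the next candidate/tier
			default:
				return "", false // non-quota error: fail fast
			}
		}
	}
	return "", false
}

func main() {
	attempt := func(m string) int {
		if m == "claude-sonnet-4" {
			return 429 // pretend the top tier is out of quota
		}
		return 200
	}
	m, ok := tryTiers([]tier{
		{priority: 1, models: []string{"claude-haiku-4-5"}},
		{priority: 2, models: []string{"claude-sonnet-4"}},
	}, attempt)
	fmt.Println(m, ok) // claude-haiku-4-5 true
}
```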
- Introduced new logging functions for websocket requests, handshakes, errors, and responses in `logging_helpers.go`.
- Updated `CodexWebsocketsExecutor` to utilize the new logging functions for improved clarity and consistency in websocket operations.
- Modified the handling of websocket upgrade rejections to log relevant metadata.
- Changed the request body key to a timeline body key in `openai_responses_websocket.go` to better reflect its purpose.
- Enhanced tests to verify the correct logging of websocket events and responses, including disconnect events and error handling scenarios.
- Multi-stage Dockerfile: Go build + panel copied from CI
- GitHub Actions: builds frontend from Cli-Proxy-API-Management-Center,
  packages into single image, pushes to ghcr.io/minervacap2022/cliproxyapi
- Triggers: push to main, repository_dispatch from frontend repo, manual
- Image tags: latest + short SHA
- Layer caching via GitHub Actions cache
The Claude executor's API requests previously used Go's standard crypto/tls, so the JA3 fingerprint did not match real Claude Code (Bun/BoringSSL) and could be identified by Cloudflare.

- Add helps/utls_client.go, wrapping a utls Chrome fingerprint with HTTP/2 and proxy support
- Replace the 4 NewProxyAwareHTTPClient call sites in the Claude executor with NewUtlsHTTPClient
- Other executors (Gemini/Codex/iFlow etc.) are unaffected and keep standard TLS
- Non-HTTPS requests automatically fall back to the standard transport
- computeFingerprint uses rune indices instead of byte indices, fixing fingerprint mismatches on multi-byte characters
- The utls Chrome TLS fingerprint applies only to official Anthropic domains; custom base_url goes through the standard transport
- IPv6 addresses use net.JoinHostPort to join host and port correctly
This change stops short of broader Claude Code runtime alignment and instead
hardens two safe edges: builtin tool prefix handling and source-informed
sentinel coverage for future drift checks.

Constraint: Must preserve existing default behavior for current users
Rejected: Implement control-plane/session alignment now | too much runtime risk for a first slice
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Treat the new fixtures as compatibility sentinels, not a full Claude Code schema contract
Tested: go test ./test/...; go test ./sdk/translator/...; go test ./internal/runtime/executor -run 'Claude|Builtin|Tool'; go test ./...
Not-tested: End-to-end Claude Code direct-connect/session runtime behavior
Line-oriented upstream executors can emit `event:` and `data:` as
separate chunks, but the Responses handler had started terminating
each incoming chunk as a full SSE event. That split `response.created`
into an empty event plus a later data block, which broke downstream
clients like OpenClaw.

This keeps the fix in the handler layer: a small stateful framer now
buffers standalone `event:` lines until the matching `data:` arrives,
preserves already-framed events, and ignores delimiter-only leftovers.
The regression suite now covers split event/data framing, full-event
passthrough, terminal errors, and the bootstrap path that forwards
line-oriented openai-response streams from non-Codex executors too.

Constraint: Keep the fix localized to Responses handler framing instead of patching every executor
Rejected: Revert to v6.9.7 chunk writing | would reintroduce data-only framing regressions
Rejected: Patch each line-oriented executor separately | duplicates fragile SSE assembly logic
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Do not assume incoming Responses stream chunks are already complete SSE events; preserve handler-layer reassembly for split `event:`/`data:` inputs
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers/openai -count=1
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers -count=1
Tested: /tmp/go1.26.1/go test ./sdk/api/handlers/... -count=1
Tested: /tmp/go1.26.1/go/bin/go vet ./sdk/api/handlers/...
Tested: Temporary patched server on 127.0.0.1:18317 -> /v1/models 200, /v1/responses non-stream 200, /v1/responses stream emitted combined `event:` + `data:` frames
Not-tested: Full repository test suite outside sdk/api/handlers packages
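The stateful framer described above can be sketched as follows. Names and the heavy simplifications are mine — the real handler also validates buffered `data:` JSON before flushing (per the follow-up commit) — but the core behavior is the same: hold a standalone `event:` line until its `data:` arrives, pass already-framed events through, and drop delimiter-only leftovers.

```go
package main

import (
	"fmt"
	"strings"
)

// sseFramer buffers a standalone "event:" line across chunks.
type sseFramer struct {
	pendingEvent string
}

// Feed accepts one upstream chunk and returns zero or more complete SSE events.
func (f *sseFramer) Feed(chunk string) []string {
	body := strings.TrimRight(chunk, "\r\n")
	switch {
	case body == "":
		return nil // delimiter-only leftover: ignore
	case strings.HasPrefix(body, "event:") && !strings.Contains(body, "\n"):
		f.pendingEvent = body // hold until the matching data: arrives
		return nil
	case strings.HasPrefix(body, "data:") && f.pendingEvent != "":
		ev := f.pendingEvent + "\n" + body + "\n\n"
		f.pendingEvent = ""
		return []string{ev}
	default:
		return []string{body + "\n\n"} // already-framed or data-only event
	}
}

func main() {
	f := &sseFramer{}
	fmt.Printf("%q\n", f.Feed("event: response.created\n")) // [] — buffered
	fmt.Printf("%q\n", f.Feed("data: {\"id\":1}\n"))        // one combined event
}
```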
Follow-up review found two real framing hazards in the handler-layer
framer: it could flush a partial `data:` payload before the JSON was
complete, and it could inject an extra newline before chunks that
already began with `\n`/`\r\n`. This commit tightens the framer so it
only emits undelimited events when the buffered `data:` payload is
already valid JSON (or `[DONE]`), skips newline injection for chunks
that already start with a line break, and avoids the heavier
`bytes.Split` path while scanning SSE fields.

The regression suite now covers split `data:` payload chunks,
newline-prefixed chunks, and dropping incomplete trailing data on
flush, so the original Responses fix remains intact while the review
concerns are explicitly locked down.

Constraint: Keep the follow-up limited to handler-layer framing and tests
Rejected: Ignore the review and rely on current executor chunk shapes | leaves partial data payload corruption possible
Rejected: Build a fully generic SSE parser | wider change than needed for the identified risks
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Do not emit undelimited Responses SSE events unless buffered `data:` content is already complete and valid
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers/openai -count=1
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers -count=1
Tested: /tmp/go1.26.1/go/bin/go vet ./sdk/api/handlers/...
Not-tested: Full repository test suite outside sdk/api/handlers packages
BotHank-309 and others added 26 commits April 20, 2026 08:19
… filter

Symptom: warmup trigger log showed
  warmup failed provider=claude error="auth_not_found: no auth available"
even when the pinned auth was clearly present.

Root cause: Scheduler called manager.Execute(providers, req, opts) with the
auth ID pinned in metadata. manager.pickNextLegacy filters candidates by
authSupportsRouteModel — i.e. the auth must be registered in the model
registry for the requested model. Operators who restrict their Claude auth
to a custom model list (e.g. only Sonnet variants) had no entry for the
warmup recipe's claude-haiku-4-5, so the candidate list was empty.

Fix: Warmup fetches the provider executor via Manager.Executor(provider)
and calls executor.Execute(ctx, auth, req, opts) directly. This is correct
because warmup always targets a specific OAuth auth with a known-safe
minimal body; we don't need selector/quota/registry filtering. For Claude
OAuth specifically, any Claude model is callable at the Anthropic API
regardless of local registry settings.

Interface change: scheduler.Executor now requires Executor(provider) rather
than Execute(providers, req, opts). Tests updated to provide a minimal
ProviderExecutor stub.
Rationale: the degenerate max_tokens=1 + "." payload successfully reaches
the provider API, but we do not have Anthropic-side confirmation that it
reliably opens the 5-hour Claude Max session window. Some session-window
systems only start counting once a non-trivial completion has actually
been generated.

Beefier payload:
  - content "ping" instead of "." — reads as normal greeting traffic
  - max_tokens / max_output_tokens / maxOutputTokens = 16 — gives the model
    room to actually produce a reply rather than an immediate stop
  - cost impact on Haiku / Flash-Lite tiers is negligible (sub-cent per
    warmup round), well below the benefit of a deterministic window open
…aude

Claude Code stores thinking blocks returned from non-Claude providers
(Kimi, OpenAI-compatible) that the response translator emits without
signatures. When the user switches back to an Anthropic model, those
unsigned blocks are replayed to the upstream API, which rejects the
request with "Invalid \`signature\` in \`thinking\` block".

Lift SanitizeAmpRequestBody's core logic into internal/thinking as
SanitizeMessagesThinking, have amp delegate to it, and invoke it in
ClaudeExecutor.Execute / ExecuteStream right after thinking.ApplyThinking
so every path targeting Anthropic gets the cleanup.
Adds two httptest-based cases on ClaudeExecutor.Execute that capture the
actual upstream body:
- StripsUnsignedThinkingBlocks: unsigned thinking block (Kimi shape) is
  removed before reaching Anthropic, surrounding turns are preserved.
- PreservesSignedThinkingBlocks: a properly-signed thinking block is
  forwarded unchanged so multi-turn extended thinking keeps working.

Together with the function-level tests in internal/thinking, this
guards against both regressions: losing the sanitize call site, and
over-stripping legitimate signed thinking.
Single-file YAML spec describing the shadow→confirm→primary rollout,
startup-param snapshotting, rehydration, and automated health/interface
checks. Consumable by an agent (Claude-level): all helper logic lives
in anchored bash blocks in the YAML itself, no /bin/ scripts. Used
today to ship 77c2992 onto primary/shadow.

Also add `!docs/deployment.yaml` to .gitignore so future edits are
tracked (the rest of docs/ remains ignored as before).
- Refactored `/healthz` handler to support `HEAD` requests alongside `GET`.
- Updated tests to include validation for `HEAD` requests with expected status and empty body.

Closes: router-for-me#2929
…m-output-backfill

fix(codex): backfill streaming response output
…t-host-header

fix(util): forward custom Host header to upstream
…Anthropic

The codex→claude response translator writes Codex's `encrypted_content`
(Fernet tokens, always prefixed with "gAAAAAB") into Claude thinking
block `signature`. When a client replays that history back through the
proxy targeting an Anthropic model, Anthropic rejects the request with
"Invalid `signature` in `thinking` block" because Fernet is not an
Anthropic signature format.

Extend SanitizeMessagesThinking to drop any thinking block whose
signature begins with the Fernet version marker. Covers the GPT→Sonnet
failover path end-to-end (proxy forwards to Codex → returns Fernet sig
→ stored in client history → next Claude turn is sanitized before
reaching Anthropic). Genuine Anthropic signatures, which never start
with "gAAAAAB", pass through untouched so multi-turn extended thinking
keeps working.
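The discriminator described above reduces to a prefix check. The function name below is illustrative (the real logic lives inside SanitizeMessagesThinking); the empty-signature branch reflects the earlier unsigned-block commit.

```go
package main

import (
	"fmt"
	"strings"
)

// dropThinkingSignature reports whether a thinking block must be removed
// before the request reaches Anthropic: Fernet tokens always begin with
// the version marker "gAAAAAB", which genuine Anthropic signatures never do.
func dropThinkingSignature(sig string) bool {
	if sig == "" {
		return true // unsigned blocks are rejected upstream as well
	}
	return strings.HasPrefix(sig, "gAAAAAB")
}

func main() {
	fmt.Println(dropThinkingSignature("gAAAAABmFq...")) // true: Codex Fernet token
	fmt.Println(dropThinkingSignature("ErUBCkYIBxgC")) // false: non-Fernet signature passes through
}
```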
Two changes targeting overnight OAuth expiration:

1. ClaudeExecutor.Refresh now uses RefreshTokensWithRetry(ctx, token, 3)
   instead of single-shot RefreshTokens, matching the Codex executor. A
   transient Anthropic OAuth 5xx / network blip no longer sinks the auth
   straight into the 5-minute refreshFailureBackoff loop, which could
   accumulate misses and let the 4h refresh window slip past the 8h
   access_token lifetime.

2. Conductor's refresh outcomes move from log.Debugf to structured Info
   (success / canceled) and Warn (failure) records with provider and
   auth_id fields. Previously every refresh happened silently at default
   log level, making "why did my OAuth expire overnight" unanswerable
   without flipping the entire service to debug.
- Added `GPT-Image-2` as a built-in model to avoid dependency on remote updates for Codex.
- Updated model tier functions (`CodexFree`, `CodexTeam`, etc.) to include built-in models via `WithCodexBuiltins`.
- Introduced new handlers for image generation and edit operations under `OpenAIAPIHandler`.
- Extended tests to validate 503 response for unsupported image model requests.
Both persist() and the refreshAuth success path previously swallowed
store.Save failures without any warning. This made it impossible to
distinguish a successful token refresh from one where the rotated
token was never written to disk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ment_path

Add curl --max-time so a hung primary can't stretch the stability window
into a minutes-long block. Skip the management_path probe when the field
is empty so deployments without a management panel don't fail every
iteration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot changed the base branch from main to dev April 27, 2026 04:35
@github-actions

This pull request targeted main.

The base branch has been automatically changed to dev.

@HeimaoLST HeimaoLST merged commit 59a737c into dev Apr 27, 2026
4 checks passed
@HeimaoLST HeimaoLST deleted the shadow-cleanup-oauth-isolation branch April 27, 2026 05:36