
Shadow cleanup + Claude OAuth isolation on shadow#1

Merged
HeimaoLST merged 221 commits into dev from shadow-cleanup-oauth-isolation
Apr 27, 2026

Conversation

@HeimaoLST

Summary

  • Shadow is now treated as temporary rollout protection: deploy gains primary_stability and shadow_cleanup phases that observe primary then auto-remove the shadow container.
  • Per-instance OAuth refresh suppression (oauth-refresh-disabled-providers) lets the shadow process disable Claude auto-refresh without touching shared auth files; Auth.Runtime.RefreshLead()==nil now unschedules the refresh loop.
  • Live runbook (docs/deployment.yaml) wired with new config knobs and phases; stability probes capped with curl --max-time and management_path is gated when empty.
  • Warmup characterization test + config.template.yaml note clarify that shadow should disable warmup entirely (empty providers means all supported providers).

Background

Production logs showed invalid_grant on Claude refresh while shadow shared the auth dir read-only. Both instances racing the same refresh token + warmup is the most likely root cause. The runtime change keeps shadow useful as a rollout canary while ensuring it never mutates Claude OAuth state.

Test plan

  • go test ./internal/config -count=1
  • go test ./internal/watcher/synthesizer -count=1
  • go test ./sdk/cliproxy/auth -count=1
  • go test ./internal/warmup -count=1
  • go build ./...
  • On next deploy: confirm shadow gets removed after the stability window and primary keeps refreshing Claude tokens cleanly.

🤖 Generated with Claude Code

dinhkarate and others added 30 commits January 29, 2026 13:32
…oding, normalizeClaudeBudget max_tokens

1. Always include interleaved-thinking-2025-05-14 beta header so that
   thinking blocks are returned correctly for all Claude models.

2. Remove status-code guard in AMP reverse proxy ModifyResponse so that
   error responses (4xx/5xx) with hidden gzip encoding are decoded
   properly — prevents garbled error messages reaching the client.

3. In normalizeClaudeBudget, when the adjusted budget falls below the
   model minimum, set max_tokens = budgetTokens+1 instead of leaving
   the request unchanged (which causes a 400 from the API).
When adjustedBudget < minBudget, the previous fix blindly set
max_tokens = budgetTokens+1 which could exceed MaxCompletionTokens.

Now: cap max_tokens at MaxCompletionTokens, recalculate budget, and
disable thinking entirely if constraints are unsatisfiable.

Add unit tests covering raise, clamp, disable, and no-op scenarios.
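The raise/clamp/disable decision above can be sketched as a single pure function. This is a minimal illustration of the described behavior, not the repo's actual code: the function and parameter names (`normalizeBudget`, `maxCompletionTokens`) are assumptions.

```go
package main

import "fmt"

// Sketch of the clamping logic described above. Returns the adjusted
// maxTokens, the adjusted thinking budget, and whether thinking
// remains enabled.
func normalizeBudget(budget, minBudget, maxTokens, maxCompletionTokens int) (int, int, bool) {
	if budget >= minBudget {
		return maxTokens, budget, true // no-op: budget already meets the model minimum
	}
	// Raise the budget to the minimum and keep max_tokens strictly above it.
	budget = minBudget
	if maxTokens <= budget {
		maxTokens = budget + 1
	}
	// Cap at the completion ceiling, then recalculate the budget.
	if maxTokens > maxCompletionTokens {
		maxTokens = maxCompletionTokens
		budget = maxTokens - 1
	}
	// Constraints unsatisfiable: disable thinking rather than send a 400-bound request.
	if budget < minBudget {
		return maxTokens, 0, false
	}
	return maxTokens, budget, true
}

func main() {
	mt, b, ok := normalizeBudget(500, 1024, 900, 1000)
	fmt.Println(mt, b, ok) // 1000 0 false — unsatisfiable, thinking disabled
}
```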
Add Manager.ReconcileRegistryModelStates to clear stale per-model runtime failures for models currently registered in the global model registry. The method finds models supported for an auth, resets non-clean ModelState entries, updates aggregated availability, persists changes, and pushes a snapshot to the scheduler.

Introduce a modelStateIsClean helper to determine when a model state needs resetting.

Call ReconcileRegistryModelStates from the Service paths that register/refresh models (applyCoreAuthAddOrUpdate and refreshModelRegistrationForAuth) to keep the scheduler and global registry aligned after model re-registration.
Address two blocking issues from PR review:
- Auth file now named vertex-{prefix}-{project}.json so importing the
  same project with different prefixes no longer overwrites credentials
- Prefix containing "/" is rejected at import time instead of being
  silently ignored at runtime
- Add prefix to in-memory metadata map for consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Emit signature only when non-empty in both streaming content_block_start
and non-streaming thinking blocks. Avoids turning 'missing signature'
into 'empty/invalid signature' which Claude clients may reject.
Add ProxyPal (https://github.com/buddingnewinsights/proxypal) to the
community projects list in all three README files (EN, CN, JA).
Placed after CCS, restoring its original position.

ProxyPal is a cross-platform desktop app (macOS, Windows, Linux) that
wraps CLIProxyAPI with a native GUI, supporting multiple AI providers,
usage analytics, request monitoring, and auto-configuration for popular
coding tools.

Closes router-for-me#2420
Delegate schema sanitization to util.CleanJSONSchemaForGemini and drop the top-level eager_input_streaming key to prevent validation errors when sending Claude tools to the Gemini API.
Based on source analysis of Claude Code v2.1.88, fix several gaps detectable by Anthropic:

- Implement the message fingerprint algorithm (SHA256 salt + character index), replacing the random buildHash
- billing header cc_version now reads the version dynamically from the device profile instead of being hardcoded
- billing header cc_entrypoint is parsed from the client UA, supporting cli/vscode/local-agent
- billing header gains cc_workload support (passed via the X-CPA-Claude-Workload header)
- Add X-Claude-Code-Session-Id header (UUID cached per apiKey, TTL=1h)
- Add x-client-request-id header (api.anthropic.com only, per-request UUID)
- Fill in 4 missing beta flags (structured-outputs/fast-mode/redact-thinking/token-efficient-tools)
- Align the OAuth scope with Claude Code 2.1.88 (remove org:create_api_key, add sessions/mcp/file_upload)
- Send Anthropic-Dangerous-Direct-Browser-Access only in API-key mode
- Scrub gateway fingerprints from response headers (strip litellm/helicone/portkey/cloudflare/kong/braintrust prefixed headers)
- Add APIKeyConfig and ModelGroup config structs with YAML/JSON support
- Build in-memory atomic indexes (apiKeyConfigIndex, modelGroupIndex) for
  lock-free lookup on the hot request path
- AuthMiddleware injects resolved *APIKeyConfig and *ModelGroup into Gin context
- CheckModelAccess enforces per-key access rules before execution
- Model group failover: resolve group to priority tiers, attempt each tier
  in descending priority order, fall back on quota exhaustion (429)
- Management API endpoints: GET/PATCH/DELETE /v0/management/api-key-configs
  and /v0/management/model-groups with hot-reload callback
- Backward compatible: absent config entries allow all models (existing behavior)
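The tiered failover described above can be sketched as a small loop. This is an illustrative shape only; the `tier` struct and `tryTiers` helper are my assumptions, not the PR's actual structures.

```go
package main

import (
	"fmt"
	"sort"
)

// Illustrative priority tier for a model group.
type tier struct {
	priority int
	models   []string
}

// tryTiers attempts models tier by tier in descending priority order,
// falling through to the next candidate only on quota exhaustion (429).
func tryTiers(tiers []tier, attempt func(model string) int) (string, bool) {
	sort.Slice(tiers, func(i, j int) bool { return tiers[i].priority > tiers[j].priority })
	for _, t := range tiers {
		for _, m := range t.models {
			switch status := attempt(m); status {
			case 200:
				return m, true
			case 429:
				continue // quota exhausted: try the next candidate/tier
			default:
				return "", false // non-quota error: fail fast
			}
		}
	}
	return "", false
}

func main() {
	attempt := func(m string) int {
		if m == "claude-sonnet-4" {
			return 429 // pretend the top tier is out of quota
		}
		return 200
	}
	m, ok := tryTiers([]tier{
		{priority: 1, models: []string{"claude-haiku-4-5"}},
		{priority: 2, models: []string{"claude-sonnet-4"}},
	}, attempt)
	fmt.Println(m, ok) // claude-haiku-4-5 true
}
```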
- Introduced new logging functions for websocket requests, handshakes, errors, and responses in `logging_helpers.go`.
- Updated `CodexWebsocketsExecutor` to utilize the new logging functions for improved clarity and consistency in websocket operations.
- Modified the handling of websocket upgrade rejections to log relevant metadata.
- Changed the request body key to a timeline body key in `openai_responses_websocket.go` to better reflect its purpose.
- Enhanced tests to verify the correct logging of websocket events and responses, including disconnect events and error handling scenarios.
- Multi-stage Dockerfile: Go build + panel copied from CI
- GitHub Actions: builds frontend from Cli-Proxy-API-Management-Center,
  packages into single image, pushes to ghcr.io/minervacap2022/cliproxyapi
- Triggers: push to main, repository_dispatch from frontend repo, manual
- Image tags: latest + short SHA
- Layer caching via GitHub Actions cache
The Claude executor's API requests previously used Go's standard crypto/tls, so the JA3 fingerprint did not match real Claude Code (Bun/BoringSSL) and could be identified by Cloudflare.

- Add helps/utls_client.go, wrapping a utls Chrome fingerprint with HTTP/2 and proxy support
- Replace the 4 NewProxyAwareHTTPClient call sites in the Claude executor with NewUtlsHTTPClient
- Other executors (Gemini/Codex/iFlow etc.) are unaffected and keep standard TLS
- Non-HTTPS requests automatically fall back to the standard transport
- computeFingerprint uses rune indices instead of byte indices, fixing fingerprint mismatches on multi-byte characters
- The utls Chrome TLS fingerprint applies only to official Anthropic domains; custom base_url goes through the standard transport
- IPv6 addresses use net.JoinHostPort to join host and port correctly
This change stops short of broader Claude Code runtime alignment and instead
hardens two safe edges: builtin tool prefix handling and source-informed
sentinel coverage for future drift checks.

Constraint: Must preserve existing default behavior for current users
Rejected: Implement control-plane/session alignment now | too much runtime risk for a first slice
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Treat the new fixtures as compatibility sentinels, not a full Claude Code schema contract
Tested: go test ./test/...; go test ./sdk/translator/...; go test ./internal/runtime/executor -run 'Claude|Builtin|Tool'; go test ./...
Not-tested: End-to-end Claude Code direct-connect/session runtime behavior
Line-oriented upstream executors can emit `event:` and `data:` as
separate chunks, but the Responses handler had started terminating
each incoming chunk as a full SSE event. That split `response.created`
into an empty event plus a later data block, which broke downstream
clients like OpenClaw.

This keeps the fix in the handler layer: a small stateful framer now
buffers standalone `event:` lines until the matching `data:` arrives,
preserves already-framed events, and ignores delimiter-only leftovers.
The regression suite now covers split event/data framing, full-event
passthrough, terminal errors, and the bootstrap path that forwards
line-oriented openai-response streams from non-Codex executors too.

Constraint: Keep the fix localized to Responses handler framing instead of patching every executor
Rejected: Revert to v6.9.7 chunk writing | would reintroduce data-only framing regressions
Rejected: Patch each line-oriented executor separately | duplicates fragile SSE assembly logic
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Do not assume incoming Responses stream chunks are already complete SSE events; preserve handler-layer reassembly for split `event:`/`data:` inputs
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers/openai -count=1
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers -count=1
Tested: /tmp/go1.26.1/go test ./sdk/api/handlers/... -count=1
Tested: /tmp/go1.26.1/go/bin/go vet ./sdk/api/handlers/...
Tested: Temporary patched server on 127.0.0.1:18317 -> /v1/models 200, /v1/responses non-stream 200, /v1/responses stream emitted combined `event:` + `data:` frames
Not-tested: Full repository test suite outside sdk/api/handlers packages
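The stateful framer described above can be sketched as follows. Names and the heavy simplifications are mine — the real handler also validates buffered `data:` JSON before flushing (per the follow-up commit) — but the core behavior is the same: hold a standalone `event:` line until its `data:` arrives, pass already-framed events through, and drop delimiter-only leftovers.

```go
package main

import (
	"fmt"
	"strings"
)

// sseFramer buffers a standalone "event:" line across chunks.
type sseFramer struct {
	pendingEvent string
}

// Feed accepts one upstream chunk and returns zero or more complete SSE events.
func (f *sseFramer) Feed(chunk string) []string {
	body := strings.TrimRight(chunk, "\r\n")
	switch {
	case body == "":
		return nil // delimiter-only leftover: ignore
	case strings.HasPrefix(body, "event:") && !strings.Contains(body, "\n"):
		f.pendingEvent = body // hold until the matching data: arrives
		return nil
	case strings.HasPrefix(body, "data:") && f.pendingEvent != "":
		ev := f.pendingEvent + "\n" + body + "\n\n"
		f.pendingEvent = ""
		return []string{ev}
	default:
		return []string{body + "\n\n"} // already-framed or data-only event
	}
}

func main() {
	f := &sseFramer{}
	fmt.Printf("%q\n", f.Feed("event: response.created\n")) // [] — buffered
	fmt.Printf("%q\n", f.Feed("data: {\"id\":1}\n"))        // one combined event
}
```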
Follow-up review found two real framing hazards in the handler-layer
framer: it could flush a partial `data:` payload before the JSON was
complete, and it could inject an extra newline before chunks that
already began with `\n`/`\r\n`. This commit tightens the framer so it
only emits undelimited events when the buffered `data:` payload is
already valid JSON (or `[DONE]`), skips newline injection for chunks
that already start with a line break, and avoids the heavier
`bytes.Split` path while scanning SSE fields.

The regression suite now covers split `data:` payload chunks,
newline-prefixed chunks, and dropping incomplete trailing data on
flush, so the original Responses fix remains intact while the review
concerns are explicitly locked down.

Constraint: Keep the follow-up limited to handler-layer framing and tests
Rejected: Ignore the review and rely on current executor chunk shapes | leaves partial data payload corruption possible
Rejected: Build a fully generic SSE parser | wider change than needed for the identified risks
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Do not emit undelimited Responses SSE events unless buffered `data:` content is already complete and valid
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers/openai -count=1
Tested: /tmp/go1.26.1/go/bin/go test ./sdk/api/handlers -count=1
Tested: /tmp/go1.26.1/go/bin/go vet ./sdk/api/handlers/...
Not-tested: Full repository test suite outside sdk/api/handlers packages
BotHank-309 and others added 26 commits April 20, 2026 08:19
… filter

Symptom: warmup trigger log showed
  warmup failed provider=claude error="auth_not_found: no auth available"
even when the pinned auth was clearly present.

Root cause: Scheduler called manager.Execute(providers, req, opts) with the
auth ID pinned in metadata. manager.pickNextLegacy filters candidates by
authSupportsRouteModel — i.e. the auth must be registered in the model
registry for the requested model. Operators who restrict their Claude auth
to a custom model list (e.g. only Sonnet variants) had no entry for the
warmup recipe's claude-haiku-4-5, so the candidate list was empty.

Fix: Warmup fetches the provider executor via Manager.Executor(provider)
and calls executor.Execute(ctx, auth, req, opts) directly. This is correct
because warmup always targets a specific OAuth auth with a known-safe
minimal body; we don't need selector/quota/registry filtering. For Claude
OAuth specifically, any Claude model is callable at the Anthropic API
regardless of local registry settings.

Interface change: scheduler.Executor now requires Executor(provider) rather
than Execute(providers, req, opts). Tests updated to provide a minimal
ProviderExecutor stub.
Rationale: the degenerate max_tokens=1 + "." payload successfully reaches
the provider API, but we do not have Anthropic-side confirmation that it
reliably opens the 5-hour Claude Max session window. Some session-window
systems only start counting once a non-trivial completion has actually
been generated.

Beefier payload:
  - content "ping" instead of "." — reads as normal greeting traffic
  - max_tokens / max_output_tokens / maxOutputTokens = 16 — gives the model
    room to actually produce a reply rather than an immediate stop
  - cost impact on Haiku / Flash-Lite tiers is negligible (sub-cent per
    warmup round), well below the benefit of a deterministic window open
…aude

Claude Code stores thinking blocks returned from non-Claude providers
(Kimi, OpenAI-compatible) that the response translator emits without
signatures. When the user switches back to an Anthropic model, those
unsigned blocks are replayed to the upstream API, which rejects the
request with "Invalid \`signature\` in \`thinking\` block".

Lift SanitizeAmpRequestBody's core logic into internal/thinking as
SanitizeMessagesThinking, have amp delegate to it, and invoke it in
ClaudeExecutor.Execute / ExecuteStream right after thinking.ApplyThinking
so every path targeting Anthropic gets the cleanup.
Adds two httptest-based cases on ClaudeExecutor.Execute that capture the
actual upstream body:
- StripsUnsignedThinkingBlocks: unsigned thinking block (Kimi shape) is
  removed before reaching Anthropic, surrounding turns are preserved.
- PreservesSignedThinkingBlocks: a properly-signed thinking block is
  forwarded unchanged so multi-turn extended thinking keeps working.

Together with the function-level tests in internal/thinking, this
guards against both regressions: losing the sanitize call site, and
over-stripping legitimate signed thinking.
Single-file YAML spec describing the shadow→confirm→primary rollout,
startup-param snapshotting, rehydration, and automated health/interface
checks. Consumable by an agent (Claude-level): all helper logic lives
in anchored bash blocks in the YAML itself, no /bin/ scripts. Used
today to ship 77c2992 onto primary/shadow.

Also add `!docs/deployment.yaml` to .gitignore so future edits are
tracked (the rest of docs/ remains ignored as before).
- Refactored `/healthz` handler to support `HEAD` requests alongside `GET`.
- Updated tests to include validation for `HEAD` requests with expected status and empty body.

Closes: router-for-me#2929
…m-output-backfill

fix(codex): backfill streaming response output
…t-host-header

fix(util): forward custom Host header to upstream
…Anthropic

The codex→claude response translator writes Codex's `encrypted_content`
(Fernet tokens, always prefixed with "gAAAAAB") into Claude thinking
block `signature`. When a client replays that history back through the
proxy targeting an Anthropic model, Anthropic rejects the request with
"Invalid `signature` in `thinking` block" because Fernet is not an
Anthropic signature format.

Extend SanitizeMessagesThinking to drop any thinking block whose
signature begins with the Fernet version marker. Covers the GPT→Sonnet
failover path end-to-end (proxy forwards to Codex → returns Fernet sig
→ stored in client history → next Claude turn is sanitized before
reaching Anthropic). Genuine Anthropic signatures, which never start
with "gAAAAAB", pass through untouched so multi-turn extended thinking
keeps working.
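The discriminator described above reduces to a prefix check. The function name below is illustrative (the real logic lives inside SanitizeMessagesThinking); the empty-signature branch reflects the earlier unsigned-block commit.

```go
package main

import (
	"fmt"
	"strings"
)

// dropThinkingSignature reports whether a thinking block must be removed
// before the request reaches Anthropic: Fernet tokens always begin with
// the version marker "gAAAAAB", which genuine Anthropic signatures never do.
func dropThinkingSignature(sig string) bool {
	if sig == "" {
		return true // unsigned blocks are rejected upstream as well
	}
	return strings.HasPrefix(sig, "gAAAAAB")
}

func main() {
	fmt.Println(dropThinkingSignature("gAAAAABmFq...")) // true: Codex Fernet token
	fmt.Println(dropThinkingSignature("ErUBCkYIBxgC")) // false: non-Fernet signature passes through
}
```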
Two changes targeting overnight OAuth expiration:

1. ClaudeExecutor.Refresh now uses RefreshTokensWithRetry(ctx, token, 3)
   instead of single-shot RefreshTokens, matching the Codex executor. A
   transient Anthropic OAuth 5xx / network blip no longer sinks the auth
   straight into the 5-minute refreshFailureBackoff loop, which could
   accumulate misses and let the 4h refresh window slip past the 8h
   access_token lifetime.

2. Conductor's refresh outcomes move from log.Debugf to structured Info
   (success / canceled) and Warn (failure) records with provider and
   auth_id fields. Previously every refresh happened silently at default
   log level, making "why did my OAuth expire overnight" unanswerable
   without flipping the entire service to debug.
- Added `GPT-Image-2` as a built-in model to avoid dependency on remote updates for Codex.
- Updated model tier functions (`CodexFree`, `CodexTeam`, etc.) to include built-in models via `WithCodexBuiltins`.
- Introduced new handlers for image generation and edit operations under `OpenAIAPIHandler`.
- Extended tests to validate 503 response for unsupported image model requests.
Both persist() and the refreshAuth success path previously swallowed
store.Save failures without any warning. This made it impossible to
distinguish a successful token refresh from one where the rotated
token was never written to disk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ment_path

Add curl --max-time so a hung primary can't stretch the stability window
into a minutes-long block. Skip the management_path probe when the field
is empty so deployments without a management panel don't fail every
iteration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot changed the base branch from main to dev April 27, 2026 04:35
@github-actions

This pull request targeted main.

The base branch has been automatically changed to dev.

@HeimaoLST HeimaoLST merged commit 59a737c into dev Apr 27, 2026
4 checks passed
@HeimaoLST HeimaoLST deleted the shadow-cleanup-oauth-isolation branch April 27, 2026 05:36