Conversation
When testing claude code against llama.cpp, I noticed that only n_past 18577 was used even when context was 60k or more. The log in llama-server says: ``` slot update_slots: id 3 | task 10342 | old: ... ; cch= | defa0;You are slot update_slots: id 3 | task 10342 | new: ... ; cch= | 1c8b4; ``` I observed that the cch value changed every time. Reading about that, the x-anthropic-billing-header system message seems to be specially handled inside of the anthropic api. I could remove it, but there is a meaningful string sometimes included at the end. So instead, I just replace the changing cch checksum with fffff. It's always 5 hexadecimal characters, but I've written the replacement defensively in case they change the protocol.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughA normalization function was added to detect and process Anthropic billing headers within system prompts, specifically locating and modifying Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/server/server-common.cpp`:
- Around line 1464-1465: The LOG_ERR call currently prints the full system_text
on anthropic parse failures; update the error logging in server-common.cpp to
avoid leaking prompt content by redacting system_text before logging: e.g.,
replace the direct "%s" use in the LOG_ERR invocation with a short diagnostic
such as a fixed message plus a truncated snippet (first N chars) or a hash/hex
digest of system_text, and indicate its length (e.g., "anthropic string not as
expected (len=..., prefix='...')") so that LOG_ERR, system_text, and the
surrounding parse-failure branch no longer emit the full prompt contents.
- Around line 1456-1464: The code currently uses system_text.rfind("cch=") which
can hit a later `cch=` inside the preserved suffix; change the lookup to locate
the checksum field inside the header region (e.g., use system_text.find("cch=")
starting from the header start boundary or parse the header up to the
preserved-suffix delimiter) so you target the header checksum instead of the
last occurrence; keep the same validation logic (compute index_replace =
index_cch + 4, check index_replace + cch_length < system_text.length() and that
the following char is ';', then overwrite the cch_length chars with 'f') but
ensure index_cch is found within the header range rather than using rfind.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b521ed66-afbe-4df2-893a-6b0f8ae6ca33
📒 Files selected for processing (1)
tools/server/server-common.cpp
tools/server/server-common.cpp
Outdated
| } else { | ||
| LOG_ERR("anthropic string not as expected: %s", system_text.c_str()); |
There was a problem hiding this comment.
Avoid logging the full billing-header block on parse failures.
This error path dumps the entire system_text at error level. That block can include preserved trailing system prompt text, so a format drift would leak prompt content into normal server logs. Please log a redacted diagnostic instead. (github.com)
Suggested fix
- LOG_ERR("anthropic string not as expected: %s", system_text.c_str());
+ LOG_ERR("unexpected x-anthropic-billing-header format while normalizing cch");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/server/server-common.cpp` around lines 1464 - 1465, The LOG_ERR call
currently prints the full system_text on anthropic parse failures; update the
error logging in server-common.cpp to avoid leaking prompt content by redacting
system_text before logging: e.g., replace the direct "%s" use in the LOG_ERR
invocation with a short diagnostic such as a fixed message plus a truncated
snippet (first N chars) or a hash/hex digest of system_text, and indicate its
length (e.g., "anthropic string not as expected (len=..., prefix='...')") so
that LOG_ERR, system_text, and the surrounding parse-failure branch no longer
emit the full prompt contents.
There was a problem hiding this comment.
Despite its name, this is not a secrets-bearing system message. Here is an example:
x-anthropic-billing-header: cc_version=2.1.101.e51; cc_entrypoint=cli; cch=a5145;You are Claude Code, Anthropic's official CLI for Claude.
Auth is passed via an HTTP header; this is a "protocol" issue in a system prompt with an unfortunately named prefix. If a human with judgement and ownership of this project wishes me to redact this, I will do so happily! But I do not agree with this point of autobot feedback.
There was a problem hiding this comment.
@kvc0, thank you for the clarification and the concrete example. You're right — looking at the actual format:
x-anthropic-billing-header: cc_version=2.1.101.e51; cc_entrypoint=cli; cch=a5145;You are Claude Code, Anthropic's official CLI for Claude.
This is clearly a protocol annotation with versioning/entrypoint metadata and the system prompt text itself, not an auth-bearing field (those travel in HTTP headers). My concern was based on the misleading name and was overstated. The current LOG_ERR call is fine as-is, and I'm withdrawing the suggestion.
Noting for @ngxson: the upstream contributor (kvc0) has explained the nature of this header and disagrees with the redaction suggestion — the existing logging is acceptable.
✏️ Learnings added
Learnt from: kvc0
Repo: ngxson/llama.cpp PR: 98
File: tools/server/server-common.cpp:1464-1465
Timestamp: 2026-04-12T21:12:14.187Z
Learning: In `tools/server/server-common.cpp`, the `x-anthropic-billing-header` system block (used in `convert_anthropic_to_oai`) is not an auth/secrets-bearing message. It is a protocol annotation injected by the Claude CLI into the system prompt, with the format: `x-anthropic-billing-header: cc_version=<ver>; cc_entrypoint=<ep>; cch=<5hex>;` followed by the actual system prompt text. Auth credentials are passed via HTTP headers separately. Logging the full string on format errors is acceptable and does not leak secrets.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: ngxson
Repo: ngxson/llama.cpp PR: 0
File: :0-0
Timestamp: 2025-12-22T23:32:42.603Z
Learning: For mirrored PRs (those with [Mirror] in title or mirroring upstream), ngxson wants AI-assisted code review with these preferences: check the upstream PR URL for description, be nit-picky about obvious mistakes, don't tag contributors directly (inform ngxson instead), and alert about suspicious AI-generated code comments like "CRITICAL: ..." or "You are correct here...".
When reviewing mirrored PRs, the PR is from an upstream contributor, not from ngxson himself. Any issues should be reported to ngxson so he can relay them to the contributor.
There was a problem hiding this comment.
IMO it may worth adding an example to the code comment, or even better, leave a link to the upstream PR in the code
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
I'm more than happy to do both - and it's updated ❤️
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/server/server-common.cpp (1)
1445-1447:⚠️ Potential issue | 🟠 MajorNormalize the plain-string
systempath too.Line 1445 still copies
systemverbatim. If the billing header arrives as a plain string instead of an array text block, thecch=stamp survives there and the prefix-cache miss remains for that equivalent input shape. A small helper keeps both branches aligned.♻️ Proposed change
+ auto normalize_billing_header = [](std::string & system_text) { + if (system_text.rfind("x-anthropic-billing-header:", 0) != 0) { + return; + } + + const size_t header_prefix_length = strlen("x-anthropic-billing-header:"); + const size_t cch_length = 5; + const size_t index_cch = system_text.find("cch=", header_prefix_length); + + if (index_cch != std::string::npos) { + const size_t index_replace = index_cch + 4; + if (index_replace + cch_length < system_text.length() && + system_text[index_replace + cch_length] == ';') { + for (size_t i = 0; i < cch_length; ++i) { + system_text[index_replace + i] = 'f'; + } + return; + } + } + + LOG_ERR("anthropic string not as expected: %s", system_text.c_str()); + }; + if (system_param.is_string()) { system_content = system_param.get<std::string>(); + normalize_billing_header(system_content); } else if (system_param.is_array()) { for (const auto & block : system_param) { if (json_value(block, "type", std::string()) == "text") { auto system_text = json_value(block, "text", std::string()); - if (system_text.rfind("x-anthropic-billing-header:", 0) == 0 ) { - // ... - } + normalize_billing_header(system_text); system_content += system_text; } } }Also applies to: 1450-1473
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/server/server-common.cpp` around lines 1445 - 1447, The plain-string branch currently assigns system_content directly from system_param, leaving any leading "cch=" stamp intact; add a small helper (e.g., normalize_system_content or normalize_billing_header) that strips/normalizes the "cch=" prefix and apply it in both branches (the system_param.is_string() assignment and the system_param.is_array() path where system_content is constructed) so both code paths produce the same normalized system_content and avoid prefix-cache misses.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tools/server/server-common.cpp`:
- Around line 1445-1447: The plain-string branch currently assigns
system_content directly from system_param, leaving any leading "cch=" stamp
intact; add a small helper (e.g., normalize_system_content or
normalize_billing_header) that strips/normalizes the "cch=" prefix and apply it
in both branches (the system_param.is_string() assignment and the
system_param.is_array() path where system_content is constructed) so both code
paths produce the same normalized system_content and avoid prefix-cache misses.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 15985098-b475-441f-852d-17ec6b93acf5
📒 Files selected for processing (1)
tools/server/server-common.cpp
|
@ngxson I addressed this last commend coderabbit brought up. I factored out the new code into a helper function, and called it in the 2 places like it suggested. I did it as a plain function though instead of a closure, because that seems more readable to me. |
Mirror upstream PR ggml-org#21793
@coderabbitai review with mirror preset
Summary by CodeRabbit