[Mirror] anthropic: fix prefix caching by ngxson · Pull Request #98 · ngxson/llama.cpp

ngxson · 2026-04-12T15:31:26Z

@coderabbitai review with mirror preset

Summary by CodeRabbit

Bug Fixes
- Improved reliability of billing header handling in system prompts during API request conversions.

When testing claude code against llama.cpp, I noticed that only n_past 18577 was used even when context was 60k or more. The log in llama-server says: ``` slot update_slots: id 3 | task 10342 | old: ... ; cch= | defa0;You are slot update_slots: id 3 | task 10342 | new: ... ; cch= | 1c8b4; ``` I observed that the cch value changed every time. Reading about that, the x-anthropic-billing-header system message seems to be specially handled inside of the anthropic api. I could remove it, but there is a meaningful string sometimes included at the end. So instead, I just replace the changing cch checksum with fffff. It's always 5 hexadecimal characters, but I've written the replacement defensively in case they change the protocol.

coderabbitai · 2026-04-12T15:31:49Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9311f454-870d-43ed-bb64-b3e10ca5ade2

📥 Commits

Reviewing files that changed from the base of the PR and between 3a993a7 and f3ec37c.

📒 Files selected for processing (1)

tools/server/server-common.cpp

📝 Walkthrough

Walkthrough

A normalization function was added to detect and process Anthropic billing headers within system prompts, specifically locating and modifying cch fields by replacing their values with 'f'. The convert_anthropic_to_oai function was updated to apply this normalization to system content during Anthropic-to-OpenAI conversion, handling both string and array inputs.

Changes

Cohort / File(s)	Summary
Anthropic Billing Header Normalization `tools/server/server-common.cpp`	Added `normalize_anthropic_billing_header()` function to detect and modify `x-anthropic-billing-header:` prompts by replacing CCH stamp values with `'f'`. Updated `convert_anthropic_to_oai()` to apply normalization to system content during conversion, with separate handling for string and array inputs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A billing header hops into view,
With cch= stamps to normalize true,
Five characters stamped with an 'f' so bright,
From Anthropic prompts to OpenAI's light!
Hopping through systems with billing care divine. ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The PR description is minimal and lacks required sections from the template. It only mentions it mirrors an upstream PR, without providing an overview, implementation details, or completing required checklist items.	Add an Overview section describing the prefix caching fix, mention AI usage disclosure, and confirm reading contributing guidelines per the template requirements.
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: fixing prefix caching in the Anthropic integration.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tools/server/server-common.cpp`:
- Around line 1464-1465: The LOG_ERR call currently prints the full system_text
on anthropic parse failures; update the error logging in server-common.cpp to
avoid leaking prompt content by redacting system_text before logging: e.g.,
replace the direct "%s" use in the LOG_ERR invocation with a short diagnostic
such as a fixed message plus a truncated snippet (first N chars) or a hash/hex
digest of system_text, and indicate its length (e.g., "anthropic string not as
expected (len=..., prefix='...')") so that LOG_ERR, system_text, and the
surrounding parse-failure branch no longer emit the full prompt contents.
- Around line 1456-1464: The code currently uses system_text.rfind("cch=") which
can hit a later `cch=` inside the preserved suffix; change the lookup to locate
the checksum field inside the header region (e.g., use system_text.find("cch=")
starting from the header start boundary or parse the header up to the
preserved-suffix delimiter) so you target the header checksum instead of the
last occurrence; keep the same validation logic (compute index_replace =
index_cch + 4, check index_replace + cch_length < system_text.length() and that
the following char is ';', then overwrite the cch_length chars with 'f') but
ensure index_cch is found within the header range rather than using rfind.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b521ed66-afbe-4df2-893a-6b0f8ae6ca33

📥 Commits

Reviewing files that changed from the base of the PR and between aa4695c and bdc7867.

📒 Files selected for processing (1)

tools/server/server-common.cpp

tools/server/server-common.cpp

coderabbitai · 2026-04-12T15:37:16Z

tools/server/server-common.cpp

+                        } else {
+                            LOG_ERR("anthropic string not as expected: %s", system_text.c_str());


⚠️ Potential issue | 🟠 Major

Avoid logging the full billing-header block on parse failures.

This error path dumps the entire system_text at error level. That block can include preserved trailing system prompt text, so a format drift would leak prompt content into normal server logs. Please log a redacted diagnostic instead. (github.com)

Suggested fix

- LOG_ERR("anthropic string not as expected: %s", system_text.c_str()); + LOG_ERR("unexpected x-anthropic-billing-header format while normalizing cch");

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tools/server/server-common.cpp` around lines 1464 - 1465, The LOG_ERR call currently prints the full system_text on anthropic parse failures; update the error logging in server-common.cpp to avoid leaking prompt content by redacting system_text before logging: e.g., replace the direct "%s" use in the LOG_ERR invocation with a short diagnostic such as a fixed message plus a truncated snippet (first N chars) or a hash/hex digest of system_text, and indicate its length (e.g., "anthropic string not as expected (len=..., prefix='...')") so that LOG_ERR, system_text, and the surrounding parse-failure branch no longer emit the full prompt contents.

Despite its name, this is not a secrets-bearing system message. Here is an example:

x-anthropic-billing-header: cc_version=2.1.101.e51; cc_entrypoint=cli; cch=a5145;You are Claude Code, Anthropic's official CLI for Claude.

Auth is passed via an HTTP header; this is a "protocol" issue in a system prompt with an unfortunately named prefix. If a human with judgement and ownership of this project wishes me to redact this, I will do so happily! But I do not agree with this point of autobot feedback.

@kvc0, thank you for the clarification and the concrete example. You're right — looking at the actual format:

x-anthropic-billing-header: cc_version=2.1.101.e51; cc_entrypoint=cli; cch=a5145;You are Claude Code, Anthropic's official CLI for Claude.

This is clearly a protocol annotation with versioning/entrypoint metadata and the system prompt text itself, not an auth-bearing field (those travel in HTTP headers). My concern was based on the misleading name and was overstated. The current LOG_ERR call is fine as-is, and I'm withdrawing the suggestion.

Noting for @ngxson: the upstream contributor (kvc0) has explained the nature of this header and disagrees with the redaction suggestion — the existing logging is acceptable.

✏️ Learnings added

Learnt from: kvc0 Repo: ngxson/llama.cpp PR: 98 File: tools/server/server-common.cpp:1464-1465 Timestamp: 2026-04-12T21:12:14.187Z Learning: In `tools/server/server-common.cpp`, the `x-anthropic-billing-header` system block (used in `convert_anthropic_to_oai`) is not an auth/secrets-bearing message. It is a protocol annotation injected by the Claude CLI into the system prompt, with the format: `x-anthropic-billing-header: cc_version=<ver>; cc_entrypoint=<ep>; cch=<5hex>;` followed by the actual system prompt text. Auth credentials are passed via HTTP headers separately. Logging the full string on format errors is acceptable and does not leak secrets.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: ngxson Repo: ngxson/llama.cpp PR: 0 File: :0-0 Timestamp: 2025-12-22T23:32:42.603Z Learning: For mirrored PRs (those with [Mirror] in title or mirroring upstream), ngxson wants AI-assisted code review with these preferences: check the upstream PR URL for description, be nit-picky about obvious mistakes, don't tag contributors directly (inform ngxson instead), and alert about suspicious AI-generated code comments like "CRITICAL: ..." or "You are correct here...". When reviewing mirrored PRs, the PR is from an upstream contributor, not from ngxson himself. Any issues should be reported to ngxson so he can relay them to the contributor.

IMO it may worth adding an example to the code comment, or even better, leave a link to the upstream PR in the code

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

I'm more than happy to do both - and it's updated ❤️

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tools/server/server-common.cpp (1)

1445-1447: ⚠️ Potential issue | 🟠 Major

Normalize the plain-string system path too.

Line 1445 still copies system verbatim. If the billing header arrives as a plain string instead of an array text block, the cch= stamp survives there and the prefix-cache miss remains for that equivalent input shape. A small helper keeps both branches aligned.

♻️ Proposed change

+        auto normalize_billing_header = [](std::string & system_text) {
+            if (system_text.rfind("x-anthropic-billing-header:", 0) != 0) {
+                return;
+            }
+
+            const size_t header_prefix_length = strlen("x-anthropic-billing-header:");
+            const size_t cch_length = 5;
+            const size_t index_cch = system_text.find("cch=", header_prefix_length);
+
+            if (index_cch != std::string::npos) {
+                const size_t index_replace = index_cch + 4;
+                if (index_replace + cch_length < system_text.length() &&
+                    system_text[index_replace + cch_length] == ';') {
+                    for (size_t i = 0; i < cch_length; ++i) {
+                        system_text[index_replace + i] = 'f';
+                    }
+                    return;
+                }
+            }
+
+            LOG_ERR("anthropic string not as expected: %s", system_text.c_str());
+        };
+
         if (system_param.is_string()) {
             system_content = system_param.get<std::string>();
+            normalize_billing_header(system_content);
         } else if (system_param.is_array()) {
             for (const auto & block : system_param) {
                 if (json_value(block, "type", std::string()) == "text") {
                     auto system_text = json_value(block, "text", std::string());
-                    if (system_text.rfind("x-anthropic-billing-header:", 0) == 0 ) {
-                        // ...
-                    }
+                    normalize_billing_header(system_text);
                     system_content += system_text;
                 }
             }
         }

Also applies to: 1450-1473

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tools/server/server-common.cpp` around lines 1445 - 1447, The plain-string
branch currently assigns system_content directly from system_param, leaving any
leading "cch=" stamp intact; add a small helper (e.g., normalize_system_content
or normalize_billing_header) that strips/normalizes the "cch=" prefix and apply
it in both branches (the system_param.is_string() assignment and the
system_param.is_array() path where system_content is constructed) so both code
paths produce the same normalized system_content and avoid prefix-cache misses.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@tools/server/server-common.cpp`:
- Around line 1445-1447: The plain-string branch currently assigns
system_content directly from system_param, leaving any leading "cch=" stamp
intact; add a small helper (e.g., normalize_system_content or
normalize_billing_header) that strips/normalizes the "cch=" prefix and apply it
in both branches (the system_param.is_string() assignment and the
system_param.is_array() path where system_content is constructed) so both code
paths produce the same normalized system_content and avoid prefix-cache misses.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 15985098-b475-441f-852d-17ec6b93acf5

📥 Commits

Reviewing files that changed from the base of the PR and between a1dea2b and ecaebff.

📒 Files selected for processing (1)

tools/server/server-common.cpp

kvc0 · 2026-04-12T22:09:56Z

@ngxson I addressed this last commend coderabbit brought up. I factored out the new code into a helper function, and called it in the 2 places like it suggested. I did it as a plain function though instead of a closure, because that seems more readable to me.

github-actions bot added examples server labels Apr 12, 2026

coderabbitai bot reviewed Apr 12, 2026

View reviewed changes

ngxson mentioned this pull request Apr 12, 2026

server: (anthropic API) fix prefix caching ggml-org/llama.cpp#21793

Open

kvc0 added 2 commits April 12, 2026 14:15

review feedback

a1dea2b

more comment clarity

ecaebff

coderabbitai bot reviewed Apr 12, 2026

View reviewed changes

ngxson force-pushed the master branch from 92dd908 to 073bb2c Compare April 12, 2026 21:27

another autofeedback

3a993a7

just declare the helper locally

f3ec37c

github-actions bot added ggml Nvidia GPU labels Apr 12, 2026

		} else {
		LOG_ERR("anthropic string not as expected: %s", system_text.c_str());

Conversation

ngxson commented Apr 12, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kvc0 Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

ngxson Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

kvc0 Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

kvc0 commented Apr 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ngxson commented Apr 12, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 12, 2026 •

edited

Loading

coderabbitai bot Apr 12, 2026 •

edited

Loading