feat(rag): resolve V2 + V3 verification gates for native citations#12
Merged
Conversation
Both gates from the 0.1.13 PR ran live against the Anthropic API:
V2 — cache_control on document blocks: PASS
- 3799-token document payload, two identical calls.
- Call 1: cache_creation_input_tokens=3799, cache_read=0.
- Call 2: cache_creation=0, cache_read_input_tokens=3799.
- Latency: 3102 ms → 2190 ms (-29%).
- ACTION: cache_control: ephemeral now attached to the first
document by default in _build_documents_payload. One marker
covers the whole document prefix per Anthropic's caching
semantics.
V3 — per-request document-count ceiling: PASS at 200+
- Probe walked n ∈ {5, 10, 20, 30, 50, 75, 100, 150, 200} and
every count was accepted without rejection.
- ACTION: MAX_CITATION_DOCUMENTS raised 20 → 200. Conservative
cap with headroom; the real Anthropic cap is higher.
Other:
- Probes shipped at scripts/probe_v2_cache_control.py and
scripts/probe_v3_doc_count_ceiling.py for re-verification
against future SDK / service changes.
- docs/rag/native-citations.md "Open verification gates"
section is now "Verification gates — resolved 2026-05-08"
with findings inline.
- 2 new unit tests assert cache_control attachment behavior
(first-doc only, single-doc still flagged).
- Full suite: 350 passed, 3 xpassed; ruff clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Resolves the two verification gates left open from #11 (native citations API). Both gates ran live against the Anthropic API on 2026-05-08 and produced clean PASS results — implementing the findings here.
V2 —
cache_controlon document blocks: ✅ PASSTwo-call probe with an identical 3799-token document payload:
cache_creation_input_tokenscache_read_input_tokensDocument-block caching behaves identically to text-block caching. Action:
cache_control: {"type": "ephemeral"}is now attached to the first document by default inClaudeProvider._build_documents_payload. One marker on the first document covers the whole document prefix.V3 — document-count ceiling: ✅ PASS at 200+
Probe walked
n ∈ {5, 10, 20, 30, 50, 75, 100, 150, 200}. Every count was accepted without rejection. Anthropic's actual cap is higher still. Action:MAX_CITATION_DOCUMENTSraised from 20 → 200. Generous headroom for any plausible attune-rag retrieval (k=3default, occasional bumps tok=20–50) while still surfacing a cleanValueErrorif a caller accidentally tries hundreds.What landed
ClaudeProvider._build_documents_payload—cache_controlon first doc, plain on subsequent.MAX_CITATION_DOCUMENTS = 200(was 20).docs/rag/native-citations.md— "Open verification gates" → "Verification gates — resolved 2026-05-08" with findings tables inline.scripts/probe_v2_cache_control.py+scripts/probe_v3_doc_count_ceiling.pyshipped for re-verification against future SDK / service changes.[0.1.14] - 2026-05-08.Tests
Cost
V2 + V3 probes ran for ~$0.02 total against the live API.
Test plan
🤖 Generated with Claude Code