feat(rag): resolve V2 + V3 verification gates for native citations #12

Merged
silversurfer562 merged 1 commit into main from feat/native-citations-v2v3 on May 8, 2026
@silversurfer562

Summary

Resolves the two verification gates left open from #11 (native citations API). Both gates ran live against the Anthropic API on 2026-05-08 and produced clean PASS results; this PR implements the findings.

V2 — cache_control on document blocks: ✅ PASS

Two-call probe with an identical 3799-token document payload:

| Metric | Call 1 (priming) | Call 2 (cached) |
| --- | --- | --- |
| cache_creation_input_tokens | 3799 | 0 |
| cache_read_input_tokens | 0 | 3799 |
| Wall-clock latency | 3102 ms | 2190 ms (-29%) |
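
The PASS criterion implied by the table can be sketched as a small check over the two usage records. This is an illustration only, not the shipped probe script: the `Usage` dataclass and both function names are hypothetical, and only the two token-counter field names come from the results above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical stand-in for the usage counters of one API response."""

    cache_creation_input_tokens: int
    cache_read_input_tokens: int


def cache_probe_passes(first: Usage, second: Usage) -> bool:
    """True when call 1 wrote the cache and call 2 read the same token count back."""
    wrote = first.cache_creation_input_tokens > 0 and first.cache_read_input_tokens == 0
    read_back = (
        second.cache_creation_input_tokens == 0
        and second.cache_read_input_tokens == first.cache_creation_input_tokens
    )
    return wrote and read_back


def latency_change_pct(first_ms: float, second_ms: float) -> int:
    """Relative latency change of call 2 versus call 1, rounded to a whole percent."""
    return round((second_ms - first_ms) / first_ms * 100)


# Numbers reported in the table above.
priming = Usage(cache_creation_input_tokens=3799, cache_read_input_tokens=0)
cached = Usage(cache_creation_input_tokens=0, cache_read_input_tokens=3799)
print(cache_probe_passes(priming, cached))  # True
print(latency_change_pct(3102, 2190))       # -29
```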

Document-block caching behaves identically to text-block caching. Action: cache_control: {"type": "ephemeral"} is now attached to the first document by default in ClaudeProvider._build_documents_payload. One marker on the first document covers the whole document prefix.
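
A minimal sketch of the attachment rule described above, assuming documents arrive as `(title, text)` pairs. The free function `build_documents_payload` is a hypothetical stand-in and not the actual `ClaudeProvider._build_documents_payload`; the block layout follows the plain-text document content block shape used by the Anthropic Messages API.

```python
from typing import Any


def build_documents_payload(docs: list[tuple[str, str]]) -> list[dict[str, Any]]:
    """Build citation-enabled document blocks; only the first carries cache_control."""
    blocks: list[dict[str, Any]] = []
    for index, (title, text) in enumerate(docs):
        block: dict[str, Any] = {
            "type": "document",
            "source": {"type": "text", "media_type": "text/plain", "data": text},
            "title": title,
            "citations": {"enabled": True},
        }
        if index == 0:
            # Marker goes on the first document only, per the behavior in this PR.
            block["cache_control"] = {"type": "ephemeral"}
        blocks.append(block)
    return blocks


payload = build_documents_payload([("a.md", "alpha"), ("b.md", "beta")])
print([("cache_control" in block) for block in payload])  # [True, False]
```

A single-document request still gets the marker, which matches the "single-doc still flagged" case covered by the new unit tests.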

V3 — document-count ceiling: ✅ PASS at 200+

Probe walked n ∈ {5, 10, 20, 30, 50, 75, 100, 150, 200}. Every count was accepted without rejection, so Anthropic's actual cap is at least 200; the probe did not locate it. Action: MAX_CITATION_DOCUMENTS raised from 20 → 200. This leaves generous headroom for any plausible attune-rag retrieval (k=3 default, occasional bumps to k=20–50) while still raising a clean ValueError if a caller accidentally exceeds the cap.
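
The client-side guard could look roughly like the following. This is a sketch under assumptions: `check_document_count` and its error message are hypothetical, and only the constant name and its value of 200 come from this PR.

```python
MAX_CITATION_DOCUMENTS = 200  # value set in this PR (previously 20)


def check_document_count(docs: list[str]) -> None:
    """Raise ValueError before any API call if too many documents are passed."""
    if len(docs) > MAX_CITATION_DOCUMENTS:
        raise ValueError(
            f"Got {len(docs)} citation documents; "
            f"at most {MAX_CITATION_DOCUMENTS} are allowed per request."
        )


check_document_count(["doc"] * 200)  # at the cap: accepted
try:
    check_document_count(["doc"] * 201)  # one over the cap: rejected
except ValueError as exc:
    print(exc)
```

Failing locally keeps the error deterministic and avoids spending a network round trip on a request the caller almost certainly did not intend.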

What landed

  • ClaudeProvider._build_documents_payload: cache_control on first doc, plain on subsequent.
  • MAX_CITATION_DOCUMENTS = 200 (was 20).
  • 2 new unit tests asserting cache_control attachment behavior (first-doc only; single-doc still flagged).
  • docs/rag/native-citations.md — "Open verification gates" → "Verification gates — resolved 2026-05-08" with findings tables inline.
  • scripts/probe_v2_cache_control.py + scripts/probe_v3_doc_count_ceiling.py shipped for re-verification against future SDK / service changes.
  • CHANGELOG entry under [0.1.14] - 2026-05-08.

Tests

  • Full suite: 350 passed, 3 xpassed. No regressions.
  • Ruff clean on touched files.

Cost

V2 + V3 probes ran for ~$0.02 total against the live API.

Test plan

  • CI green on all matrix rows (Linux/macOS/Windows × Python 3.10–3.13)

🤖 Generated with Claude Code

Both gates from the 0.1.13 PR ran live against the Anthropic API:

V2 — cache_control on document blocks: PASS
- 3799-token document payload, two identical calls.
- Call 1: cache_creation_input_tokens=3799, cache_read=0.
- Call 2: cache_creation=0, cache_read_input_tokens=3799.
- Latency: 3102 ms → 2190 ms (-29%).
- ACTION: cache_control: ephemeral now attached to the first
  document by default in _build_documents_payload. One marker
  covers the whole document prefix per Anthropic's caching
  semantics.

V3 — per-request document-count ceiling: PASS at 200+
- Probe walked n ∈ {5, 10, 20, 30, 50, 75, 100, 150, 200} and
  every count was accepted without rejection.
- ACTION: MAX_CITATION_DOCUMENTS raised 20 → 200. Conservative
  cap with headroom; the real Anthropic cap is at least 200.

Other:
- Probes shipped at scripts/probe_v2_cache_control.py and
  scripts/probe_v3_doc_count_ceiling.py for re-verification
  against future SDK / service changes.
- docs/rag/native-citations.md "Open verification gates"
  section is now "Verification gates — resolved 2026-05-08"
  with findings inline.
- 2 new unit tests assert cache_control attachment behavior
  (first-doc only, single-doc still flagged).
- Full suite: 350 passed, 3 xpassed; ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@silversurfer562 silversurfer562 merged commit 98c3e0f into main May 8, 2026
12 checks passed
@silversurfer562 silversurfer562 deleted the feat/native-citations-v2v3 branch May 8, 2026 21:24