feat: PIC cache reuse with mistralrs (local) backend by starpit · Pull Request #904 · IBM/spnl

starpit · 2026-02-24T00:16:56Z

Summary

Wire spnl Plus operators to mistral.rs for position-independent KV cache reuse (PIC) across requests. Plus blocks in queries are tagged with in-band markers, which the engine uses to identify cacheable segments via text-based content hashing — enabling automatic KV cache hits when the same document appears across different requests, regardless of position.
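To illustrate why content hashing makes reuse position-independent, here is a minimal sketch. It is not the mistral.rs implementation: `content_key`, `SegmentCache`, and the use of `DefaultHasher` are hypothetical stand-ins, and the "KV" payload is just a word count. The point is only that a key derived from segment text alone does not change when documents are reordered.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash a segment's text. Hypothetical helper: the hash function the real
/// engine uses is not stated in this PR description.
pub fn content_key(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// Toy stand-in for a KV cache: maps a content hash to an opaque payload
/// (here just the segment's word count).
#[derive(Default)]
pub struct SegmentCache {
    entries: HashMap<u64, usize>,
}

impl SegmentCache {
    /// Look up every segment of one request, inserting the misses.
    /// Returns the number of cache hits for this request.
    pub fn process(&mut self, segments: &[&str]) -> usize {
        let mut hits = 0;
        for seg in segments {
            let key = content_key(seg);
            if self.entries.contains_key(&key) {
                hits += 1;
            } else {
                self.entries.insert(key, seg.split_whitespace().count());
            }
        }
        hits
    }
}
```

In this toy model, a second request carrying the same three documents in a different order hits on all three, because the key depends on the text and not on where the segment sits in the prompt.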

What's included

- PIC integration: `add_messages_from_query` tags Plus blocks with `\0PIC_PLUS\0` in-band markers; the mistral.rs engine strips them before chat template processing and uses token subsequence matching to resolve message-to-token boundaries for cache lookup
- Benchmark (`spnl bench pic`): measures TTFT speedup and accuracy from cross-request PIC cache reuse with shuffled documents
  - `-o/--output` controls output (comma-separated): speedup, iqr, hitrate, latency, json, accuracy; values are combinable, e.g. `-o speedup,accuracy`
  - `--full` sweeps across doc sizes (xs through xxl) and models
  - `-l/--length` sets doc words (TTFT) or max tokens (accuracy)
  - `--grading-model` enables LLM-judge semantic equivalence scoring alongside token F1
  - Progress bars with silent model loading
- Tests: unit tests for PIC helpers (`token_f1`, `normalize_tokens`, `parse_score`, `resolve_spectrum`, `compute_hit_rate`, `percentile`) and the `add_messages_from_query_inner` message builder (Plus tagging, Cross/Plus nesting, unsupported query types). An ignored integration test verifies PIC speedup > 1x with a real model.
- Docs: README_PIC.md (user-facing: query syntax, CLI, results) and engine internals in mistral.rs/PIC.md
- Deps: mistralrs/mistralrs-core pointed to starpit/mistral.rs fork branch pic-cache-reuse
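The tagging and boundary-resolution steps in the PIC integration item can be sketched roughly as follows. Only the marker string `\0PIC_PLUS\0` comes from this PR; the function names `tag_plus`, `strip_marker`, and `find_subsequence` are hypothetical, and placing the marker as a prefix of the message content is an assumption made for the sketch.

```rust
/// Marker string quoted in the PR description.
pub const PIC_MARKER: &str = "\0PIC_PLUS\0";

/// Tag a Plus block. Assumption: the marker is prepended to the content.
pub fn tag_plus(content: &str) -> String {
    format!("{}{}", PIC_MARKER, content)
}

/// Strip the marker before templating, reporting whether it was present.
pub fn strip_marker(content: &str) -> (bool, &str) {
    match content.strip_prefix(PIC_MARKER) {
        Some(rest) => (true, rest),
        None => (false, content),
    }
}

/// Find the first occurrence of `needle` in `haystack` starting at or after
/// index `from`, returning the start index of the match. This is how a
/// message's own tokens could be located inside the fully templated prompt.
pub fn find_subsequence(haystack: &[u32], needle: &[u32], from: usize) -> Option<usize> {
    // Guard against empty needles (windows(0) panics) and out-of-range starts.
    if needle.is_empty() || from > haystack.len() || needle.len() > haystack.len() - from {
        return None;
    }
    haystack[from..]
        .windows(needle.len())
        .position(|window| window == needle)
        .map(|i| i + from)
}
```

Searching from a moving offset (`from`) lets repeated documents resolve to successive positions rather than all matching the first occurrence; whether the engine does exactly this is not stated here.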

Benchmark Results

Output of spnl bench pic --full:

Model         xs 10w  sm 50w  m 200w  lg 500w  xl 1000w  xxl 2000w
────────────  ──────  ──────  ──────  ───────  ────────  ─────────
llama3.2:1b    2.02x   4.28x  13.35x   29.85x    52.91x     80.55x
llama3.2:3b    2.12x   4.56x  15.25x   29.84x    48.71x     83.94x
llama3.1:8b    2.38x   5.48x  17.71x   36.17x    65.57x     91.84x
qwen2.5:0.5b   1.74x   3.38x  10.10x   22.51x    46.70x     87.87x
qwen2.5:14b    3.00x   7.41x  24.74x   48.96x    97.98x    124.84x
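For readers interpreting the table, a speedup figure like these can be derived from two sets of TTFT samples. The sketch below assumes speedup is the ratio of median baseline TTFT to median PIC TTFT and that percentiles use linear interpolation; neither detail is confirmed by this PR, and `percentile_of`, `ttft_speedup`, and `iqr_of` are hypothetical names, not the helpers in bench/pic.rs.

```rust
/// Linearly interpolated percentile (`p` in 0..=100) of a sample.
/// Returns None for an empty sample.
pub fn percentile_of(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Fractional rank into the sorted sample.
    let rank = (p.clamp(0.0, 100.0) / 100.0) * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// Assumed definition: median baseline TTFT divided by median PIC TTFT.
pub fn ttft_speedup(baseline_ms: &[f64], pic_ms: &[f64]) -> Option<f64> {
    let base = percentile_of(baseline_ms, 50.0)?;
    let pic = percentile_of(pic_ms, 50.0)?;
    if pic > 0.0 {
        Some(base / pic)
    } else {
        None
    }
}

/// Interquartile range (p75 - p25), a spread measure robust to outliers.
pub fn iqr_of(samples: &[f64]) -> Option<f64> {
    Some(percentile_of(samples, 75.0)? - percentile_of(samples, 25.0)?)
}
```

Under these assumptions, baseline samples of 100, 120, and 110 ms against PIC samples of 10, 11, and 12 ms give medians of 110 ms and 11 ms, i.e. a 10x speedup.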

Test plan

- Unit tests pass: `cargo test -p spnl-cli --features bench --bin spnl -- bench::pic::tests` (22 tests)
- Message builder tests pass: `cargo test -p spnl --features local --lib -- generate::backend::mistralrs::tests` (5 tests)
- Clippy clean: both spnl and spnl-cli pass `clippy -- -D warnings`
- `cargo build --release -F bench,metal` succeeds
- `spnl bench pic` runs end-to-end against a local model
- Integration test with GPU: `cargo test -p spnl-cli --features bench,metal --bin spnl -- bench::pic::tests::pic_benchmark_shows_speedup --ignored`
- Accuracy mode (`-o accuracy`) produces Plus vs flat comparison output
- Review README_PIC.md for correctness
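The accuracy mode reports token F1. A minimal sketch of that metric follows, assuming the common question-answering definition (lowercase, strip punctuation, multiset token overlap, harmonic mean of precision and recall). The names `normalize_words` and `token_f1_score` are hypothetical, and the PR's own `token_f1` and `normalize_tokens` may normalize differently.

```rust
use std::collections::HashMap;

/// Lowercase each whitespace-separated word and drop non-alphanumeric
/// characters; words that become empty are discarded.
pub fn normalize_words(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| {
            w.chars()
                .filter(|c| c.is_alphanumeric())
                .collect::<String>()
                .to_lowercase()
        })
        .filter(|w| !w.is_empty())
        .collect()
}

/// Token-level F1 between a prediction and a reference answer.
pub fn token_f1_score(prediction: &str, reference: &str) -> f64 {
    let pred = normalize_words(prediction);
    let gold = normalize_words(reference);
    if pred.is_empty() || gold.is_empty() {
        // Two empty answers agree; one empty answer cannot overlap.
        return if pred.is_empty() && gold.is_empty() { 1.0 } else { 0.0 };
    }
    // Count reference tokens, then consume them as predictions match.
    let mut remaining: HashMap<&str, usize> = HashMap::new();
    for t in &gold {
        *remaining.entry(t.as_str()).or_insert(0) += 1;
    }
    let mut overlap = 0usize;
    for t in &pred {
        if let Some(count) = remaining.get_mut(t.as_str()) {
            if *count > 0 {
                *count -= 1;
                overlap += 1;
            }
        }
    }
    if overlap == 0 {
        return 0.0;
    }
    let precision = overlap as f64 / pred.len() as f64;
    let recall = overlap as f64 / gold.len() as f64;
    2.0 * precision * recall / (precision + recall)
}
```

For example, a prediction of "The capital is Paris." against the reference "paris" has precision 1/4 and recall 1, so F1 is 0.4. A verbose but correct answer is penalized on precision, which is one reason to pair this metric with the LLM-judge score from `--grading-model`.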

Wire spnl Plus operators to mistral.rs for position-independent KV cache reuse (PIC) across requests. Plus blocks in queries are tagged with in-band markers (\0PIC_PLUS\0), which the engine uses to identify cacheable segments via text-based content hashing.

Add `spnl bench pic` benchmark measuring TTFT speedup and accuracy from PIC cache reuse with shuffled documents. Output controlled by `-o/--output` (comma-separated): speedup, iqr, hitrate, latency, json, accuracy. Supports `--full` sweep across doc sizes and models, `-l/--length` for doc words (TTFT) or max tokens (accuracy), `--grading-model` for LLM-judge scoring, progress bars, and silent model loading.

Add unit tests for PIC helper functions (token_f1, normalize_tokens, parse_score, resolve_spectrum, compute_hit_rate, percentile) and add_messages_from_query_inner message builder. Include ignored integration test that verifies PIC speedup > 1x with a real model.

Add README_PIC.md (user-facing: query syntax, CLI usage, benchmark results) and split engine internals into mistral.rs/PIC.md. Point mistralrs/mistralrs-core deps to starpit/mistral.rs fork branch pic-cache-reuse for PR testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Nick Mitchell <nickm@us.ibm.com>

- Update mistralrs dependency to point to starpit/mistral.rs pic-cache-reuse branch
- Update version requirement to >=0.7.1-alpha.1 for pre-release compatibility
- Comment out local patch override
- Rework accuracy benchmark to use fictional factual docs with verifiable answers
- Support PIC-shuffled (pshuf) mode in accuracy benchmark
- Add t-shirt size aliases (sm/lg)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Nick Mitchell <nickm@us.ibm.com>

Enable the prepare_fragment optimization path, which pre-populates the PIC cache via 1-token generates before the main query, allowing all Plus blocks to hit the cache on the first request.

- hlo::optimize enabled in bench/pic.rs timed_request
- plus() keeps Plus wrapping for single elements (preserves PIC tagging)
- Monad returns Seq([]) instead of empty message (avoids phantom tokens)
- Comment out local patch override (changes pushed to starpit branch)
- Fix missing verbose field in test RunCtx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Nick Mitchell <nickm@us.ibm.com>
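The effect of the warm-up pass described in this commit can be shown with a toy model. Everything below is a hypothetical sketch: `ToyEngine`, `prepare_fragments`, and `hit_rate` are illustrative names, the cache is a plain set of fragment strings, and `_max_tokens` is ignored and exists only to echo the 1-token generate mentioned above.

```rust
use std::collections::HashSet;

/// Toy engine whose "KV cache" is the set of fragment texts already seen.
#[derive(Default)]
pub struct ToyEngine {
    cached: HashSet<String>,
}

impl ToyEngine {
    /// Simulated generate: returns (hits, total) over the request's
    /// fragments and caches any fragment that missed.
    pub fn generate(&mut self, fragments: &[&str], _max_tokens: usize) -> (usize, usize) {
        let mut hits = 0;
        for frag in fragments {
            // `insert` returns false when the value was already present.
            if !self.cached.insert((*frag).to_string()) {
                hits += 1;
            }
        }
        (hits, fragments.len())
    }
}

/// Warm the cache with one 1-token generate per fragment before the
/// main query runs.
pub fn prepare_fragments(engine: &mut ToyEngine, fragments: &[&str]) {
    for frag in fragments {
        engine.generate(&[*frag], 1);
    }
}

/// Fraction of fragments served from cache; 0.0 for an empty request.
pub fn hit_rate(hits: usize, total: usize) -> f64 {
    if total == 0 {
        0.0
    } else {
        hits as f64 / total as f64
    }
}
```

Without the warm-up, the first request over three new documents has a hit rate of 0.0 in this model; after `prepare_fragments`, the same first request hits on all three.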

starpit added the made with opus4.6 label Feb 24, 2026

starpit force-pushed the pic-cache-reuse branch 2 times, most recently from cf902c2 to 4197e89 February 24, 2026 00:18

starpit marked this pull request as draft February 24, 2026 00:19

starpit changed the title ~~feat: PIC cache reuse~~ feat: PIC cache reuse with mistralrs (local) backend Feb 24, 2026

starpit force-pushed the pic-cache-reuse branch from c002cbd to 0a2d2e5 February 24, 2026 16:09

starpit marked this pull request as ready for review February 24, 2026 16:10

starpit force-pushed the pic-cache-reuse branch 2 times, most recently from 964b567 to 93f22b9 February 24, 2026 19:39

starpit force-pushed the pic-cache-reuse branch from 93f22b9 to f4b239f February 24, 2026 21:27

starpit marked this pull request as draft February 25, 2026 21:02

starpit and others added 2 commits February 25, 2026 19:15

starpit force-pushed the pic-cache-reuse branch from 5ae4679 to 921df25 February 27, 2026 13:03

starpit wants to merge 3 commits into IBM:main from starpit:pic-cache-reuse
