perf(retrieval): memoize per-entry tokens on the entry itself by silversurfer562 · Pull Request #9 · Smart-AI-Memory/attune-rag

silversurfer562 · 2026-05-08T01:22:02Z

Code-review finding (retrieval.py:179): the keyword retriever re-tokenized every entry's path, summary, content preview, and aliases on every query, and also looped over related entries' summaries. For N entries and Q queries, that is N×Q tokenization passes.

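Add a _tokens_cache sidecar field on RetrievalEntry (a frozen dataclass; the field is excluded from comparison, hashing, and repr) and route the scorer through two memoizing helpers. Field tokens are cached under a key that includes CONTENT_PREVIEW_CHARS, and related-summary tokens under a key that includes corpus.name, so subclasses with different preview sizes and cross-corpus lookups don't collide.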
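As a rough illustration of why the sidecar field leaves identity semantics alone, here is a minimal sketch. The class below is a trimmed-down stand-in for RetrievalEntry: its fields, the example values, and the preview size of 500 are assumptions for illustration, not the repository's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetrievalEntry:
    # Hypothetical, reduced field set; the real class may carry more.
    path: str
    summary: str
    # Sidecar cache: left out of __eq__, __hash__, and __repr__.
    _tokens_cache: dict = field(
        default_factory=dict, compare=False, hash=False, repr=False
    )


a = RetrievalEntry("docs/a.md", "Install guide")
b = RetrievalEntry("docs/a.md", "Install guide")

# frozen=True blocks rebinding attributes, but the dict object itself
# stays mutable, so it can be filled in lazily after construction.
a._tokens_cache[("field_tokens", 500)] = {"summary": {"install", "guide"}}

assert a == b                          # cache contents do not affect equality
assert hash(a) == hash(b)              # ... or hashing
assert "_tokens_cache" not in repr(a)  # ... or the repr
```

Because each instance gets its own dict from default_factory, a rebuilt corpus with new entry instances starts with empty caches.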

Three new tests. 306 passed (was 303).

🤖 Generated with Claude Code

Code-review 2026-05-07: the keyword retriever's ``_score_entry`` re-tokenized path / summary / content-preview / aliases, and looped over related entries' summaries, on EVERY query against EVERY entry. For a corpus of N entries answering Q queries, that's N*Q tokenization passes, all redoing identical work.

Add a ``_tokens_cache`` field on :class:`RetrievalEntry` (frozen dataclass; ``compare=False, hash=False, repr=False`` so identity semantics are unchanged) and route the retriever through two helpers:

- ``_entry_field_tokens``: keyed by ``("field_tokens", CONTENT_PREVIEW_CHARS)``. Computes path/summary/content-preview/aliases once per entry per retriever-class. Subclasses with different preview sizes get independent cache slots automatically.
- ``_related_summary_tokens``: keyed by ``("related_tokens", corpus.name)``. Computes the union of related-entry summary tokens once per (entry, corpus) pair.

Cache lives on the entry, so when a corpus rebuilds (new entry instances), it's naturally fresh.

Three new tests cover: cache populated on first call (same dict returned on subsequent calls), preview-size-keyed independence, and the bottom line: repeated ``_score_entry`` calls against the same entry don't re-tokenize the entry's fields.

306 passed (up from 303).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

silversurfer562 merged commit 1eaa641 into main May 8, 2026

silversurfer562 merged 1 commit into main from perf/precompute-entry-tokens
