release: v0.6.0 — RAG singleton, retry/backoff, parallel polish, on-disk polish cache#7

Merged
silversurfer562 merged 1 commit into main from release/0.6.0-polish-cache-and-perf
May 1, 2026
Conversation

@silversurfer562
Member

Summary

Four performance and resilience improvements on the polish/RAG path, plus a new `cache` CLI subcommand for the on-disk cache this release introduces.

  • rag_hook: process-level RagPipeline singleton (thread-safe via threading.Lock + double-checked locking). Corpus loads once per process instead of once per template kind — measurable win on --all-kinds runs.
  • doc_gen/_anthropic: retry with 1s / 2s / 4s exponential backoff for 429, 529, and APIConnectionError. Non-retryable SDK errors raise immediately. Credential redaction and __cause__ stripping preserved.
  • generator: three-phase render → polish → write. Polish is now ThreadPoolExecutor-parallel (max 4 workers) so wall-clock time drops to roughly the slowest single LLM call instead of the sum.
  • polish: on-disk cache at ~/.attune/polish_cache/ (overridable via env). Key includes content + source_summary + template_type + system_prompt + augmented_context + model, so any input or model change invalidates entries. _cache_get bumps mtime on hit so heat is observed reliably even on noatime mounts. Lazy mtime-based prune (default TTL 30d, env-tunable, 0 disables) piggybacked on _cache_put. Manual nuke via attune-author cache clear.

Why

Re-runs of attune-author regenerate --all-kinds were paying full LLM cost even on unchanged source. The cache, plus parallel polish, plus prompt caching downstream in attune-rag#3, bring the bill and the wall-clock down by an order of magnitude on the warm path. The retry/backoff and singleton are robustness wins independent of cost.

New CLI surface

attune-author cache clear    # delete every cached polish entry

New env vars

| Var | Default | Purpose |
| --- | --- | --- |
| `ATTUNE_AUTHOR_POLISH_CACHE` | `~/.attune/polish_cache` | Cache directory override |
| `ATTUNE_AUTHOR_POLISH_CACHE_TTL_SECONDS` | `2592000` (30d) | Mtime TTL; 0 disables prune |

Version

0.5.1 → 0.6.0 (minor: additive, but new behavior plus a new subcommand).

Dependencies

  • Allows but doesn't require attune-rag#3 (>=0.1.0,<0.2 pin satisfies 0.1.10).
  • Allows but doesn't require attune-help#4 (>=0.10.0 pin satisfies 0.10.0).

Test plan

  • tests/test_polish_cache.py — new, 12 tests: hit/miss, mtime bump on hit, model in key, prune by mtime, TTL=0 disables, invalid TTL falls back to default, clear_cache, end-to-end polish_template cache hit skips LLM
  • tests/test_anthropic_retry.py — new, 9 tests: 429/529/APIConnectionError retries, exponential schedule, gives up after _MAX_RETRIES, non-retryable raises immediately, credential redaction, __cause__ stripped
  • tests/conftest.py — autouse fixture resets RagPipeline singleton between tests
  • Full suite: 518 passed, 37 skipped (was 497, +21 new tests)
  • Smoke-tested attune-author cache clear against the live cache (deleted 16 entries)
  • Smoke-tested TTL prune via _cache_put against a 1-second TTL — hot entries survive, expired entries die

🤖 Generated with Claude Code

…isk polish cache

A four-pronged perf and resilience pass on the polish/RAG path,
plus a new "cache" CLI subcommand for the on-disk cache it
introduces.

rag_hook: process-level RagPipeline singleton
---------------------------------------------
RagPipeline construction loads the corpus, which is heavy enough
that doing it once per template kind (15+ times in --all-kinds
runs) was visibly slow. _get_pipeline() now caches the pipeline
behind a threading.Lock with double-checked locking, so cost is
paid once per process. tests/conftest.py resets the singleton
between tests so existing patches still intercept construction.

doc_gen/_anthropic: retry with exponential backoff
--------------------------------------------------
call_anthropic now distinguishes retryable (429, 529,
APIConnectionError) from non-retryable SDK errors and retries
the former up to 3 times with 1s/2s/4s backoff. Non-retryable
errors raise immediately. Credential redaction and __cause__
stripping are preserved.
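The retry shape can be sketched as below. `RetryableError` and the `make_request` callable are illustrative stand-ins for the SDK's 429/529 responses and `APIConnectionError`, not the real error taxonomy:

```python
import time

_MAX_RETRIES = 3


class RetryableError(Exception):
    """Stand-in for 429/529 responses and APIConnectionError."""


def call_with_retry(make_request, sleep=time.sleep):
    """Retry retryable failures up to _MAX_RETRIES times with
    1s/2s/4s exponential backoff; any other exception propagates
    immediately because we only catch RetryableError."""
    for attempt in range(_MAX_RETRIES + 1):
        try:
            return make_request()
        except RetryableError:
            if attempt == _MAX_RETRIES:
                raise  # out of retries: surface the last failure
            sleep(2 ** attempt)  # 1s, 2s, 4s
```

Injecting `sleep` keeps the backoff schedule unit-testable without real delays, matching the "exponential schedule" test.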

generator: parallel polish
--------------------------
generate_feature_templates is now a three-phase pipeline —
render (sequential, fast), polish (concurrent via
ThreadPoolExecutor, max 4 workers), write (sequential, ordered).
This drops LLM-bound wall time to roughly the slowest single
call on --all-kinds runs while staying under Anthropic rate
limits.
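The three-phase pipeline can be sketched as follows; the function names and signatures are illustrative, not the actual API:

```python
from concurrent.futures import ThreadPoolExecutor


def generate(kinds, render, polish, write):
    # Phase 1: render sequentially (fast, CPU-bound).
    rendered = [render(k) for k in kinds]

    # Phase 2: polish concurrently. executor.map preserves input
    # order, so results line up with `kinds` even though the
    # underlying LLM calls may finish out of order. max_workers=4
    # bounds concurrency to stay under rate limits.
    with ThreadPoolExecutor(max_workers=4) as executor:
        polished = list(executor.map(polish, rendered))

    # Phase 3: write sequentially, in the original order.
    for kind, text in zip(kinds, polished):
        write(kind, text)
```

Keeping the write phase sequential sidesteps any need for file-level locking while still parallelizing the slow part.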

polish: on-disk cache with mtime TTL prune + clear
---------------------------------------------------
polish_template now consults a sha256-keyed on-disk cache before
calling the LLM. Key includes content + source_summary +
template_type + system_prompt + augmented_context + model so any
input change invalidates the entry. Default location is
~/.attune/polish_cache/ (overridable via env). _cache_get bumps
mtime on hit so the prune sweeper treats hot entries as hot
even on noatime mounts. _cache_prune deletes entries older than
the TTL (default 30d, env-tunable, 0 disables) and runs lazily
piggybacked on _cache_put. clear_cache() is exposed for manual
nukes; the new "attune-author cache clear" subcommand calls it.

Tests
-----
- tests/test_polish_cache.py (new, 12 tests): hit/miss, mtime
  bump on hit, model in key, prune by mtime, TTL=0 disables,
  invalid TTL falls back, clear_cache, polish_template skips
  LLM on cache hit.
- tests/test_anthropic_retry.py (new, 9 tests): retries on 429
  / 529 / APIConnectionError, exponential schedule, gives up
  after _MAX_RETRIES, non-retryable raises immediately,
  credential redaction, __cause__ stripped.

Full suite: 518 passed, 37 skipped (was 497, +21 new tests).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@silversurfer562 silversurfer562 merged commit f32efba into main May 1, 2026
12 checks passed
@silversurfer562 silversurfer562 deleted the release/0.6.0-polish-cache-and-perf branch May 1, 2026 10:07