release: v0.6.0 — RAG singleton, retry/backoff, parallel polish, on-disk polish cache #7
Merged
…isk polish cache

A four-pronged perf and resilience pass on the polish/RAG path, plus a
new "cache" CLI subcommand for the on-disk cache it introduces.

rag_hook: process-level RagPipeline singleton
---------------------------------------------
RagPipeline construction loads the corpus, which is heavy enough that
doing it once per template kind (15+ times in --all-kinds runs) was
visibly slow. _get_pipeline() now caches the pipeline behind a
threading.Lock with double-checked locking, so cost is paid once per
process. tests/conftest.py resets the singleton between tests so
existing patches still intercept construction.

doc_gen/_anthropic: retry with exponential backoff
--------------------------------------------------
call_anthropic now distinguishes retryable (429, 529,
APIConnectionError) from non-retryable SDK errors and retries the
former up to 3 times with 1s/2s/4s backoff. Non-retryable errors raise
immediately. Credential redaction and __cause__ stripping are
preserved.

generator: parallel polish
--------------------------
generate_feature_templates is now a three-phase pipeline: render
(sequential, fast), polish (concurrent via ThreadPoolExecutor, max 4
workers), write (sequential, ordered). Saturates LLM-bound wall time
for --all-kinds runs while staying under Anthropic rate limits.

polish: on-disk cache with mtime TTL prune + clear
---------------------------------------------------
polish_template now consults a sha256-keyed on-disk cache before
calling the LLM. Key includes content + source_summary + template_type
+ system_prompt + augmented_context + model so any input change
invalidates the entry. Default location is ~/.attune/polish_cache/
(overridable via env).

_cache_get bumps mtime on hit so the prune sweeper treats hot entries
as hot even on noatime mounts. _cache_prune deletes entries older than
the TTL (default 30d, env-tunable, 0 disables) and runs lazily
piggybacked on _cache_put.

clear_cache() is exposed for manual nukes; the new "attune-author
cache clear" subcommand calls it.

Tests
-----
- tests/test_polish_cache.py (new, 12 tests): hit/miss, mtime bump on
  hit, model in key, prune by mtime, TTL=0 disables, invalid TTL falls
  back, clear_cache, polish_template skips LLM on cache hit.
- tests/test_anthropic_retry.py (new, 9 tests): retries on 429 / 529 /
  APIConnectionError, exponential schedule, gives up after
  _MAX_RETRIES, non-retryable raises immediately, credential
  redaction, __cause__ stripped.

Full suite: 518 passed, 37 skipped (was 497, +21 new tests).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
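The retry policy described in the commit message (three retries, 1s/2s/4s delays, immediate failure for non-retryable errors) can be sketched as follows. This is a minimal illustration, not the code in `doc_gen/_anthropic`: `FakeAPIError`, `is_retryable`, and `call_with_backoff` are hypothetical names, and the builtin `ConnectionError` stands in for the SDK's `APIConnectionError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_RETRIES = 3            # hypothetical constant; the PR mentions _MAX_RETRIES
BASE_DELAY_SECONDS = 1.0   # first delay; doubles on each retry (1s, 2s, 4s)
RETRYABLE_STATUS = {429, 529}


class FakeAPIError(Exception):
    """Stand-in for an SDK error that carries an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(exc: Exception) -> bool:
    """Treat rate-limit/overload statuses and connection failures as transient."""
    if isinstance(exc, ConnectionError):
        return True
    return getattr(exc, "status_code", None) in RETRYABLE_STATUS


def call_with_backoff(
    fn: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures up to MAX_RETRIES times."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return fn()
        except Exception as exc:
            # Non-retryable errors, and the final failed attempt, propagate.
            if not is_retryable(exc) or attempt == MAX_RETRIES:
                raise
            sleep(BASE_DELAY_SECONDS * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is a design choice for the sketch: a test can substitute `list.append` and assert on the delay schedule without actually waiting.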
## Summary

Four perf and resilience improvements on the polish/RAG path, plus a new `cache` CLI subcommand for the on-disk cache that gets introduced.

- `rag_hook`: process-level `RagPipeline` singleton (thread-safe via `threading.Lock` + double-checked locking). Corpus loads once per process instead of once per template kind, a measurable win on `--all-kinds` runs.
- `doc_gen/_anthropic`: retry with 1s / 2s / 4s exponential backoff for 429, 529, and `APIConnectionError`. Non-retryable SDK errors raise immediately. Credential redaction and `__cause__` stripping preserved.
- `generator`: three-phase render → polish → write. Polish is now `ThreadPoolExecutor`-parallel (max 4 workers), so wall-clock time for the polish phase drops from the sum of all LLM calls to roughly the slowest single call when there are at most 4 templates, and to roughly a quarter of the sum on larger runs.
- `polish`: on-disk cache at `~/.attune/polish_cache/` (overridable via env). Key includes content + source_summary + template_type + system_prompt + augmented_context + model, so any input or model change invalidates entries. `_cache_get` bumps mtime on hit so heat is observed reliably even on `noatime` mounts. Lazy mtime-based prune (default TTL 30d, env-tunable, `0` disables) piggybacked on `_cache_put`. Manual nuke via `attune-author cache clear`.

## Why
Re-runs of `attune-author regenerate --all-kinds` were paying full LLM cost even on unchanged source. The cache, plus parallel polish, plus prompt caching downstream in attune-rag#3, bring the bill and the wall-clock down by an order of magnitude on the warm path. The retry/backoff and singleton are robustness wins independent of cost.

## New CLI surface

`attune-author cache clear` deletes the entries in the on-disk polish cache by calling `clear_cache()`.
## New env vars

| Variable | Default | Notes |
| --- | --- | --- |
| `ATTUNE_AUTHOR_POLISH_CACHE` | `~/.attune/polish_cache` | |
| `ATTUNE_AUTHOR_POLISH_CACHE_TTL_SECONDS` | `2592000` (30d) | `0` disables prune |

## Version
`0.5.1` → `0.6.0` (minor: additive, but with new behavior and a new subcommand).

## Dependencies

- `>=0.1.0,<0.2` pin satisfies 0.1.10).
- `>=0.10.0` pin satisfies 0.10.0).

## Test plan
- `tests/test_polish_cache.py` (new, 12 tests): hit/miss, mtime bump on hit, model in key, prune by mtime, `TTL=0` disables, invalid TTL falls back to default, `clear_cache`, end-to-end `polish_template` cache hit skips LLM
- `tests/test_anthropic_retry.py` (new, 9 tests): 429/529/`APIConnectionError` retries, exponential schedule, gives up after `_MAX_RETRIES`, non-retryable raises immediately, credential redaction, `__cause__` stripped
- `tests/conftest.py`: autouse fixture resets RagPipeline singleton between tests
- `attune-author cache clear` against the live cache (deleted 16 entries)
- `_cache_put` against a 1-second TTL: hot entries survive, expired entries die

🤖 Generated with Claude Code
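The cache behavior exercised by the test plan (content-derived key, mtime bump on hit, lazy prune on put, TTL of `0` disabling the prune) can be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: the function names without leading underscores, the one-file-per-key layout, and the length-prefixed hashing are all hypothetical choices made for the example.

```python
import hashlib
import os
import time
from pathlib import Path
from typing import Optional


def cache_key(*parts: str) -> str:
    """Hash every input that affects the output, so any change yields a new key."""
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguous joins
        h.update(data)
    return h.hexdigest()


def cache_get(cache_dir: Path, key: str) -> Optional[str]:
    """Return the cached text, or None on a miss; a hit refreshes the mtime."""
    path = cache_dir / key
    try:
        text = path.read_text(encoding="utf-8")
        os.utime(path, None)  # set atime and mtime to now, marking the entry hot
    except FileNotFoundError:
        return None
    return text


def cache_prune(cache_dir: Path, ttl_seconds: float) -> int:
    """Delete entries whose mtime is older than the TTL; a TTL <= 0 disables pruning."""
    if ttl_seconds <= 0 or not cache_dir.is_dir():
        return 0
    cutoff = time.time() - ttl_seconds
    removed = 0
    for path in cache_dir.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink(missing_ok=True)
            removed += 1
    return removed


def cache_put(cache_dir: Path, key: str, text: str, ttl_seconds: float) -> None:
    """Store an entry, then prune lazily as part of the same call."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / key).write_text(text, encoding="utf-8")
    cache_prune(cache_dir, ttl_seconds)
```

Using mtime rather than atime as the heat signal means the sweep does not depend on the filesystem recording reads, which is why an explicit `os.utime` call on every hit is needed.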