perf: improve distillation detail retention at 400K+ token sessions #423

Merged: BYK merged 2 commits into main from perf/distillation-detail-retention on May 20, 2026

@BYK BYK commented May 20, 2026

Summary

Addresses information loss in Lore's distillation pipeline during long sessions (400K+ tokens). In these sessions, two-stage compression (gen-0 followed by meta) yields roughly 53:1 to 64:1 total compression and drops specific identifiers that single-pass compaction preserves.

Closes #417

Changes

  • Increase gen-0 budget multiplier (8→10): Gives the observer LLM 25% more tokens per segment (e.g., 1024→1280 for a 16K segment), providing room to preserve specific identifiers the prompt already asks for (error messages, file paths, version numbers)
  • Increase tool output truncation limit (2K→4K): Lets the distillation LLM see actual error messages and stack traces instead of compact annotations. This directly affects questions about exact error text from tool outputs
  • Protect recent gen-0 segments from meta-distillation: Keeps the 5 most recent gen-0 segments un-archived (recentSegmentsToKeep config, default 5) so their full detail stays in the context prefix while older segments are consolidated by the reflector
  • Guarantee meta survives selectDistillations at layer 3: Always includes gen>=1 entries (consolidated session history) before filling remaining slots from gen-0 by recency+importance scoring. Without this, the meta would be evicted at layer 3 (distLimit=5) since 5 newer gen-0 segments outscore it on recency
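
The budget figures in the first bullet can be checked with a short calculation. The PR does not show how distillTokenBudget is computed, so the sketch below assumes square-root scaling purely because it reproduces the quoted 1024→1280 example for a 16K segment; `sketchTokenBudget` is a hypothetical name, not a function in the codebase.

```typescript
// Hypothetical stand-in for the real budget function. The sqrt scaling is an
// ASSUMPTION chosen because it matches the 1024 -> 1280 example in this PR.
function sketchTokenBudget(segmentTokens: number, multiplier: number): number {
  return Math.round(Math.sqrt(segmentTokens) * multiplier);
}

const sixteenK = 16 * 1024; // a "16K" segment: 16384 tokens

const budgetOld = sketchTokenBudget(sixteenK, 8); // multiplier before this PR
const budgetNew = sketchTokenBudget(sixteenK, 10); // multiplier after this PR

// Relative increase in the per-segment output budget.
const relativeIncrease = (budgetNew - budgetOld) / budgetOld;

console.log({ budgetOld, budgetNew, relativeIncrease });
```

Whatever the real formula is, the 25% figure follows from the multiplier ratio alone (10 / 8 = 1.25) as long as the budget is linear in the multiplier.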

Key Design Details

Partition logic: metaDistillInner() applies the threshold check to toConsolidate.length (not existing.length) to prevent kept segments from re-triggering consolidation on idle ticks — especially under bust pressure where effectiveMetaThreshold can drop to 5.
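
A minimal sketch of the partition described above, under stated assumptions: the `Segment` shape, the helper names `partitionGen0` and `shouldConsolidate`, and the `>=` threshold comparison are all hypothetical and are not taken from metaDistillInner() itself. It only illustrates why checking toConsolidate.length avoids the idle-tick re-trigger.

```typescript
// Hypothetical segment shape; the real records likely carry more fields.
interface Segment {
  id: number;
  generation: number; // 0 = gen-0, >= 1 = meta
  createdAt: number;
}

interface Partition {
  toConsolidate: Segment[];
  kept: Segment[];
}

// Split gen-0 segments into older ones (eligible for meta-distillation)
// and the N most recent ones, which stay un-archived.
function partitionGen0(existing: Segment[], recentSegmentsToKeep: number): Partition {
  const gen0 = existing
    .filter((s) => s.generation === 0)
    .sort((a, b) => a.createdAt - b.createdAt); // oldest first
  const splitAt = Math.max(0, gen0.length - recentSegmentsToKeep);
  return { toConsolidate: gen0.slice(0, splitAt), kept: gen0.slice(splitAt) };
}

// The threshold is applied to toConsolidate.length, not existing.length.
// The `>=` comparison is an assumption.
function shouldConsolidate(existing: Segment[], keep: number, threshold: number): boolean {
  return partitionGen0(existing, keep).toConsolidate.length >= threshold;
}

// Helper to build n gen-0 segments with increasing timestamps.
const makeSegments = (n: number): Segment[] =>
  Array.from({ length: n }, (_, i) => ({ id: i, generation: 0, createdAt: i }));

// After a consolidation round only the 5 kept segments remain. With a
// threshold of 5, a check on existing.length would fire again on every idle
// tick; the check on toConsolidate.length does not.
console.log(shouldConsolidate(makeSegments(5), 5, 5));
console.log(shouldConsolidate(makeSegments(10), 5, 5));
```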

Meta preservation: selectDistillations() now separates meta (gen>=1) and gen-0 entries. Meta entries always get included first; remaining slots are filled from gen-0 by the existing recency+importance scoring. This ensures the consolidated session history is never dropped under compression pressure.
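
The selection order described above could look roughly like the following. This is a sketch rather than the actual selectDistillations() code: the `Distillation` shape, the single `score` field standing in for the recency+importance scoring, the function name, and the choice to return entries in their original order are all assumptions.

```typescript
// Hypothetical entry shape; `score` stands in for recency+importance.
interface Distillation {
  id: string;
  generation: number; // 0 = gen-0, >= 1 = meta
  score: number;
}

function selectDistillationsSketch(entries: Distillation[], limit: number): Distillation[] {
  // Meta entries (gen >= 1) are always included first.
  const meta = entries.filter((e) => e.generation >= 1);
  const gen0 = entries.filter((e) => e.generation === 0);

  // Remaining slots go to the highest-scoring gen-0 entries.
  // Assumption: if meta alone exceeds the limit, no gen-0 entries are added.
  const remaining = Math.max(0, limit - meta.length);
  const topGen0 = [...gen0].sort((a, b) => b.score - a.score).slice(0, remaining);

  // Return the chosen entries in their original order.
  const chosen = new Set<Distillation>([...meta, ...topGen0]);
  return entries.filter((e) => chosen.has(e));
}

// One meta with the lowest score, followed by five newer gen-0 segments.
const sampleEntries: Distillation[] = [
  { id: "meta", generation: 1, score: 0 },
  { id: "g1", generation: 0, score: 1 },
  { id: "g2", generation: 0, score: 2 },
  { id: "g3", generation: 0, score: 3 },
  { id: "g4", generation: 0, score: 4 },
  { id: "g5", generation: 0, score: 5 },
];

// With limit 5, a pure score ranking would evict "meta"; here it survives
// and the lowest-scoring gen-0 entry ("g1") is dropped instead.
const selectedIds = selectDistillationsSketch(sampleEntries, 5).map((e) => e.id);
console.log(selectedIds);
```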

Tests

  • Updated distillTokenBudget assertions for MULTIPLIER=10
  • 4 new tests for recentSegmentsToKeep: first-round partition, re-trigger prevention, boundary case (count == keep), anchored round with partition
  • 5 new tests for selectDistillations meta preservation: basic passthrough, meta guaranteed inclusion, multiple metas, gen-0-only fallback, emergency limit=2
  • All 1727 tests pass, typecheck clean across all 4 packages

Three targeted changes to address information loss in long sessions:

1. Increase gen-0 budget multiplier (8→10): gives the observer LLM 25%
   more tokens per segment to preserve specific identifiers (error
   messages, file paths, version numbers).

2. Increase tool output truncation limit (2K→4K): lets the distillation
   LLM see actual error messages and stack traces instead of compact
   annotations.

3. Protect recent gen-0 segments from meta-distillation: keeps the 5
   most recent gen-0 segments un-archived so their full detail stays in
   the context prefix while older segments are consolidated.

The partition logic in metaDistillInner() uses the threshold check on
toConsolidate.length (not existing.length) to prevent kept segments
from re-triggering consolidation on idle ticks.

Closes #417
@BYK BYK self-assigned this May 20, 2026
When recentSegmentsToKeep keeps 5 gen-0 segments un-archived, the meta
distillation (gen>=1) sits at index 0 with the lowest recency score.
At layer 3 (distLimit=5), selectDistillations would evict the meta in
favor of 5 newer gen-0 segments, losing the consolidated session history.

Fix: always include gen>=1 entries first, then fill remaining slots
from gen-0 by recency+importance scoring. This ensures the meta is
never dropped under compression pressure.

Adds 5 tests for selectDistillations meta preservation.
@BYK BYK merged commit 14d6927 into main May 20, 2026
10 checks passed
@BYK BYK deleted the perf/distillation-detail-retention branch May 20, 2026 10:25

Development

Successfully merging this pull request may close these issues.

perf: distillation loses specific identifiers at 400K+ token sessions
