BYK · May 20, 2026
diff --git a/.lore.md b/.lore.md
@@ -5,20 +5,26 @@
 ### Architecture
 
 <!-- lore:019e25c5-5716-77a0-bcf9-d65f321e5736 -->
-* **3-layer gradient model: layer 2 is transient, falls back to layer 1 via urgent distillation**: 3-layer gradient model: Layer 0 (all raw messages + LTM, append-only cache). Layer 1 (distilled prefix + pinned raw window, bust once on entry then warm). Layer 2/emergency (transient hard reset: fresh LTM, 2-3 best distillations, current agentic turn — fires 1-2 turns then urgent distillation falls back to Layer 1). Layer 2 must NOT set stickiness — stickiness only applies to layers 1-3. Bug in \`gradient.ts\`: \`effectiveMinLayer = max(0, lastLayer)\` traps sessions in emergency indefinitely; fix: restrict stickiness to \`lastLayer >= 1 && lastLayer <= 3\`. Context budget caps (160K at Opus) are cost-driven. Layer-specific distLimit: layers 1-2 all non-archived distillations; layer 3: top 5 via \`selectDistillations()\`; emergency: top 2. Scoring: 70% recency + 30% \`importanceBonus()\`. Cache frozen during tool-call chains for byte-identical prefix.
+* **3-layer gradient model: layer 2 is transient, falls back to layer 1 via urgent distillation**: 3-layer gradient model in \`packages/core/src/gradient.ts\`. Layer 0: full passthrough. Layer 1 (\`strip:none\`): distilled prefix + stable raw window (byte-identical for cache). Layer 2 (\`strip:old-tools\`, rawFrac=0.50): strips tool outputs on messages older than last 2 turns. Layer 3 (\`strip:all-tools\`, rawFrac=0.55, distFrac=0.15, distLimit=5): strips all tool outputs except current turn. Layer 4 (emergency): top 2 distillations + 25% tail; never strips tool parts (would cause infinite tool-call loops). \`currentTurnStart()\` (line 2053) walks backward past tool-call chains to protect the entire active chain. \`tryFit()\` backward walk has no pair-keeping logic — safety via reconstruct-after-eviction \[\[see tool pairing entry]]. Bug: \`effectiveMinLayer = max(0, lastLayer)\` traps sessions in emergency; fix: restrict stickiness to \`lastLayer >= 1 && lastLayer <= 3\`.
 
 <!-- lore:019e3083-8969-732b-a269-72e6cdd1ff7d -->
 * **Background LLM rate limiting: p-limit(2) + 429 circuit breaker in background-limiter.ts**: Global concurrency limit for background LLM work in \`packages/gateway/src/background-limiter.ts\`. Uses \`p-limit(2)\` to cap simultaneous background LLM calls across all idle sessions. Circuit breaker trips on 429 responses and pauses all background work for the \`Retry-After\` duration. Wired into: idle scheduler, pipeline incremental distillation, in-flight curation. Urgent distillation is excluded (client is waiting). Without this, N idle sessions fire N×4 simultaneous background calls causing cascading rate limit failures.
 
 <!-- lore:019e1c62-3208-7836-a531-f92d1bb20733 -->
 * **Conversation import system: providers, detection, extraction pipeline**: Core import system lives in \`packages/core/src/import/\`. Key design: \`AgentHistoryProvider\` interface with \`detect()\`/\`load()\` methods; providers registered in a global registry (\`providers/index.ts\`). Detection scans all providers, returns \`DetectedSession\[]\`. Extraction calls curator LLM sequentially per chunk, deduplicating ops via \`parseOps()\`/\`applyOps()\`. Idempotency via \`import\_history\` table (DB migration v19). Built-in providers: Claude Code (\`~/.claude/projects/\`), OpenCode (SQLite), Aider (markdown), Codex (\`~/.codex/sessions/\` JSONL), Cline (VS Code globalStorage JSON), Continue (\`~/.continue/sessions/\` JSON), Pi (\`~/.pi/agent/sessions/\` tree-structured JSONL). Auto-import triggered in \`lore run\` via \`maybeAutoImport()\`. Copilot Chat skipped (opaque leveldb). The OpenCode plugin's \`reflect.ts\` (plugin-side recall tool) was dead code and has been removed — plugin uses \`tool: {}\` (empty), gateway handles all recall.
 
+<!-- lore:019e458b-fe1a-77fe-886b-37ef1817e7ca -->
+* **Gradient tool\_use/tool\_result pairing: reconstruct-after-eviction pattern**: Gradient eviction (\`tryFit()\`) has NO logic to keep \`tool\_use\`/\`tool\_result\` pairs together — it cuts at individual message boundaries. Safety is achieved via reconstruct-after-eviction: (1) \`resolveToolResults()\` (temporal-adapter.ts:239) merges tool result data onto assistant tool parts and strips user-side \`tool\_result\` parts before gradient runs; (2) \`loreMessagesToGateway()\` (pipeline.ts:3401) reconstructs \`tool\_use\`+\`tool\_result\` pairs from surviving assistant tool parts; (3) \`removeOrphanedToolResults()\` (pipeline.ts:3524) removes any remaining orphans as a safety net. \`sanitizeToolParts()\` (gradient.ts:1071) converts pending/running tool parts to error state to prevent API rejection. Layer 4 (emergency) never strips tool parts to avoid infinite tool-call loops.
+
 <!-- lore:019e30a6-ff62-723e-9fd8-56f1f1f60b5a -->
 * **LTM confidence field: semantic meaning and rerankPreferences() for legacy entries**: \`ltm.create()\` accepts optional \`confidence\` param (default 1.0, clamped \[0,1]). Confidence semantics: 1.0=unconditional directive, 0.9=strong preference, 0.8=moderate, 0.6=mild. \`CuratorOp\` create type includes \`confidence\`, wired through \`applyOps\`. \`rerankPreferences()\` in \`packages/core/src/ltm.ts\` re-scores legacy entries by directive keyword patterns (\`STRONG\_DIRECTIVE\_RE\` regex); skips entries whose \`confidence\` was already set to a non-default (custom) value — manual overrides are preserved. \`lore data rerank\` CLI command triggers re-ranking; also auto-runs after \`lore data recover\`. Run after deploying to fix existing preferences in DB.
 
 <!-- lore:019e1de2-7650-76c3-9416-40d905dd998f -->
 * **OpenAI streaming translation: stateful SSE translators in stream/openai.ts**: OpenAI streaming translation: stateful SSE translators in \`packages/gateway/src/stream/openai.ts\` and \`stream/openai-responses.ts\` consume Anthropic SSE events and emit OpenAI-format SSE events incrementally — clients see tokens as they arrive. The pipeline carries \`effectiveProtocol\` in \`UpstreamResult\` to dispatch to the right translator. All translators must implement: (1) \`cancelled\` flag + \`cancel()\` handler aborting upstream via \`AbortController\`; (2) \`safeEnqueue()\` wrapper that no-ops if \`cancelled\`; (3) error \`catch\` block emitting a protocol-appropriate terminal event (\`response.failed\` for Responses API, \`\[DONE]\` with error for Chat Completions) — otherwise clients hang. Adding a new upstream protocol requires both an accumulator branch and a streaming translator.
 
+<!-- lore:019e4586-82db-796f-a3cd-d65f790276bf -->
+* **OpenCode x-session-affinity is a per-process nanoid — not stable across restarts**: OpenCode generates \`x-session-affinity\` natively in its core binary (not via plugin API) as a nanoid — random, per-process. It does NOT persist across OpenCode restarts. The Lore plugin (\`packages/opencode/src/index.ts\`) never touches this header. \`input.sessionID\` in plugin hooks (e.g. \`chat.headers\`) is OpenCode's persistent DB \`Session.id\` — stable across restarts. These are different values. When OpenCode restarts, the new nanoid causes Tier 1 in \`identifySession()\` to create a brand-new Lore session, orphaning all prior distillations/gradient state. Fix: inject \`input.sessionID\` as \`x-lore-session-id\` in the \`chat.headers\` hook to give Lore a restart-stable identifier.
+
 <!-- lore:019e1952-398c-753f-b92c-4a5fa5ecf15f -->
 * **Pi plugin: which providers can be proxied through the gateway**: Pi plugin gateway proxy compatibility by wire protocol. \*\*Proxiable\*\*: \`anthropic\` → \`/v1/messages\`: \`anthropic\`, \`fireworks\`, \`github-copilot\`; \`openai-completions\` → \`/v1/chat/completions\`: \`deepseek\`, \`xai\`, \`groq\`, \`cerebras\`, \`openrouter\`, \`huggingface\`, \`opencode\`, \`opencode-go\`; \`openai-responses\` → \`/v1/responses\`: \`openai\`, \`azure-openai-responses\`, \`openai-codex\`, \`azure-openai\`, \`lm-studio\`, \`ollama\`. \*\*Cannot proxy\*\*: \`google\`, \`google-vertex\`, \`amazon-bedrock\`, \`mistral\`. \`registerProvider(name, { baseUrl })\` overrides base URL. Gateway routes by URL path only. OpenAI streaming clients receive true incremental SSE (\`stream/openai.ts\`).
 
@@ -51,6 +57,9 @@
 <!-- lore:019e2e20-95b3-7a9d-ab38-77d87eafecc4 -->
 * **splitSegments() infinite recursion on oversized single messages**: splitSegments() infinite recursion on oversized single messages: In \`packages/core/src/distillation.ts\`, \`splitSegments()\` recurses infinitely when a single message exceeds \`maxSegmentTokens\` (16384). \`findSplitIndex()\` returns \`messages.length\` (=1), so \`left = messages.slice(0, 1)\` produces an identical recursive call. Triggered on large tool outputs (~49KB+). Fix: add base case after the \`totalTokens <= maxTokens\` guard — \`if (messages.length <= 1) return \[messages]\`. The oversized message becomes an indivisible segment.
 
+<!-- lore:019e4586-82eb-7483-95e4-377b435e5e99 -->
+* **Tier 1 session identification blocks Tier 3 fingerprinting when known header changes**: Trap: When \`x-session-affinity\` changes (OpenCode restart), Tier 3 fingerprint matching looks like it should reconnect the session. Reality: Tier 1 in \`identifySession()\` (\`pipeline.ts\` ~line 928) is a first-match-wins gate: if ANY \`KNOWN\_SESSION\_HEADERS\` header is present but unrecognized in \`headerSessionIndex\`, it immediately creates a new session and returns \`{ isNew: true, tier: 1 }\`. Tier 3 is only reached when NO known session header is present at all. Result: OpenCode restart always produces a session split with no recovery path through fingerprinting. The only fix is injecting a stable identifier (e.g. \`input.sessionID\` as \`x-lore-session-id\`) so Tier 1 matches correctly.
+
 <!-- lore:019e1de2-7639-7b32-b4c1-e64486934c27 -->
 * **TTL downgrade hysteresis: downgradeStreak field prevents compounding cache busts**: Auto-TTL downgrade hysteresis in \`packages/gateway/src/pipeline.ts\`: downgrade from 1h→5m TTL requires 3 consecutive short-gap turns (\`ttlDowngradeStreak\` in \`SessionState\`). Block downgrade if >50% of session tokens are cached. Reset streak on any long-gap turn. Subagent turns and tool-use continuations excluded from gap recording — capture \`prevStopReason\` before line 1667 overwrites it, skip when \`prevStopReason === 'tool\_use'\` or \`isSubagentTurn\`. State persistence: immediate (session identity), per-turn (cost snapshot), 30s periodic (gradient EMAs + cache warming via dirty flag). Max data loss on crash: ~30s.
 
@@ -73,8 +82,8 @@
 <!-- lore:019e4425-f7b6-7f33-99dc-2e17087f251e -->
 * **Always ask for session history recall at the start of new sessions**: Session continuity: At session start, recall specific prior-session details (PR titles, branch names, test counts, file names, commit details, lint fixes) as precisely as possible. State explicitly when details were not captured. Proactively log granular session artifacts (exact error messages, PR titles, file paths, line numbers, commit hashes, test names, root causes) so future recall is reliable. User relies on this to resume context quickly. When information is unavailable, say so explicitly and suggest where to find it (GitHub, logs, commit history). Never fabricate or infer details as confirmed facts — always distinguish verified session records from generated/uncertain information.
 
-<!-- lore:019e4478-81dd-77bc-a3e7-931069606b23 -->
-* **Always carry forward workflow directives and incident context as persistent hard rules across sessions**: 🔴 HARD RULES (every session): 1. Two-phase Plan/Build: write plan file first, wait for approval, then implement. 2. Pre-merge self-review: rate all changed files by severity with exact paths/line numbers; fix all criticals; confirm CI green. 3. Eval fairness: demand deep fairness analysis before accepting eval results. 4. LTM cache delete must be inside 'changes made' guard — unconditional placement is a bug. 5. Re-state hard rules + completed work at session start with exact file paths, line numbers, commit hashes, PR URLs — flag missing details with 🟡. 6. Root cause first: RCA must precede any proposed solution. 7. Never run \`git remote -v\` with client-controlled cwd. 8. IDs starting with \`LOREAI-GATEWAY-\` are Sentry issue IDs. 9. Pull from origin/main before any codebase exploration or edits. 10. Resolve \`.lore.md\` rebase conflicts with \`--ours\`. 11. When multiple fixes identified, apply together and immediately re-run evals. 12. Before implementing features/debugging, explore relevant codebase area first. 13. Production incidents: acknowledge → investigate → hotfix → code fix → preventive measures. 14. 'Let's go with that plan.' = full approval, proceed immediately. 15. Pre-c \[truncated — entry too long]
+<!-- lore:019e4584-35d4-7028-b8e5-d6d02662a261 -->
+* **Always create a new branch from main before starting follow-up fixes after a PR merges**: After a PR is merged to main, the user consistently pulls the latest main and creates a new branch before beginning any follow-up work or fixes. This applies even when the follow-up is discovered immediately after the merge. The assistant should proactively suggest or execute: (1) pull/rebase from origin main, (2) create a new branch for the next set of changes, before making any edits. Never continue working on the old branch or commit follow-up fixes directly to main.
 
 <!-- lore:019e44c8-e3b2-70c1-afb6-d3acf24c531a -->
 * **Always fix cache memory leaks with TTL eviction, size cap, and scheduled pruning**: Cache memory leak fix pattern: (1) TTL check in \`.get()\` — delete and return undefined if \`Date.now() - entry.ts > this.ttlMs\`; (2) LRU eviction in \`.set()\` — delete oldest key when \`store.size >= maxEntries\`; (3) \`setInterval(() => this.prune(), 60\_000)\` in constructor. Defaults: \`maxEntries = 10\_000\`, \`ttlMs = 300\_000\` (5 min). Applies to all cache modules.
@@ -91,6 +100,9 @@
 <!-- lore:019e4422-5b29-77a8-8956-488233ef16a4 -->
 * **Always request critical code reviews with specific file paths, line numbers, and severity classifications**: Code review standard: provide exact file paths, line ranges, severity classifications (critical/medium/low), root causes, and concrete fix recommendations. Must-fix items called out explicitly before merge. Before merging any PR: (1) run critical self-review covering all changed files; (2) fix all criticals; (3) confirm CI green. Reviews must be skeptical — actively look for subtle bugs (state not cleared on fallback paths, consume-once flag semantics, circuit breaker bypass, concurrency edge cases). Produce explicit verdict alongside ranked findings. Before implementing features or debugging, read all named files deeply and report findings with precise references. Always analyze root causes before proposing solutions. When starting eval-related work, enumerate concrete gaps before proposing solutions. Track which evals have been run vs. pending. After root-cause analysis or bug fix, propose eval extensions covering the newly discovered failure mode. When presented with a GitHub issue, challenge unsubstantiated claims — verify against actual code.
 
+<!-- lore:019e4587-69a5-7c3a-a74f-e29e90afb5d5 -->
+* **Always request exact file paths, line numbers, and verbatim code snippets when investigating codebase behavior**: When asking for code investigation or analysis, the user consistently expects responses to include exact file paths (e.g., \`packages/gateway/src/session.ts\`), specific line numbers (e.g., 'lines 84-93'), and verbatim code snippets or field names. The user structures requests as multi-part investigations with numbered questions. Responses should never paraphrase or summarize code behavior without grounding it in precise source locations. When a finding is negative (e.g., 'zero references'), that should also be stated with the specific file and search scope examined.
+
 <!-- lore:019e44c8-4e3f-7835-972f-02ed2033a842 -->
 * **Always request worker tests with a consistent 7-case spec covering compute, missing-record, cleanup retention, and sync scenarios**: Worker test files follow a consistent 7-case spec: (1) compute job — DB lookup + update, (2) missing record — skip without throw, (3) cleanup — hard-delete records archived >30 days, (4) cleanup — preserve recently archived records, (5) sync — process a batch, (6) sync — skip missing records, (7) sync — respect dryRun flag. Tests mock DB and Redis. Applies uniformly across all worker modules.