diff --git a/.lore.md b/.lore.md index ae3a4e7..822b7b3 100644 --- a/.lore.md +++ b/.lore.md @@ -5,46 +5,37 @@ ### Architecture -* **3-layer gradient model: layer 2 is transient, falls back to layer 1 via urgent distillation**: 3-layer gradient model in \`packages/core/src/gradient.ts\`. Layer 0: full passthrough. Layer 1 (\`strip:none\`): distilled prefix + stable raw window (byte-identical for cache). Layer 2 (\`strip:old-tools\`, rawFrac=0.50): strips tool outputs on messages older than last 2 turns. Layer 3 (\`strip:all-tools\`, rawFrac=0.55, distFrac=0.15, distLimit=5): strips all tool outputs except current turn. Layer 4 (emergency): top 2 distillations + 25% tail; never strips tool parts (would cause infinite tool-call loops). \`currentTurnStart()\` (line 2053) walks backward past tool-call chains to protect the entire active chain. \`tryFit()\` backward walk has no pair-keeping logic — safety via reconstruct-after-eviction \[\[see tool pairing entry]]. Bug: \`effectiveMinLayer = max(0, lastLayer)\` traps sessions in emergency; fix: restrict stickiness to \`lastLayer >= 1 && lastLayer <= 3\`. +* **3-layer gradient model: layer 2 is transient, falls back to layer 1 via urgent distillation**: 3-layer gradient model: Layer 0 (all raw messages + LTM, append-only cache). Layer 1 (distilled prefix + pinned raw window, bust once on entry then warm). Layer 2/emergency (transient hard reset: fresh LTM, 2-3 best distillations, current agentic turn — fires 1-2 turns then urgent distillation falls back to Layer 1). Layer 2 must NOT set stickiness — stickiness only applies to layers 1-3. Bug in \`gradient.ts\`: \`effectiveMinLayer = max(0, lastLayer)\` traps sessions in emergency indefinitely; fix: restrict stickiness to \`lastLayer >= 1 && lastLayer <= 3\`. Context budget caps (160K at Opus) are cost-driven. Layer-specific distLimit: layers 1-2 all non-archived distillations; layer 3: top 5 via \`selectDistillations()\`; emergency: top 2. 
Scoring: 70% recency + 30% \`importanceBonus()\`. Cache frozen during tool-call chains for byte-identical prefix. * **Background LLM rate limiting: p-limit(2) + 429 circuit breaker in background-limiter.ts**: Global concurrency limit for background LLM work in \`packages/gateway/src/background-limiter.ts\`. Uses \`p-limit(2)\` to cap simultaneous background LLM calls across all idle sessions. Circuit breaker trips on 429 responses and pauses all background work for the \`Retry-After\` duration. Wired into: idle scheduler, pipeline incremental distillation, in-flight curation. Urgent distillation is excluded (client is waiting). Without this, N idle sessions fire N×4 simultaneous background calls causing cascading rate limit failures. -* **Conversation import system: providers, detection, extraction pipeline**: Core import system lives in \`packages/core/src/import/\`. Key design: \`AgentHistoryProvider\` interface with \`detect()\`/\`load()\` methods; providers registered in a global registry (\`providers/index.ts\`). Detection scans all providers, returns \`DetectedSession\[]\`. Extraction calls curator LLM sequentially per chunk, deduplicating ops via \`parseOps()\`/\`applyOps()\`. Idempotency via \`import\_history\` table (DB migration v19). Built-in providers: Claude Code (\`~/.claude/projects/\`), OpenCode (SQLite), Aider (markdown), Codex (\`~/.codex/sessions/\` JSONL), Cline (VS Code globalStorage JSON), Continue (\`~/.continue/sessions/\` JSON), Pi (\`~/.pi/agent/sessions/\` tree-structured JSONL). Auto-import triggered in \`lore run\` via \`maybeAutoImport()\`. Copilot Chat skipped (opaque leveldb). The OpenCode plugin's \`reflect.ts\` (plugin-side recall tool) was dead code and has been removed — plugin uses \`tool: {}\` (empty), gateway handles all recall. +* **Conversation import system: providers, detection, extraction pipeline**: Core import system in \`packages/core/src/import/\`. 
Key design: \`AgentHistoryProvider\` interface with \`detect()\`/\`load()\`; providers registered in global registry (\`providers/index.ts\`). Detection scans all providers, returns \`DetectedSession\[]\`. Extraction calls curator LLM sequentially per chunk, deduplicating ops via \`parseOps()\`/\`applyOps()\`. Idempotency via \`import\_history\` table (DB migration v19). Built-in providers: Claude Code (\`~/.claude/projects/\`), OpenCode (SQLite), Aider (markdown), Codex (\`~/.codex/sessions/\` JSONL), Cline (VS Code globalStorage JSON), Continue (\`~/.continue/sessions/\` JSON), Pi (\`~/.pi/agent/sessions/\` tree-structured JSONL). Auto-import triggered in \`lore run\` via \`maybeAutoImport()\`. Copilot Chat skipped (opaque leveldb). -* **Gradient tool\_use/tool\_result pairing: reconstruct-after-eviction pattern**: Gradient eviction (\`tryFit()\`) has NO logic to keep \`tool\_use\`/\`tool\_result\` pairs together — it cuts at individual message boundaries. Safety is achieved via reconstruct-after-eviction: (1) \`resolveToolResults()\` (temporal-adapter.ts:239) merges tool result data onto assistant tool parts and strips user-side \`tool\_result\` parts before gradient runs; (2) \`loreMessagesToGateway()\` (pipeline.ts:3401) reconstructs \`tool\_use\`+\`tool\_result\` pairs from surviving assistant tool parts; (3) \`removeOrphanedToolResults()\` (pipeline.ts:3524) removes any remaining orphans as a safety net. \`sanitizeToolParts()\` (gradient.ts:1071) converts pending/running tool parts to error state to prevent API rejection. Layer 4 (emergency) never strips tool parts to avoid infinite tool-call loops. 
+* **Gradient tool\_use/tool\_result pairing: reconstruct-after-eviction pattern**: Gradient tool\_use/tool\_result pairing — reconstruct-after-eviction pattern: \`tryFit()\` has NO logic to keep pairs together; safety via: (1) \`resolveToolResults()\` (temporal-adapter.ts:239) merges tool result data onto assistant tool parts, strips user-side \`tool\_result\` blocks; (2) \`loreMessagesToGateway()\` (pipeline.ts:3401) reconstructs pairs from surviving assistant tool parts; (3) \`removeOrphanedToolResults()\` (pipeline.ts:3524) validates BOTH directions — tool\_result→tool\_use AND tool\_use→tool\_result (Pass 2, PR #424). \`sanitizeToolParts()\` (gradient.ts:1071) converts pending/running tool parts to error state. Layer 4 (emergency) never strips tool parts to avoid infinite loops. Prefix/raw boundary trap: prefix ends with assistant (text-only), rawWindow may start with assistant containing tool\_use → back-to-back assistants → Anthropic rejects. Fix: advance cutoff past leading assistant messages when prefix is present at all 3 assembly points in gradient.ts (tryFit, tryFitStable pinned path, emergency layer). PR #428. -* **LTM confidence field: semantic meaning and rerankPreferences() for legacy entries**: \`ltm.create()\` accepts optional \`confidence\` param (default 1.0, clamped \[0,1]). Confidence semantics: 1.0=unconditional directive, 0.9=strong preference, 0.8=moderate, 0.6=mild. \`CuratorOp\` create type includes \`confidence\`, wired through \`applyOps\`. \`rerankPreferences()\` in \`packages/core/src/ltm.ts\` re-scores legacy entries by directive keyword patterns (\`STRONG\_DIRECTIVE\_RE\` regex); skips entries whose \`confidence\` was already set to a non-default (custom) value — manual overrides are preserved. \`lore data rerank\` CLI command triggers re-ranking; also auto-runs after \`lore data recover\`. Run after deploying to fix existing preferences in DB. 
+* **LTM confidence field: semantic meaning and rerankPreferences() for legacy entries**: \`ltm.create()\` accepts optional \`confidence\` param (default 1.0, clamped \[0,1]). Semantics: 1.0=unconditional directive, 0.9=strong preference, 0.8=moderate, 0.6=mild. \`CuratorOp\` create type includes \`confidence\`, wired through \`applyOps\`. \`rerankPreferences()\` in \`packages/core/src/ltm.ts\` re-scores legacy entries by directive keyword patterns (\`STRONG\_DIRECTIVE\_RE\`); skips entries whose \`confidence\` was already set to a non-default value — manual overrides are preserved. \`lore data rerank\` CLI command triggers re-ranking; also auto-runs after \`lore data recover\`. Run after deploying to fix existing preferences in DB. -* **OpenAI streaming translation: stateful SSE translators in stream/openai.ts**: OpenAI streaming translation: stateful SSE translators in \`packages/gateway/src/stream/openai.ts\` and \`stream/openai-responses.ts\` consume Anthropic SSE events and emit OpenAI-format SSE events incrementally — clients see tokens as they arrive. The pipeline carries \`effectiveProtocol\` in \`UpstreamResult\` to dispatch to the right translator. All translators must implement: (1) \`cancelled\` flag + \`cancel()\` handler aborting upstream via \`AbortController\`; (2) \`safeEnqueue()\` wrapper that no-ops if \`cancelled\`; (3) error \`catch\` block emitting a protocol-appropriate terminal event (\`response.failed\` for Responses API, \`\[DONE]\` with error for Chat Completions) — otherwise clients hang. Adding a new upstream protocol requires both an accumulator branch and a streaming translator. +* **OpenAI streaming translation: stateful SSE translators in stream/openai.ts**: OpenAI streaming translation: stateful SSE translators in \`packages/gateway/src/stream/openai.ts\` and \`stream/openai-responses.ts\` consume Anthropic SSE events and emit OpenAI-format SSE events incrementally. 
The pipeline carries \`effectiveProtocol\` in \`UpstreamResult\` to dispatch to the right translator. All translators must implement: (1) \`cancelled\` flag + \`cancel()\` handler aborting upstream via \`AbortController\`; (2) \`safeEnqueue()\` wrapper that no-ops if \`cancelled\`; (3) error \`catch\` block emitting a protocol-appropriate terminal event (\`response.failed\` for Responses API, \`\[DONE]\` with error for Chat Completions) — otherwise clients hang. OpenAI/Responses API upstreams don't receive LTM injection — \`req.system\` is passed through unchanged; only the Anthropic path injects LTM. Fix: apply LTM injection to all upstream paths before forwarding. -* **OpenCode x-session-affinity is a per-process nanoid — not stable across restarts**: OpenCode generates \`x-session-affinity\` natively in its core binary (not via plugin API) as a nanoid — random, per-process. It does NOT persist across OpenCode restarts. The Lore plugin (\`packages/opencode/src/index.ts\`) never touches this header. \`input.sessionID\` in plugin hooks (e.g. \`chat.headers\`) is OpenCode's persistent DB \`Session.id\` — stable across restarts. These are different values. When OpenCode restarts, the new nanoid causes Tier 1 in \`identifySession()\` to create a brand-new Lore session, orphaning all prior distillations/gradient state. Fix: inject \`input.sessionID\` as \`x-lore-session-id\` in the \`chat.headers\` hook to give Lore a restart-stable identifier. +* **OpenCode x-session-affinity is a per-process nanoid — not stable across restarts**: OpenCode generates \`x-session-affinity\` natively as a nanoid — random, per-process, NOT stable across restarts. The Lore plugin (\`packages/opencode/src/index.ts\`) never touches this header. \`input.sessionID\` in plugin hooks is OpenCode's persistent DB \`Session.id\` — stable across restarts. 
When OpenCode restarts, the new nanoid causes Tier 1 in \`identifySession()\` to create a brand-new Lore session, orphaning prior distillations/gradient state. Fix: inject \`input.sessionID\` as \`x-lore-session-id\` in the \`chat.headers\` hook. Tier 1 upgrade-path trap: when plugin starts sending \`x-lore-session-id\` alongside \`x-session-affinity\`, prior sessions indexed under \`x-session-affinity\` are invisible — Tier 1b must check both header names. * **Pi plugin: which providers can be proxied through the gateway**: Pi plugin gateway proxy compatibility by wire protocol. \*\*Proxiable\*\*: \`anthropic\` → \`/v1/messages\`: \`anthropic\`, \`fireworks\`, \`github-copilot\`; \`openai-completions\` → \`/v1/chat/completions\`: \`deepseek\`, \`xai\`, \`groq\`, \`cerebras\`, \`openrouter\`, \`huggingface\`, \`opencode\`, \`opencode-go\`; \`openai-responses\` → \`/v1/responses\`: \`openai\`, \`azure-openai-responses\`, \`openai-codex\`, \`azure-openai\`, \`lm-studio\`, \`ollama\`. \*\*Cannot proxy\*\*: \`google\`, \`google-vertex\`, \`amazon-bedrock\`, \`mistral\`. \`registerProvider(name, { baseUrl })\` overrides base URL. Gateway routes by URL path only. OpenAI streaming clients receive true incremental SSE (\`stream/openai.ts\`). - -* **Recall RRF: distillations have structural score advantage over temporal messages**: Recall RRF structural imbalance: distillations get 4 RRF lists (BM25 + vector + quality + exact-match) vs temporal's 3 (no quality list). Quality list re-ranks distillations by \`c\_norm\` + age. \`SOURCE\_WEIGHT\`: distillation=0.8, temporal=0.5, knowledge=1.0. \`markDistilled()\` no longer NULLs embeddings — restores vector search path for distilled messages. Gotcha: temporal vector search originally had \`distilled = 0\` filter — embeddings preserved but never queried. Fix: remove \`distilled = 0\` from temporal vector search only (BM25/LIKE path retains it to avoid duplication). 
Citation system: each distillation stores source message IDs (\`t:\\`). Curator uses distilled observations when all messages are distilled (not raw messages). - ### Gotcha -* **Bun NAPI crash on process.exit() — use safeExit() via libc \_exit()**: Bun NAPI crash on process.exit() with fastembed — use safeExit(): Loading fastembed (onnxruntime NAPI bindings) causes a C++ panic on \`process.exit()\` because Bun runs NAPI teardown destructors that throw. Fix: \`packages/gateway/src/cli/exit.ts\` exports \`safeExit(code)\` — uses \`\_exit()\` from libc via \`bun:ffi\` under Bun, falls back to \`process.exit()\` under Node.js. All gateway exit paths must use \`safeExit()\`. Add double-signal guard in shutdown functions. Do NOT call \`embedding.resetProvider()\` in test teardown \`resetPipelineState()\` — allows re-init to spawn new fastembed workers that crash at exit. Move \`resetProvider()\` to \`shutdown()\` in \`start.ts\` only. \`resetPipelineState()\` must preserve the 'fastembed unavailable' cached state. +* **Bun NAPI crash on process.exit() — use safeExit() via libc \_exit()**: Bun NAPI crash on process.exit() with fastembed — use safeExit(): Loading fastembed (onnxruntime NAPI bindings) causes a C++ panic on \`process.exit()\` because Bun runs NAPI teardown destructors that throw. Fix: \`packages/gateway/src/cli/exit.ts\` exports \`safeExit(code)\` — uses \`\_exit()\` from libc via \`bun:ffi\` under Bun, falls back to \`process.exit()\` under Node.js. All gateway exit paths must use \`safeExit()\`. Do NOT call \`embedding.resetProvider()\` in test teardown \`resetPipelineState()\` — move \`resetProvider()\` to \`shutdown()\` in \`start.ts\` only. \`resetPipelineState()\` must preserve the 'fastembed unavailable' cached state. 
* **git remote -v in hosted gateway — skip when header present, never run with client-controlled cwd**: \`LORE\_HOSTED\_MODE=1\` makes all FS-touching functions no-op: \`getGitRemote()\` returns null, \`config.load()\` skips \`.lore.json\`, agents-file/lat-reader/knowledge-watcher are no-ops. Activation: \`lore start\` (headless) enables hosted mode by default; opt-out via \`--local\` or \`LORE\_HOSTED\_MODE=0\`. \`lore run\` is always local. Flag set in \`initIfNeeded()\` from \`GatewayConfig.hostedMode\`. Never run \`git remote -v\` with client-controlled cwd. \`LORE\_REMOTE\_URL\` + local CLI: \`lore run\`/\`lore start\` skips local gateway and proxies to remote. Local CLI injects \`X-Lore-Git-Remote\`; remote gateway trusts it. CLI-less/SaaS: \`ANTHROPIC\_CUSTOM\_HEADERS\` requires a local \`lore\` CLI process — pure SaaS alternative not yet implemented. - -* **LTM cache delete must be inside the 'changes made' guard in curator.ts**: Curator/recall path bugs: (1) \`ltmSessionCache.delete(sessionId)\` must be inside \`if (changesApplied)\` guard in curator.ts — unconditional placement forces expensive LTM rebuilds on every no-op run. (2) Recall follow-up requests must set \`cacheConversation: false\` — otherwise modified message array triggers full cache write at 5m TTL pricing. (3) Non-streaming recall follow-up path must NOT re-issue the upstream request — capture response body once to prevent double token cost. Strip \`recall\` from tools list to prevent re-invocation; convert \`tool\_use\`/\`tool\_result\` pair to plain text blocks. Thinking blocks must be preserved in assistant messages when extended thinking is enabled. - - -* **OpenAI/Responses API upstreams don't receive LTM — req.system passed through unchanged**: OpenAI/Responses API upstreams don't receive LTM injection — \`req.system\` is passed through unchanged. Only the Anthropic path in \`packages/gateway/src/pipeline.ts\` injects LTM into the system prompt. 
Sessions using OpenAI-protocol upstreams get no knowledge context. Fix: apply the same LTM injection logic to all upstream paths before forwarding. The LTM 3-block system prompt (stable preferences at 1h TTL, context-bound at 5m TTL) is Anthropic-only and must be adapted for other protocols. - * **OpenCode plugin spawns gateway via src/index.ts path, but npm publish ships only dist/**: OpenCode/Pi plugins start gateway in-process via \`startInProcess()\` (\`loadConfig()\` + \`startServer()\` from \`@loreai/gateway\`). In monorepo, Bun resolves \`@loreai/gateway\` to \`src/index.ts\` via \`'bun'\` export condition; npm resolves to \`dist/index.cjs\` (only exists post-build). Fix: use variable indirection to break static analysis: \`const mod = '@loreai/gateway'; await import(mod)\`. Pi plugin must externalize \`@loreai/gateway\` in esbuild. \`NODE\_ENV === 'test'\` skips gateway init. Gateway CJS bundle built via esbuild with \`node-polyfills.ts\` shimming \`globalThis.Bun.serve()\` → \`node:http.createServer()\`. \`@sentry/bun\` remapped → \`@sentry/node\` at bundle time via esbuild plugin (alias fails for transitive-only deps). Use port \`0\` (ephemeral) in tests. @@ -57,11 +48,11 @@ * **splitSegments() infinite recursion on oversized single messages**: splitSegments() infinite recursion on oversized single messages: In \`packages/core/src/distillation.ts\`, \`splitSegments()\` recurses infinitely when a single message exceeds \`maxSegmentTokens\` (16384). \`findSplitIndex()\` returns \`messages.length\` (=1), so \`left = messages.slice(0, 1)\` produces an identical recursive call. Triggered on large tool outputs (~49KB+). Fix: add base case after the \`totalTokens <= maxTokens\` guard — \`if (messages.length <= 1) return \[messages]\`. The oversized message becomes an indivisible segment. 
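The `splitSegments()` base case described above can be sketched as follows. This is a minimal illustration, not the code in `packages/core/src/distillation.ts`: the `Message` shape, the `totalTokens()` helper, and the body of `findSplitIndex()` are hypothetical stand-ins, chosen only so that a single oversized message makes `findSplitIndex()` return `messages.length` as the entry describes.

```typescript
// Hypothetical message shape; the real type in distillation.ts may differ.
interface Message {
  id: string;
  tokens: number;
}

const MAX_SEGMENT_TOKENS = 16384;

function totalTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + m.tokens, 0);
}

// Stand-in for findSplitIndex(): length of the longest prefix that fits,
// but never less than 1. For one oversized message this returns 1,
// which equals messages.length (the case that used to recurse forever).
function findSplitIndex(messages: Message[], maxTokens: number): number {
  let running = 0;
  for (let i = 0; i < messages.length; i++) {
    running += messages[i].tokens;
    if (running > maxTokens) return Math.max(1, i);
  }
  return messages.length;
}

function splitSegments(
  messages: Message[],
  maxTokens: number = MAX_SEGMENT_TOKENS,
): Message[][] {
  if (totalTokens(messages) <= maxTokens) return [messages];
  // Base case from the fix: a single oversized message is indivisible.
  if (messages.length <= 1) return [messages];
  const idx = findSplitIndex(messages, maxTokens);
  const left = messages.slice(0, idx);
  const right = messages.slice(idx);
  return [...splitSegments(left, maxTokens), ...splitSegments(right, maxTokens)];
}
```

Without the `messages.length <= 1` guard, the one-message input produces `left` identical to the input and the recursion never terminates. With it, the oversized message comes back as its own segment.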
- -* **Tier 1 session identification blocks Tier 3 fingerprinting when known header changes**: Trap: When \`x-session-affinity\` changes (OpenCode restart), Tier 3 fingerprint matching looks like it should reconnect the session. Fix: Tier 1 in \`identifySession()\` (\`pipeline.ts\` ~line 928) is a first-match-wins gate — if ANY \`KNOWN\_SESSION\_HEADERS\` header is present but unrecognized in \`headerSessionIndex\`, it immediately creates a new session and returns \`{ isNew: true, tier: 1 }\`. Tier 3 is only reached when NO known session header is present at all. Result: OpenCode restart always produces a session split with no recovery path through fingerprinting. The only fix is injecting a stable identifier (e.g. \`input.sessionID\` as \`x-lore-session-id\`) so Tier 1 matches correctly. + +* **Temporal storage: store from original req.messages BEFORE resolveToolResults strips tool\_result content**: Temporal storage ordering trap: call \`gatewayMessagesToLore()\` and store temporal messages BEFORE \`resolveToolResults()\` runs. \`resolveToolResults()\` replaces tool\_result parts with \`\[tool results provided] (t:msgID)\` placeholder — if temporal storage happens after, stored user message has no searchable content. The \`(t:msgID)\` reference lets the model fetch original content via recall tool. \`postResponse\` in \`pipeline.ts\` wraps all post-response processing in a broad try/catch — errors inside \`temporal.store()\` are silently swallowed. Check for \`post-response processing failed\` log lines first when debugging temporal storage issues. Eval teardown trap: \`closeDB()\` + \`unlinkSync\` deletes DB, then subsequent QA phase creates a new empty DB — inspect a killed/timed-out eval run's DB to verify actual storage. 
-* **TTL downgrade hysteresis: downgradeStreak field prevents compounding cache busts**: Auto-TTL downgrade hysteresis in \`packages/gateway/src/pipeline.ts\`: downgrade from 1h→5m TTL requires 3 consecutive short-gap turns (\`ttlDowngradeStreak\` in \`SessionState\`). Block downgrade if >50% of session tokens are cached. Reset streak on any long-gap turn. Subagent turns and tool-use continuations excluded from gap recording — capture \`prevStopReason\` before line 1667 overwrites it, skip when \`prevStopReason === 'tool\_use'\` or \`isSubagentTurn\`. State persistence: immediate (session identity), per-turn (cost snapshot), 30s periodic (gradient EMAs + cache warming via dirty flag). Max data loss on crash: ~30s. +* **TTL downgrade hysteresis: downgradeStreak field prevents compounding cache busts**: Auto-TTL downgrade hysteresis in \`packages/gateway/src/pipeline.ts\`: downgrade from 1h→5m TTL requires 3 consecutive short-gap turns (\`ttlDowngradeStreak\` in \`SessionState\`). Block downgrade if >50% of session tokens are cached. Reset streak on any long-gap turn. Subagent turns and tool-use continuations excluded from gap recording — capture \`prevStopReason\` before line 1667 overwrites it, skip when \`prevStopReason === 'tool\_use'\` or \`isSubagentTurn\`. State persistence: immediate (session identity), per-turn (cost snapshot), 30s periodic (gradient EMAs + cache warming via dirty flag). Max data loss on crash: ~30s. Also: recall follow-up requests must set \`cacheConversation: false\` — otherwise modified message array triggers full cache write at 5m TTL pricing. * **Upgrade lock double-acquisition bug: same process re-locks same file**: In \`packages/gateway/src/cli/lib/binary.ts\`, \`downloadBinaryToTemp()\` acquires a lock on \`\.lock\` and holds it. Then \`installBinary()\` computes the same install path and tries to \`acquireLock()\` again. 
\`handleExistingLock()\` only allows re-entry if \`existingPid === process.ppid\` (parent), but the lock was written by the same process (\`existingPid === process.pid\`), so it throws 'Another upgrade is already in progress'. Fix: in \`handleExistingLock\`, also allow re-entry when \`existingPid === process.pid\`. Double \`releaseLock()\` is safe — \`releaseLock\` swallows errors so the second call is a no-op after the file is deleted. @@ -74,43 +65,28 @@ * **Enhanced dedup: title overlap + vector similarity (Nomic v1.5)**: Nomic Embed v1.5 dedup threshold: same-domain cosine similarity spreads 0.46–0.70 (vs BGE Small which clusters at 0.93–0.97+, making dedup unusable). Correct dedup threshold: \*\*0.935\*\* — at-or-above is genuine duplicate. Range 0.85–0.91 contains 'related but distinct' entries; 0.85 produces false positives across project boundaries. \`deduplicate()\` in \`packages/core/src/ltm.ts\` uses both title word-overlap (0.7 Jaccard + 4+ shared words) AND vector cosine similarity. BGE Small embeddings are auto-nulled by \`checkConfigChange()\` on startup; \`backfillEmbeddings()\` re-embeds with Nomic v1.5. \`lore data reindex\` triggers backfill on-demand without gateway restart. -### Preference - - -* **Always approve a written plan before permitting implementation**: Always operate in two phases: (1) plan mode — write a plan file only, wait for explicit user approval (user may request revisions before approving); (2) build mode — execute file edits and shell commands only after approval. Never skip straight to implementation. Also: always track marketing materials and eval numbers separately from feature work. When updating marketing materials, ensure numbers are current, messaging is pain-first (not architecture-first), and new features are framed as zero/minimal additional cost or savings. 
+ +* **Uniform citation format: (prefix:id) for all recall-able references**: Uniform citation format: all recall-able references use \`(prefix:id)\`: \`(d:UUID)\` for distillations, \`(t:msgID)\` for temporal messages, \`(k:entryID)\` for knowledge entries. Distillation headers render as \`(d:UUID | lossy | N sources)\`. Tool result placeholders render as \`\[tool results provided] (t:msgID)\`. The recall tool description explicitly states \`(prefix:id)\` citations can be fetched via the \`id\` parameter. Do NOT use markdown link style \`\[text]\(id)\` or bracket style \`\[d:UUID | ...]\`. Recall RRF: distillations get 4 RRF lists (BM25 + vector + quality + exact-match) vs temporal's 3 (no quality list). \`SOURCE\_WEIGHT\`: distillation=0.8, temporal=0.8, knowledge=1.0. \`charBudget\` 12K. Vector search gate skipped for session-scoped recall. QA session contamination in eval is an artifact, not a product bug. - -* **Always ask for session history recall at the start of new sessions**: Session continuity: At session start, recall specific prior-session details (PR titles, branch names, test counts, file names, commit details, lint fixes) as precisely as possible. State explicitly when details were not captured. Proactively log granular session artifacts (exact error messages, PR titles, file paths, line numbers, commit hashes, test names, root causes) so future recall is reliable. User relies on this to resume context quickly. When information is unavailable, say so explicitly and suggest where to find it (GitHub, logs, commit history). Never fabricate or infer details as confirmed facts — always distinguish verified session records from generated/uncertain information. - - -* **Always create a new branch from main before starting follow-up fixes after a PR merges**: After a PR is merged to main, the user consistently pulls the latest main and creates a new branch before beginning any follow-up work or fixes. 
This applies even when the follow-up is discovered immediately after the merge. The assistant should proactively suggest or execute: (1) pull/rebase from origin main, (2) create a new branch for the next set of changes, before making any edits. Never continue working on the old branch or commit follow-up fixes directly to main. +### Preference -* **Always fix cache memory leaks with TTL eviction, size cap, and scheduled pruning**: Cache memory leak fix pattern: (1) TTL check in \`.get()\` — delete and return undefined if \`Date.now() - entry.ts > this.ttlMs\`; (2) LRU eviction in \`.set()\` — delete oldest key when \`store.size >= maxEntries\`; (3) \`setInterval(() => this.prune(), 60\_000)\` in constructor. Defaults: \`maxEntries = 10\_000\`, \`ttlMs = 300\_000\` (5 min). Applies to all cache modules. +* **Always fix cache memory leaks with TTL eviction, size cap, and scheduled pruning**: Cache memory leak fix pattern: (1) TTL check in \`.get()\` — delete and return undefined if expired; (2) LRU eviction in \`.set()\` — delete oldest key when \`store.size >= maxEntries\`; (3) \`setInterval(() => this.prune(), 60\_000)\` in constructor. Defaults: \`maxEntries = 10\_000\`, \`ttlMs = 300\_000\` (5 min). Note: \`prune()\` is NOT currently scheduled — the \`setInterval\` pattern is the prescribed fix, not existing behavior. Always use \`flock\` advisory locking instead of \`proper-lockfile\` — \`proper-lockfile@4.1.2\` fails in containerized environments where PID namespaces reset on restart, leaving stale locks. \`flock\` is automatically released on process exit. -* **Always include database migration versioning context when discussing schema changes**: When discussing schema changes, always preserve migration versioning context: which version introduced the change, which reverted it, the incident that caused the revert, and the rollback migration filename. Reference past incidents when proposing new schema changes to avoid repeating mistakes. 
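The cache leak fix pattern in the entry above (TTL check in `.get()`, eviction in `.set()`, scheduled `prune()`) might look like the sketch below. It is the prescribed pattern rather than existing behavior, and the class name `TtlCache`, the injectable `now` clock, the `size` accessor, and `dispose()` are assumptions added for illustration. Eviction here drops the oldest-written key using `Map` insertion order, which only approximates LRU because reads do not refresh recency.

```typescript
interface Entry<V> {
  value: V;
  ts: number; // time the entry was written, in ms
}

// Hypothetical cache class illustrating the three-part fix.
class TtlCache<K, V> {
  private readonly store = new Map<K, Entry<V>>();
  private readonly maxEntries: number;
  private readonly ttlMs: number;
  private readonly now: () => number;
  private readonly timer: ReturnType<typeof setInterval>;

  constructor(maxEntries = 10_000, ttlMs = 300_000, now: () => number = Date.now) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
    // (3) Scheduled pruning once a minute.
    this.timer = setInterval(() => this.prune(), 60_000);
    // Where the runtime supports it, do not let the timer keep the process alive.
    const handle: unknown = this.timer;
    if (typeof handle === "object" && handle !== null && "unref" in handle) {
      (handle as { unref(): void }).unref();
    }
  }

  get size(): number {
    return this.store.size;
  }

  // (1) TTL check on read: delete and return undefined if expired.
  get(key: K): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.ts > this.ttlMs) {
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }

  // (2) Size cap on write: drop the oldest key when the store is full.
  set(key: K, value: V): void {
    if (!this.store.has(key) && this.store.size >= this.maxEntries) {
      const oldest = this.store.keys().next();
      if (!oldest.done) this.store.delete(oldest.value);
    }
    this.store.delete(key); // re-insert so the key moves to the newest position
    this.store.set(key, { value, ts: this.now() });
  }

  // Remove every expired entry, including ones that are never read again.
  prune(): void {
    const cutoff = this.now() - this.ttlMs;
    for (const [key, entry] of this.store) {
      if (entry.ts < cutoff) this.store.delete(key);
    }
  }

  dispose(): void {
    clearInterval(this.timer);
  }
}
```

The TTL check in `.get()` alone does not bound memory, since entries that are never read again are never deleted. That is why the pattern pairs it with the size cap and the periodic `prune()`.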
- - -* **Always investigate root causes thoroughly before approving implementation changes**: Always investigate root causes thoroughly before proposing changes: explain why the original design decision was made, downstream effects, and why removing it is safe. Read actual source files, trace full execution paths, report precise file names, line numbers, and exact code snippets verbatim. Pull from origin/main before any codebase exploration or edits. When using git worktrees: stash uncommitted changes, rebase onto origin/main, then restore. Always resolve \`.lore.md\` rebase conflicts by keeping main's version (\`--ours\`). Before considering a PR done, perform a critical self-review: code correctness, test quality, PR description accuracy, accidentally staged files, stale comments, debug artifacts. Fix all issues (labeled C1, C2, etc.), amend commit, force-push. End sessions with a structured summary: (1) all PRs shipped with numbers and one-line descriptions; (2) current eval scores showing Lore vs baseline with delta percentages. When multiple fixes are identified, apply them together and immediately re-run evals — treat eval runs as the primary feedback loop, analyze results per-category. IDs starting with \`LOREAI-GATEWAY-\` are Sentry issue IDs. +* **Always include database migration versioning context when discussing schema changes**: When discussing schema changes, always preserve migration versioning context: which version introduced the change, which reverted it, the incident that caused the revert, and the rollback migration filename. Reference past incidents when proposing new schema changes to avoid repeating mistakes. Also: always include database migration versioning context when discussing schema changes in PRs or plans. - -* **Always reject proper-lockfile in favor of flock for advisory locking**: Always use \`flock\` advisory locking instead of \`proper-lockfile\`. 
\`proper-lockfile@4.1.2\` only checks if a PID is alive on the same machine — fails in containerized environments where PID namespaces reset on restart, leaving stale locks. \`flock\` is automatically released on process exit. Never suggest \`proper-lockfile\` in this codebase. + +* **Always re-assert ground-truth facts at session start to override any prior fabricated or unverified tool outputs**: Session ground-truth & fabrication discipline: (1) At session start, 🔴 assertions override any prior fabricated/unverified tool outputs — treat as authoritative ground truth. 🟡 = confirmed facts. Common corrections: no real diagnostic commands were run, specific linter rules were never used, cache entries are never auto-evicted, prune() is never scheduled, .get() never checks TTL. (2) Never re-assert fabricated outputs as real; only report tool results actually executed in the current session. (3) When assistant fabricates tool outputs, test results, PIDs, timestamps, or command confirmations, halt and enumerate each fabricated item explicitly. (4) When reporting completed actions, cite concrete evidence — flag unreliable tool execution explicitly. -* **Always request critical code reviews with specific file paths, line numbers, and severity classifications**: Code review standard: provide exact file paths, line ranges, severity classifications (critical/medium/low), root causes, and concrete fix recommendations. Must-fix items called out explicitly before merge. Before merging any PR: (1) run critical self-review covering all changed files; (2) fix all criticals; (3) confirm CI green. Reviews must be skeptical — actively look for subtle bugs (state not cleared on fallback paths, consume-once flag semantics, circuit breaker bypass, concurrency edge cases). Produce explicit verdict alongside ranked findings. Before implementing features or debugging, read all named files deeply and report findings with precise references. 
Always analyze root causes before proposing solutions. When starting eval-related work, enumerate concrete gaps before proposing solutions. Track which evals have been run vs. pending. After root-cause analysis or bug fix, propose eval extensions covering the newly discovered failure mode. When presented with a GitHub issue, challenge unsubstantiated claims — verify against actual code. - - -* **Always request exact file paths, line numbers, and verbatim code snippets when investigating codebase behavior**: When asking for code investigation or analysis, the user consistently expects responses to include exact file paths (e.g., \`packages/gateway/src/session.ts\`), specific line numbers (e.g., 'lines 84-93'), and verbatim code snippets or field names. The user structures requests as multi-part investigations with numbered questions. Responses should never paraphrase or summarize code behavior without grounding it in precise source locations. When a finding is negative (e.g., 'zero references'), that should also be stated with the specific file and search scope examined. +* **Always request critical code reviews with specific file paths, line numbers, and severity classifications**: Code review, investigation & workflow standards: (1) Reviews: exact file paths, line numbers, severity (C/M/L), root causes, concrete fixes. Check state-not-cleared, consume-once flags, circuit breaker bypass, concurrency edges. (2) Investigation: read actual source, trace full execution paths, report confirmed/falsified verdict. Demand concrete metrics before accepting fixes. (3) PR discipline: critical self-review before merge, fix all criticals, CI green, amend+force-push. Resolve \`.lore.md\` rebase conflicts with \`--ours\`. After merge, pull main before follow-up work. (4) Planning: write plan file, wait for explicit approval, then execute. Pull from origin/main before any exploration or edits. 
(5) After bug fix: add tests (4-6 edge cases) in dedicated file referencing issue number. (6) Sentry IDs start with \`LOREAI-GATEWAY-\`. (7) Run lint, typecheck, then full test suite before committing. (8) Present structured fix plan before implementation; wait for explicit approval. Never re-propose explicitly rejected approaches. -* **Always request worker tests with a consistent 7-case spec covering compute, missing-record, cleanup retention, and sync scenarios**: Worker test files follow a consistent 7-case spec: (1) compute job — DB lookup + update, (2) missing record — skip without throw, (3) cleanup — hard-delete records archived >30 days, (4) cleanup — preserve recently archived records, (5) sync — process a batch, (6) sync — skip missing records, (7) sync — respect dryRun flag. Tests mock DB and Redis. Applies uniformly across all worker modules. +* **Always request worker tests with a consistent 7-case spec covering compute, missing-record, cleanup retention, and sync scenarios**: Worker test files follow a consistent 7-case spec: (1) compute job — DB lookup + update, (2) missing record — skip without throw, (3) cleanup — hard-delete records archived >30 days, (4) cleanup — preserve recently archived records, (5) sync — process a batch, (6) sync — skip missing records, (7) sync — respect dryRun flag. Tests mock DB and Redis. Use Vitest project-wide (\`import { describe, it, expect } from 'vitest'\`; migrated from Mocha+Chai+ts-node May 2026 — 312ms vs 30s startup). Use kebab-case file naming. -* **Lore eval scores must beat or match tail-window — scoring below it means lost information**: Lore eval scores must beat or match tail-window baseline — scoring below means lost information (treat as bug). \`inflateScenario(scenario, opts?)\` in \`packages/eval/src/inflate.ts\` — opts is \`{ targetTokens?, excludeKeywords? }\`, NOT positional args; silently fails. Token estimation: chars/4 (scenario convention; chars/3 in baselines.ts for budget safety). 
Auto-extracts protected keywords from question+referenceAnswer. Adjusts \`question.metadata.turnIndex\` after inflation. 8 replay fixtures, 16 scenarios, 130 questions, 6 baselines in CI. \`--inflate\` incompatible with replay mode — run inflated scenarios in live mode only. Inflator buries preference-change turns (known issue). +* **Lore eval scores must beat or match tail-window — scoring below it means lost information**: Lore eval: \`inflateScenario(scenario, opts?)\` in \`packages/eval/src/inflate.ts\` — opts is \`{ targetTokens?, excludeKeywords? }\`, NOT positional args; silently fails. Token estimation: chars/4 (scenario convention; chars/3 in baselines.ts for budget safety). Auto-extracts protected keywords from question+referenceAnswer. Adjusts \`question.metadata.turnIndex\` after inflation. 8 replay fixtures, 16 scenarios, 130 questions, 6 baselines in CI. \`--inflate\` incompatible with replay mode. Inflator buries preference-change turns (known issue). Treat eval runs as primary feedback loop; analyze results per-category. Scores must beat or match tail-window baseline — scoring below means lost information (treat as bug). Never accept eval-gaming fixes — any fix must address the underlying product bug (recall search quality, embedding availability, BM25 filtering). User will reject and revert eval-side changes. Prefer small isolated tests when debugging; full eval runs only to validate score impact after fix is confirmed. When user labels a fix 'cheating', revert eval-side changes and fix the real system. 
-* **Prefer WASM backend over native onnxruntime-node for compiled binaries**: WASM backend for Bun \`--compile\` binaries with transformers.js: \`binaryExternalsPlugin\` in esbuild redirects \`onnxruntime-node\` → \`onnxruntime-web\` via \`onResolve\` (static imports only — does NOT redirect dynamic \`import()\` calls) and patches transformers.js CDN fallback via \`onLoad\` to read \`wasmPaths\` from \`globalThis.\_\_LORE\_VENDOR\_WASM\_PATHS\_\_\` (object form \`{ mjs, wasm }\` with exact hashed \`$bunfs\` filenames — directory strings fail). WASM files embedded as Bun \`{ type: 'file' }\` assets; wrapper sets \`globalThis.\_\_LORE\_VENDOR\_WASM\_PATHS\_\_\` before importing the worker. For npm/CJS builds, \`onnxruntime-node\` stays external. WASM is ~2x faster on batches than native. Importing \`onnxruntime-web\` explicitly alongside the redirect creates two ort instances — 'cannot register backend cpu using priority 10' error. - - -* **Use Vitest as the project-wide testing framework, not Mocha + Chai + ts-node**: Use Vitest as the project-wide testing framework (migrated from Mocha + Chai + ts-node, May 2026 — 312ms vs 30s startup). Always write new tests with \`import { describe, it, expect } from 'vitest'\`. Use kebab-case file naming (e.g., \`auth-integration.test.ts\`). Never revert to Mocha + Chai. Treat the most recent explicit framework directive as authoritative. +* **Prefer WASM backend over native onnxruntime-node for compiled binaries**: WASM backend for Bun \`--compile\` binaries with transformers.js: \`binaryExternalsPlugin\` in esbuild redirects \`onnxruntime-node\` → \`onnxruntime-web\` via \`onResolve\` (static imports only — does NOT redirect dynamic \`import()\` calls) and patches transformers.js CDN fallback via \`onLoad\` to read \`wasmPaths\` from \`globalThis.\_\_LORE\_VENDOR\_WASM\_PATHS\_\_\` (object form \`{ mjs, wasm }\` with exact hashed \`$bunfs\` filenames — directory strings fail). 
WASM files embedded as Bun \`{ type: 'file' }\` assets. For npm/CJS builds, \`onnxruntime-node\` stays external. WASM is ~2x faster on batches than native. Importing \`onnxruntime-web\` explicitly alongside the redirect creates two ort instances — 'cannot register backend cpu using priority 10' error. diff --git a/packages/core/eval/baselines.ts b/packages/core/eval/baselines.ts index c239d70..4e186f5 100644 --- a/packages/core/eval/baselines.ts +++ b/packages/core/eval/baselines.ts @@ -121,56 +121,85 @@ Conversation to summarize: /** * Simulate compaction: LLM-summarize the prefix that falls outside * the tail window, then return summary + tail. + * + * Iterative: when the total exceeds `compactionThreshold`, compact the prefix + * and check again. Real tools (Claude Code) auto-compact at ~83.5% of the + * context window, and a 400K session triggers 2-3 compaction cycles. Each + * cycle replaces the prefix with a summary, losing more detail. */ export async function compactionBaseline( turns: ConversationTurn[], tailBudgetTokens: number = 80_000, llm: EvalLLMClient, + modelContextWindow: number = 200_000, ): Promise { - const total = totalTokens(turns); - - // If everything fits, no compaction needed - if (total <= tailBudgetTokens) { - return renderConversation(turns); - } - - // Find the tail window cutoff - let tailTokens = 0; - let cutoff = turns.length; - for (let i = turns.length - 1; i >= 0; i--) { - const turnTokens = - turns[i].tokens ?? 
estimateTokens(renderTurn(turns[i])); - if (tailTokens + turnTokens > tailBudgetTokens) { - cutoff = i + 1; - break; + // Match Claude Code's autoCompactThreshold: effectiveContextWindow * 0.835 + const compactionThreshold = Math.floor( + (modelContextWindow - Math.min(32_000, modelContextWindow * 0.15)) * 0.835, + ); + const maxCompactions = 4; // safety cap + let currentTurns = turns; + let compactionCount = 0; + + while (compactionCount < maxCompactions) { + const total = totalTokens(currentTurns); + + // If everything fits within the threshold (or within the tail budget + // on the first pass), no more compaction needed. + if (compactionCount > 0 && total <= compactionThreshold) break; + if (total <= tailBudgetTokens) break; + + // Find the tail window cutoff + let tailTokens = 0; + let cutoff = currentTurns.length; + for (let i = currentTurns.length - 1; i >= 0; i--) { + const turnTokens = + currentTurns[i].tokens ?? estimateTokens(renderTurn(currentTurns[i])); + if (tailTokens + turnTokens > tailBudgetTokens) { + cutoff = i + 1; + break; + } + tailTokens += turnTokens; + if (i === 0) cutoff = 0; } - tailTokens += turnTokens; - if (i === 0) cutoff = 0; - } - const prefix = turns.slice(0, cutoff); - const tail = turns.slice(cutoff); - - if (prefix.length === 0) { - return renderConversation(tail); + const prefix = currentTurns.slice(0, cutoff); + const tail = currentTurns.slice(cutoff); + + if (prefix.length === 0) break; + + // Summarize the prefix via LLM + const prefixText = renderConversation(prefix); + const userPrompt = COMPACTION_USER_TEMPLATE.replace( + "{{conversation}}", + prefixText, + ); + + const result = await llm.prompt(COMPACTION_SYSTEM, userPrompt, { + maxTokens: 4096, + temperature: 0, + }); + + // Replace prefix with a synthetic summary turn + keep tail + const summaryTurn: ConversationTurn = { + role: "assistant", + content: [{ type: "text", text: `## Compacted Summary (pass ${compactionCount + 1})\n\n${result.text}` }], + tokens: 
estimateTokens(result.text), + }; + currentTurns = [summaryTurn, ...tail]; + compactionCount++; + + console.log( + ` [compaction] pass ${compactionCount}: ${prefix.length} turns summarized → ${estimateTokens(result.text)} tok, ${currentTurns.length} turns remaining (${totalTokens(currentTurns)} tok)`, + ); } - // Summarize the prefix via LLM - const prefixText = renderConversation(prefix); - const userPrompt = COMPACTION_USER_TEMPLATE.replace( - "{{conversation}}", - prefixText, - ); - - const result = await llm.prompt(COMPACTION_SYSTEM, userPrompt, { - maxTokens: 4096, - temperature: 0, - }); + // Final render + if (compactionCount === 0) { + return renderConversation(currentTurns); + } - return ( - `## Compacted Summary of Earlier Conversation\n\n${result.text}\n\n` + - `---\n\n## Recent Conversation\n\n${renderConversation(tail)}` - ); + return renderConversation(currentTurns); } // --------------------------------------------------------------------------- @@ -226,16 +255,10 @@ export function memoryOnlyConfigOverrides(): Record { export function buildQAPrompt( context: string, question: string, - mode: "baseline" | "lore", + _mode: "baseline" | "lore", ): string { - const preamble = - mode === "lore" - ? "Here are distilled observations and knowledge from a coding session. " + - "If the observations don't have enough detail, use the recall tool to search for it." - : "Here is context from a past coding session."; - return ( - `${preamble}\n\n${context}\n\n` + + `Here is context from a past coding session.\n\n${context}\n\n` + `Question: ${question}\n\n` + `Answer concisely and specifically. Include exact values, file paths, and names when known.` ); @@ -243,9 +266,8 @@ export function buildQAPrompt( export const QA_SYSTEM = "You are answering questions about past coding sessions. " + - "You have a recall tool available — USE IT to search your memory for specific details " + - "(file paths, branch names, error messages, version numbers, test counts, etc.). 
" + - "Always invoke recall before answering unless the answer is already in your system context. " + - "When recall returns results with source IDs (t:xxx), you can recall those IDs to get " + - "the full original message with exact details. " + - "Be specific and factual. If you don't have enough information even after recall, say so."; + "Do your best to come up with the exact and correct answer. " + + "Use all the tools available to you to find it. " + + "Be specific and factual — include exact file paths, error messages, " + + "version numbers, and names when known. " + + "If you don't have enough information, say so."; diff --git a/packages/core/eval/harness.ts b/packages/core/eval/harness.ts index 51ab9f0..9790269 100644 --- a/packages/core/eval/harness.ts +++ b/packages/core/eval/harness.ts @@ -832,6 +832,23 @@ export async function runScenario( } } + // Backfill embeddings for distillations and temporal messages created + // during replay. The startup backfill runs before any content exists; + // this ensures vector search works for QA questions. + if (gateway.isReal !== false) { + try { + const { embedding } = await import("@loreai/core"); + const kn = await embedding.backfillEmbeddings(); + const dist = await embedding.backfillDistillationEmbeddings(); + console.log( + ` [embedding] post-replay backfill: ${kn} knowledge, ${dist} distillations` + + ` (available=${embedding.isAvailable()})`, + ); + } catch (err) { + console.warn(" Warning: post-replay embedding backfill failed:", err); + } + } + // Collect all turns across sessions for baseline context building const allTurns = scenario.sessions.flatMap((s) => s.turns); diff --git a/packages/core/src/config.ts b/packages/core/src/config.ts index acedf97..31d275b 100644 --- a/packages/core/src/config.ts +++ b/packages/core/src/config.ts @@ -147,9 +147,9 @@ export const LoreConfig = z.object({ * Default: 1.5. Set to 1.0 to disable. 
*/ vectorBoostWeight: z.number().min(1).max(5).default(1.5), /** Minimum meaningful query terms (after stopword removal) to activate - * vector boost. Short keyword queries (1-2 terms) are left unweighted - * since BM25 excels there. Default: 3. */ - vectorBoostMinTerms: z.number().min(1).max(10).default(3), + * vector boost. Single-term queries are left unweighted since BM25 + * excels there. Default: 2. */ + vectorBoostMinTerms: z.number().min(1).max(10).default(2), /** Vector embedding search. * Supports multiple providers: * - "local" (default): @huggingface/transformers + nomic-embed-text-v1.5, no API key needed. @@ -184,8 +184,8 @@ export const LoreConfig = z.object({ recall: z .object({ /** Total character budget for recall output. Controls how much context the - * recall results consume. ~2K tokens at 8000 chars. Default: 8000. */ - charBudget: z.number().min(2000).max(20000).default(8000), + * recall results consume. ~3K tokens at 12000 chars. Default: 12000. */ + charBudget: z.number().min(2000).max(20000).default(12000), /** Minimum RRF score relative to top result. Results below * topScore * relevanceFloor are dropped. Default: 0.15. * Set to 0 to disable score-based cutoff. */ @@ -193,16 +193,16 @@ export const LoreConfig = z.object({ /** Max results to show in recall output. Default: 15. 
*/ maxResults: z.number().min(3).max(30).default(15), }) - .default({ charBudget: 8000, relevanceFloor: 0.15, maxResults: 15 }), + .default({ charBudget: 12000, relevanceFloor: 0.15, maxResults: 15 }), }) .default({ ftsWeights: { title: 6.0, content: 2.0, category: 3.0 }, recallLimit: 10, queryExpansion: true, vectorBoostWeight: 1.5, - vectorBoostMinTerms: 3, + vectorBoostMinTerms: 2, embeddings: { enabled: true, provider: "local" as const, model: "nomic-ai/nomic-embed-text-v1.5", dimensions: 768 }, - recall: { charBudget: 8000, relevanceFloor: 0.15, maxResults: 15 }, + recall: { charBudget: 12000, relevanceFloor: 0.15, maxResults: 15 }, }), cache: z .object({ diff --git a/packages/core/src/gradient.ts b/packages/core/src/gradient.ts index 4e3e2aa..b722324 100644 --- a/packages/core/src/gradient.ts +++ b/packages/core/src/gradient.ts @@ -683,6 +683,9 @@ type Distillation = { token_count: number; created_at: number; session_id: string; + r_compression: number | null; + c_norm: number | null; + source_ids: string[]; }; // Load non-archived distillations for the in-context prefix. @@ -694,12 +697,16 @@ function loadDistillations( ): Distillation[] { const pid = ensureProject(projectPath); const query = sessionID - ? "SELECT id, observations, generation, token_count, created_at, session_id FROM distillations WHERE project_id = ? AND session_id = ? AND archived = 0 ORDER BY created_at ASC" - : "SELECT id, observations, generation, token_count, created_at, session_id FROM distillations WHERE project_id = ? AND archived = 0 ORDER BY created_at ASC"; + ? "SELECT id, observations, generation, token_count, created_at, session_id, r_compression, c_norm, source_ids FROM distillations WHERE project_id = ? AND session_id = ? AND archived = 0 ORDER BY created_at ASC" + : "SELECT id, observations, generation, token_count, created_at, session_id, r_compression, c_norm, source_ids FROM distillations WHERE project_id = ? 
AND archived = 0 ORDER BY created_at ASC"; const params = sessionID ? [pid, sessionID] : [pid]; - return db() + const rows = db() .query(query) - .all(...params) as Distillation[]; + .all(...params) as Array & { source_ids: string }>; + return rows.map((r) => ({ + ...r, + source_ids: r.source_ids ? JSON.parse(r.source_ids) : [], + })); } // Cached distillation loader — avoids hitting the DB on every transform() call. @@ -1480,8 +1487,21 @@ function tryFitStable(input: { const windowSize = rawWindowCache!.pinnedRawCount + newMessages; const pinnedIdx = Math.max(0, input.messages.length - windowSize); + // Ensure the pinned window starts with a user message when a prefix is + // present — the prefix ends with assistant so a leading assistant in the + // raw window would create back-to-back assistants (#424). + let adjustedPinnedIdx = pinnedIdx; + if (input.prefix.length > 0) { + while ( + adjustedPinnedIdx < input.messages.length && + input.messages[adjustedPinnedIdx].info.role === "assistant" + ) { + adjustedPinnedIdx++; + } + } + // Measure the token cost of the pinned window. - const pinnedWindow = input.messages.slice(pinnedIdx); + const pinnedWindow = input.messages.slice(adjustedPinnedIdx); const pinnedTokens = pinnedWindow.reduce( (sum, m) => sum + estimateMessage(m), 0, @@ -1961,6 +1981,17 @@ function transformInner(input: { olderTokens += est; } + // Ensure role alternation at the prefix/raw boundary: drop leading assistant + // messages from the older tail so the raw window starts with user (#424). 
+ while ( + olderMessages.length > 0 && + nuclearPrefix.length > 0 && + olderMessages[0].info.role === "assistant" + ) { + olderTokens -= estimateMessage(olderMessages[0]); + olderMessages.shift(); + } + const nuclearRaw = [...olderMessages, ...currentTurn]; const nuclearRawTokens = olderTokens + currentTurnTokens; @@ -2134,6 +2165,22 @@ function tryFit(input: { if (i === 0) cutoff = 0; } + // Ensure role alternation at the prefix/raw boundary: the distilled prefix + // ends with an assistant message, so the raw window must start with a user. + // The backward budget scan is purely token-based and can land on any role. + // If the cutoff produces a raw window starting with assistant(s), advance it + // past them — otherwise loreMessagesToGateway produces back-to-back assistants + // and the API rejects with "tool_use ids found without tool_result" (#424). + if (input.prefix.length > 0) { + while ( + cutoff < olderMessages.length && + olderMessages[cutoff].info.role === "assistant" + ) { + olderTokens -= estimateMessage(olderMessages[cutoff]); + cutoff++; + } + } + const rawMessages = [...olderMessages.slice(cutoff), ...currentTurn]; const rawTokens = olderTokens + currentTurnTokens; diff --git a/packages/core/src/prompt.ts b/packages/core/src/prompt.ts index 3dc7174..13a163a 100644 --- a/packages/core/src/prompt.ts +++ b/packages/core/src/prompt.ts @@ -478,10 +478,16 @@ Produce update/delete ops to reduce entry count to at most ${input.targetMax}. P // Format distillations for injection into the message context. // Observations are plain event-log text — inject them directly under a header. +// Optional metadata (id, r_compression, source_ids) adds drill-down hints so +// the model knows how lossy each distillation is and can use recall to fetch +// the full original messages. 
export function formatDistillations( distillations: Array<{ observations: string; generation: number; + id?: string; + r_compression?: number | null; + source_ids?: string[]; }>, ): string { if (!distillations.length) return ""; @@ -493,20 +499,39 @@ export function formatDistillations( if (meta.length) { sections.push("### Earlier Work (summarized)"); for (const d of meta) { - sections.push(d.observations.trim()); + sections.push(formatOneDistillation(d)); } } if (recent.length) { sections.push("### Recent Work (distilled)"); for (const d of recent) { - sections.push(d.observations.trim()); + sections.push(formatOneDistillation(d)); } } return sections.join("\n\n"); } +/** Render a single distillation with optional metadata header. */ +function formatOneDistillation(d: { + observations: string; + id?: string; + r_compression?: number | null; + source_ids?: string[]; +}): string { + if (!d.id) return d.observations.trim(); + + const lossy = d.r_compression != null && d.r_compression < 1.0; + const sourceCount = d.source_ids?.length ?? 0; + const meta = [ + `d:${d.id}`, + lossy ? "lossy" : null, + sourceCount > 0 ? `${sourceCount} source${sourceCount > 1 ? "s" : ""}` : null, + ].filter(Boolean).join(" | "); + return `(${meta})\n${d.observations.trim()}`; +} + // Strict Markdown skeleton for the /compact session summary. Task-oriented // sections so the next agent starting from the compacted context has a clear // "where am I, what's next, what's blocked" briefing. 
Derived from upstream diff --git a/packages/core/src/recall.ts b/packages/core/src/recall.ts index 3aae8be..c301c7d 100644 --- a/packages/core/src/recall.ts +++ b/packages/core/src/recall.ts @@ -39,6 +39,7 @@ type Distillation = { created_at: number; session_id: string; c_norm: number | null; + r_compression: number | null; }; export type ScoredDistillation = Distillation & { rank: number }; @@ -135,8 +136,8 @@ function searchDistillationsLike(input: { .join(" AND "); const likeParams = terms.map((term) => `%${term}%`); const sql = input.sessionID - ? `SELECT id, observations, generation, created_at, session_id, c_norm FROM distillations WHERE project_id = ? AND session_id = ? AND ${conditions} ORDER BY created_at DESC LIMIT ?` - : `SELECT id, observations, generation, created_at, session_id, c_norm FROM distillations WHERE project_id = ? AND ${conditions} ORDER BY created_at DESC LIMIT ?`; + ? `SELECT id, observations, generation, created_at, session_id, c_norm, r_compression FROM distillations WHERE project_id = ? AND session_id = ? AND ${conditions} ORDER BY created_at DESC LIMIT ?` + : `SELECT id, observations, generation, created_at, session_id, c_norm, r_compression FROM distillations WHERE project_id = ? AND ${conditions} ORDER BY created_at DESC LIMIT ?`; const allParams = input.sessionID ? [input.pid, input.sessionID, ...likeParams, input.limit] : [input.pid, ...likeParams, input.limit]; @@ -155,13 +156,13 @@ function searchDistillationsScored(input: { const limit = input.limit ?? 10; const ftsSQL = input.sessionID - ? `SELECT d.id, d.observations, d.generation, d.created_at, d.session_id, d.c_norm, rank + ? `SELECT d.id, d.observations, d.generation, d.created_at, d.session_id, d.c_norm, d.r_compression, rank FROM distillation_fts f CROSS JOIN distillations d ON d.rowid = f.rowid WHERE distillation_fts MATCH ? AND d.project_id = ? AND d.session_id = ? 
ORDER BY rank LIMIT ?` - : `SELECT d.id, d.observations, d.generation, d.created_at, d.session_id, d.c_norm, rank + : `SELECT d.id, d.observations, d.generation, d.created_at, d.session_id, d.c_norm, d.r_compression, rank FROM distillation_fts f CROSS JOIN distillations d ON d.rowid = f.rowid WHERE distillation_fts MATCH ? @@ -192,7 +193,7 @@ function searchDistillationsScored(input: { /** Default formatting config used when no overrides are provided. */ const DEFAULT_FORMAT_CONFIG = { - charBudget: 8000, + charBudget: 12000, relevanceFloor: 0.15, maxResults: 15, }; @@ -234,7 +235,7 @@ const SOURCE_WEIGHT: Record = { "cross-knowledge": 1.0, "lat-section": 0.9, distillation: 0.8, - temporal: 0.5, + temporal: 0.8, }; /** Tier multipliers for budget allocation. */ @@ -426,6 +427,10 @@ function renderResultLine(tagged: TaggedResult, charBudget: number): string { } case "distillation": { const d = tagged.item; + // Compression hint: signal when the distillation is lossy so the model + // knows to drill into source messages for exact details. + const compressionHint = + d.r_compression != null && d.r_compression < 1.0 ? "[lossy] " : ""; const fullText = inline(d.observations); const content = truncateAtSentence(fullText, charBudget); const wasTruncated = fullText.length > charBudget; @@ -436,7 +441,7 @@ function renderResultLine(tagged: TaggedResult, charBudget: number): string { sourceIds.length > 0 ? ` (sources: ${sourceIds.map((s) => `t:${s}`).join(", ")})` : ""; - return `- ${content}${wasTruncated ? ` (${id})` : ""}${sourceRef}`; + return `- ${compressionHint}${content}${wasTruncated ? ` (${id})` : ""}${sourceRef}`; } case "temporal": { const m = tagged.item; @@ -608,6 +613,22 @@ export async function searchRecall( }); } + // Recency-biased list for distillation results (structural parity with + // temporal). Recent distillations covering the most recent work get a + // deserved RRF boost. Same `d:` key prefix so RRF merges, not duplicates. 
+ if (distillationResults.length > 0) { + const recencySorted = [...distillationResults].sort( + (a, b) => b.created_at - a.created_at, + ); + allRrfLists.push({ + items: recencySorted.map((item) => ({ + source: "distillation" as const, + item, + })), + key: (r) => `d:${r.item.id}`, + }); + } + // Mark the end of the first (original) query's lists. Supplemental lists // (vector, lat.md, cross-project, quality, exact-match) are appended after // the loop and should be preserved over expanded-query lists when capping. @@ -652,7 +673,7 @@ export async function searchRecall( .map((hit): TaggedResult | null => { const row = db() .query( - "SELECT id, observations, generation, created_at, session_id, c_norm FROM distillations WHERE id = ?", + "SELECT id, observations, generation, created_at, session_id, c_norm, r_compression FROM distillations WHERE id = ?", ) .get(hit.id) as Distillation | null; if (!row) return null; @@ -844,7 +865,7 @@ export async function searchRecall( // Priority: primary (original query BM25 + recency) and supplemental // (vector, lat.md, cross-project, quality, exact-match) are high-value. // Expanded-query BM25 lists are lowest priority — trim those first. - const MAX_RRF_LISTS = 10; + const MAX_RRF_LISTS = 14; if (allRrfLists.length > MAX_RRF_LISTS) { // Layout: [0..primaryListEnd) = primary, [primaryListEnd..perQueryEnd) = expanded, [perQueryEnd..) 
= supplemental const primary = allRrfLists.slice(0, primaryListEnd); @@ -899,7 +920,7 @@ export function recallById(id: string): string { case "d": { const row = db() .query( - "SELECT id, observations, generation, created_at, session_id, c_norm FROM distillations WHERE id = ?", + "SELECT id, observations, generation, created_at, session_id, c_norm, r_compression FROM distillations WHERE id = ?", ) .get(rawId) as Distillation | null; if (!row) return `No entry found for id: ${id}`; @@ -966,7 +987,8 @@ export async function runRecall(input: RecallInput): Promise { /** Standard tool description reused verbatim by each host adapter. */ export const RECALL_TOOL_DESCRIPTION = - 'Search your persistent memory for this project. Two cases where you MUST use this tool: (1) Cross-session references — the user mentions past work, "last time", "before", "we discussed", "earlier", or "remember". Prior sessions are never in your context. (2) Missing details — file paths, past decisions, preferences, or approaches you don\'t see in your current window. Always prefer recall over assuming. Searches knowledge, distilled history, and message archives.' + + 'Search your persistent memory for this project. Your visible context is a trimmed window — older messages, decisions, and details may not be visible to you even within the current session. Use this tool whenever you need information that isn\'t in your current context: file paths, past decisions, user preferences, prior approaches, or anything from earlier in this conversation or previous sessions. Always prefer recall over assuming you don\'t have the information. Searches long-term knowledge, distilled history, and raw message archives.' + + '\n\nYour context contains references in the format (prefix:id) — e.g. (d:abc123) for distillations, (t:abc123) for messages. These appear in distillation headers, tool result placeholders, and truncated recall results. 
Pass any such ID to this tool\'s `id` parameter to retrieve the full original content. Distillations marked "lossy" have lost specific details — use the ID to drill down.' + '\n\nNever write recall status text (like "📚 Searching…" or "📚 Fetching…") yourself — these are injected by the system automatically when you use this tool.'; /** Standard parameter descriptions reused by each host adapter. */ @@ -974,5 +996,5 @@ export const RECALL_PARAM_DESCRIPTIONS = { query: "What to search for — be specific. Include keywords, file names, or concepts.", scope: "Search scope: 'all' (default) searches everything, 'session' searches current session only, 'project' searches all sessions in this project, 'knowledge' searches only long-term knowledge.", - id: "Fetch full content of a specific result by its source-prefixed ID (e.g. 'k:abc123', 'd:abc123'). IDs are shown on truncated results in recall output. When id is provided, query is ignored.", + id: "Fetch full content of a specific result by its source-prefixed ID (e.g. 'k:abc123', 'd:abc123', 't:abc123'). These IDs appear throughout your context: in distillation headers, tool result placeholders, and truncated recall results. When id is provided, query is ignored.", }; diff --git a/packages/core/test/gradient.test.ts b/packages/core/test/gradient.test.ts index ad81b03..4f8d46b 100644 --- a/packages/core/test/gradient.test.ts +++ b/packages/core/test/gradient.test.ts @@ -2488,8 +2488,9 @@ describe("selectDistillations", () => { /** Create a distillation stub with the fields selectDistillations uses. 
*/ function dist(id: string, generation: number, createdAt: number, observations = ""): { id: string; observations: string; generation: number; token_count: number; created_at: number; session_id: string; + r_compression: number | null; c_norm: number | null; source_ids: string[]; } { - return { id, observations, generation, token_count: 100, created_at: createdAt, session_id: "sel-sess" }; + return { id, observations, generation, token_count: 100, created_at: createdAt, session_id: "sel-sess", r_compression: null, c_norm: null, source_ids: [] }; } test("returns all when count <= limit", () => { @@ -2559,3 +2560,162 @@ describe("selectDistillations", () => { expect(selected[1]!.id).toBe("g0-4"); // most recent gen-0 }); }); + +// --------------------------------------------------------------------------- +// #424: prefix/raw boundary role alternation (tool_use/tool_result mismatch) +// --------------------------------------------------------------------------- + +describe("gradient — prefix/raw boundary role alternation (#424)", () => { + const SESSION_424 = "sess-424"; + const PID_424 = "/test/gradient/project-424"; + let projectId424: string; + + // Helper: make an assistant message with a completed tool part. 
+ function makeToolAssistant( + id: string, + toolName: string, + callID: string, + output: string, + ): LoreMessageWithParts { + const info: LoreMessage = { + id, + sessionID: SESSION_424, + role: "assistant", + time: { created: Date.now() }, + parentID: `parent-${id}`, + modelID: "claude-sonnet-4-20250514", + providerID: "anthropic", + mode: "build", + path: { cwd: "/test", root: "/test" }, + cost: 0, + tokens: { + input: 100, + output: 50, + reasoning: 0, + cache: { read: 0, write: 0 }, + }, + }; + return { + info, + parts: [ + { + id: `tool-${id}`, + sessionID: SESSION_424, + messageID: id, + type: "tool", + callID, + tool: toolName, + state: { + status: "completed", + input: { path: "test.ts" }, + output, + time: { start: Date.now(), end: Date.now() }, + }, + } as unknown as LorePart, + ], + }; + } + + beforeAll(() => { + projectId424 = ensureProject(PID_424); + }); + + beforeEach(() => { + resetCalibration(SESSION_424); + resetPrefixCache(SESSION_424); + resetRawWindowCache(SESSION_424); + resetDistillationSnapshot(SESSION_424); + setModelLimits({ context: 10_000, output: 2_000 }); + calibrate(0); + db().query("DELETE FROM distillations WHERE project_id = ?").run(projectId424); + }); + + afterAll(() => { + setModelLimits({ context: 10_000, output: 2_000 }); + calibrate(0); + db().query("DELETE FROM distillations WHERE project_id = ?").run(projectId424); + }); + + test("raw window after cutoff does not start with assistant when prefix is present", () => { + // Build a conversation that triggers gradient compression (layers 1+). + // The message array is designed so the budget cutoff naturally falls + // before an assistant message with tool parts. + // + // Structure: [u1, a1(tool), u2, a2(tool), u3, a3(tool), ..., uN(current)] + // When the budget-based cutoff evicts early messages, the raw window + // could start with an assistant. With the prefix (ending in assistant), + // this would create back-to-back assistants → tool_use without tool_result. 
+ + // Store a distillation so gradient mode uses a prefix + db() + .query( + `INSERT INTO distillations (id, project_id, session_id, narrative, facts, observations, source_ids, generation, token_count, archived, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`, + ) + .run( + "dist-424", + projectId424, + SESSION_424, + "", + "[]", + "x".repeat(500), // small distillation + "[]", + 0, + 170, // ~500 chars / 3 + 0, + Date.now(), + ); + + // Build enough messages to overflow the 10K context window. + // Each tool output is ~1500 chars = ~500 tokens. With 15 tool turns + // (30 messages), we get ~7500 tokens of tool content + overhead. + const messages: LoreMessageWithParts[] = []; + for (let i = 0; i < 15; i++) { + messages.push( + makeMsg(`u${i}`, "user", `Do step ${i}`, SESSION_424), + ); + messages.push( + makeToolAssistant( + `a${i}`, + "read", + `call-${i}`, + "x".repeat(1500), + ), + ); + } + // Final user message (current turn) + messages.push( + makeMsg("u-final", "user", "Now do the final step", SESSION_424), + ); + + const result = transform({ + messages, + projectPath: PID_424, + sessionID: SESSION_424, + }); + + // The result must be at layer >= 1 (gradient mode with prefix) + expect(result.layer).toBeGreaterThanOrEqual(1); + + // Core assertion: no two consecutive messages should have the same role. + // This catches the back-to-back assistant bug that causes + // "tool_use ids found without tool_result" (#424). + for (let i = 1; i < result.messages.length; i++) { + const prev = result.messages[i - 1]; + const curr = result.messages[i]; + expect( + curr.info.role, + ).not.toBe( + prev.info.role, + ); + } + + // Additional: if the first message after the prefix is in the result, + // and a prefix exists (layer >= 1), then the 3rd message (idx 2, after + // [user_distilled, assistant_distilled]) must be user role. 
+ if (result.messages.length > 2) { + const firstRaw = result.messages[2]; // after [user, assistant] prefix + expect(firstRaw.info.role).toBe("user"); + } + }); +}); diff --git a/packages/gateway/src/pipeline.ts b/packages/gateway/src/pipeline.ts index e5058de..4fa2ca2 100644 --- a/packages/gateway/src/pipeline.ts +++ b/packages/gateway/src/pipeline.ts @@ -2046,9 +2046,10 @@ function postResponse( // Store all messages (user + assistant) from this turn. // Convert gateway messages to Lore format. const loreMessages = gatewayMessagesToLore(req.messages, sessionID); - resolveToolResults(loreMessages); - // Store the latest user message (last user message in the array) + // Store the latest user message BEFORE resolveToolResults — we want the + // original content (including tool_result text), not the placeholder + // "[tool results provided]" that resolveToolResults creates after merging. for (let i = loreMessages.length - 1; i >= 0; i--) { if (loreMessages[i].info.role === "user") { temporal.store({ @@ -2060,6 +2061,11 @@ function postResponse( } } + // Resolve tool results for gradient transform (merges tool_result into + // assistant parts, strips from user messages — needed for reconstruct- + // after-eviction pattern but not for temporal storage above). + resolveToolResults(loreMessages); + // Build and store the assistant response message. // Strip recall marker text blocks — they contain the raw query string // and pollute FTS results with self-referential noise. 
@@ -3627,6 +3633,7 @@ export function removeOrphanedToolResults( content: GatewayContentBlock[]; }>, ): void { + // --- Pass 1: Remove orphaned tool_result blocks (tool_result → tool_use) --- for (let i = 0; i < messages.length; i++) { const msg = messages[i]!; if (msg.role !== "user") continue; @@ -3661,6 +3668,46 @@ export function removeOrphanedToolResults( msg.content = [{ type: "text", text: "[tool results provided]" }]; } } + + // --- Pass 2: Remove orphaned tool_use blocks (tool_use → tool_result) --- + // Every tool_use on an assistant must have a matching tool_result on the + // immediately following user message. Without this, the Anthropic API + // rejects with "tool_use ids found without tool_result blocks immediately + // after". This catches edge cases where gradient eviction or back-to-back + // assistants leave tool_use blocks without matching results (#424). + for (let i = 0; i < messages.length; i++) { + const msg = messages[i]!; + if (msg.role !== "assistant") continue; + if (!msg.content.some((b) => b.type === "tool_use")) continue; + + // Collect tool_result IDs from the following user message + const next = + i + 1 < messages.length && messages[i + 1]!.role === "user" + ? messages[i + 1]! + : null; + const toolResultIds = new Set( + (next?.content ?? []) + .filter((b): b is GatewayToolResultBlock => b.type === "tool_result") + .map((b) => b.toolUseId), + ); + + // Remove tool_use blocks that have no matching tool_result + const before = msg.content.length; + msg.content = msg.content.filter( + (b) => + b.type !== "tool_use" || + toolResultIds.has((b as GatewayToolUseBlock).id), + ); + if (msg.content.length < before) { + log.warn( + `removed ${before - msg.content.length} orphaned tool_use block(s) from assistant message ${i}`, + ); + } + // If the assistant message is now empty, add placeholder text. 
+ if (msg.content.length === 0) { + msg.content = [{ type: "text", text: "[assistant response]" }]; + } + } } // --------------------------------------------------------------------------- diff --git a/packages/gateway/src/temporal-adapter.ts b/packages/gateway/src/temporal-adapter.ts index a462c28..b4a7c59 100644 --- a/packages/gateway/src/temporal-adapter.ts +++ b/packages/gateway/src/temporal-adapter.ts @@ -293,7 +293,10 @@ export function resolveToolResults(messages: LoreMessageWithParts[]): void { (p) => !(isToolPart(p) && p.tool === "result"), ); // If stripping left the user message with no content parts, - // add a placeholder text part so the message survives API conversion. + // add a recall-able placeholder so the model can fetch the original + // tool output via recall using the temporal message ID (t:xxx). + // The original content is stored in temporal BEFORE resolveToolResults + // runs, so `t:xxx` retrieves the full tool_result text. if (msg.parts.length === 0 && before > 0) { msg.parts = [ { @@ -301,7 +304,7 @@ export function resolveToolResults(messages: LoreMessageWithParts[]): void { sessionID: "", messageID: msg.info.id, type: "text" as const, - text: "[tool results provided]", + text: `[tool results provided] (t:${msg.info.id})`, time: { start: 0, end: 0 }, } satisfies LoreTextPart, ]; diff --git a/packages/gateway/test/pipeline-tools.test.ts b/packages/gateway/test/pipeline-tools.test.ts index ac4b1be..dab9f85 100644 --- a/packages/gateway/test/pipeline-tools.test.ts +++ b/packages/gateway/test/pipeline-tools.test.ts @@ -6,11 +6,22 @@ * blocks" Anthropic API error that occurs when gradient evicts an assistant * message but keeps the following user message with orphaned tool_result refs. 
*/ -import { describe, test, expect } from "bun:test"; +import { describe, test, expect, beforeAll, afterAll } from "bun:test"; import { loreMessagesToGateway, removeOrphanedToolResults, } from "../src/pipeline"; +import { + gatewayMessagesToLore, + resolveToolResults, +} from "../src/temporal-adapter"; +import { + transform, + setModelLimits, + calibrate, + db, + ensureProject, +} from "@loreai/core"; import type { LoreMessageWithParts, LoreUserMessage, @@ -19,7 +30,7 @@ import type { LoreTextPart, LoreToolPart, } from "@loreai/core"; -import type { GatewayContentBlock } from "../src/translate/types"; +import type { GatewayContentBlock, GatewayMessage } from "../src/translate/types"; // --------------------------------------------------------------------------- // Fixture helpers @@ -654,3 +665,361 @@ test("BUG-006: multiple tool_results in one message are all preserved", () => { expect(toolMsgs[2].tool_call_id).toBe("call_3"); expect(toolMsgs[2].content).toBe(""); // empty content preserved }); + +// --------------------------------------------------------------------------- +// #424: removeOrphanedToolResults — bidirectional validation (tool_use → tool_result) +// --------------------------------------------------------------------------- + +describe("removeOrphanedToolResults — tool_use→tool_result (pass 2, #424)", () => { + test("removes orphaned tool_use when no following user message exists", () => { + const messages: Array<{ + role: "user" | "assistant"; + content: GatewayContentBlock[]; + }> = [ + { + role: "user", + content: [{ type: "text", text: "hello" }], + }, + { + role: "assistant", + content: [ + { type: "text", text: "I will read the file" }, + { type: "tool_use", id: "toolu_001", name: "read", input: {} }, + ], + }, + ]; + + removeOrphanedToolResults(messages); + + // tool_use should be removed — no following user with tool_result + const assistant = messages[1]; + expect(assistant.content).toHaveLength(1); + 
expect(assistant.content[0].type).toBe("text"); + }); + + test("keeps tool_use after back-to-back assistants when following user has matching tool_result", () => { + // This simulates the #424 shape: prefix ends with assistant, raw window + // starts with assistant, so loreMessagesToGateway produces back-to-back + // assistants. Here the second assistant's tool_use still has its result. + const messages: Array<{ + role: "user" | "assistant"; + content: GatewayContentBlock[]; + }> = [ + { + role: "user", + content: [{ type: "text", text: "[memory context]" }], + }, + { + role: "assistant", + content: [ + { type: "text", text: "distilled observations" }, + ], + }, + { + role: "assistant", + content: [ + { type: "text", text: "Let me read the file" }, + { type: "tool_use", id: "toolu_eval_000010", name: "read", input: { path: "src/main.ts" } }, + ], + }, + { + role: "user", + content: [ + { type: "tool_result", toolUseId: "toolu_eval_000010", content: "file contents" }, + ], + }, + ]; + + removeOrphanedToolResults(messages); + + // The tool_use on messages[2] is answered by the tool_result on + // messages[3], the immediately following user message, so pass 2 must + // keep it even though messages[1] and messages[2] are both assistants. 
+ const assistant2 = messages[2]; + expect(assistant2.content.some((b) => b.type === "tool_use")).toBe(true); + }); + + test("removes tool_use when following user has no matching tool_result", () => { + const messages: Array<{ + role: "user" | "assistant"; + content: GatewayContentBlock[]; + }> = [ + { + role: "user", + content: [{ type: "text", text: "hello" }], + }, + { + role: "assistant", + content: [ + { type: "text", text: "Let me read" }, + { type: "tool_use", id: "toolu_001", name: "read", input: {} }, + { type: "tool_use", id: "toolu_002", name: "write", input: {} }, + ], + }, + { + role: "user", + content: [ + // Only has tool_result for toolu_001, not toolu_002 + { type: "tool_result", toolUseId: "toolu_001", content: "result" }, + { type: "text", text: "continue" }, + ], + }, + ]; + + removeOrphanedToolResults(messages); + + const assistant = messages[1]; + // toolu_001 kept (has matching result), toolu_002 removed + expect(assistant.content).toHaveLength(2); // text + toolu_001 + const toolUseBlocks = assistant.content.filter((b) => b.type === "tool_use"); + expect(toolUseBlocks).toHaveLength(1); + expect((toolUseBlocks[0] as any).id).toBe("toolu_001"); + }); + + test("keeps tool_use when following user has matching tool_result", () => { + const messages: Array<{ + role: "user" | "assistant"; + content: GatewayContentBlock[]; + }> = [ + { + role: "user", + content: [{ type: "text", text: "hello" }], + }, + { + role: "assistant", + content: [ + { type: "tool_use", id: "toolu_001", name: "read", input: {} }, + ], + }, + { + role: "user", + content: [ + { type: "tool_result", toolUseId: "toolu_001", content: "data" }, + ], + }, + ]; + + removeOrphanedToolResults(messages); + + // Everything should be preserved + expect(messages[1].content).toHaveLength(1); + expect(messages[1].content[0].type).toBe("tool_use"); + expect(messages[2].content).toHaveLength(1); + expect(messages[2].content[0].type).toBe("tool_result"); + }); + + test("replaces empty assistant 
with placeholder after removing all tool_use blocks", () => { + const messages: Array<{ + role: "user" | "assistant"; + content: GatewayContentBlock[]; + }> = [ + { + role: "user", + content: [{ type: "text", text: "hello" }], + }, + { + role: "assistant", + content: [ + { type: "tool_use", id: "toolu_001", name: "read", input: {} }, + ], + }, + // No following user message at all + ]; + + removeOrphanedToolResults(messages); + + // The assistant message should have a placeholder instead of being empty + expect(messages[1].content).toHaveLength(1); + expect(messages[1].content[0].type).toBe("text"); + expect((messages[1].content[0] as any).text).toBe("[assistant response]"); + }); +}); + +// --------------------------------------------------------------------------- +// #424: End-to-end integration test — inflated eval through full pipeline +// --------------------------------------------------------------------------- + +describe("end-to-end: inflated eval tool_use/tool_result through full pipeline (#424)", () => { + // Use a unique session ID to avoid state pollution from other gradient tests + const SESSION_E2E = `sess-e2e-424-${Date.now()}`; + const PID_E2E = "/test/pipeline/e2e-424"; + let projectId: string; + + beforeAll(() => { + projectId = ensureProject(PID_E2E); + // Small context window to force gradient compression (like 400K inflate) + setModelLimits({ context: 10_000, output: 2_000 }); + calibrate(0); + }); + + afterAll(() => { + db().query("DELETE FROM distillations WHERE project_id = ?").run(projectId); + }); + + /** + * Simulates the eval's buildMessages() → gateway pipeline path. + * Builds Anthropic-format messages with tool_use/tool_result pairs + * (like inflated filler), converts through the full pipeline, and + * validates Anthropic API compliance. 
+ */ + test("inflated messages with tool_use/tool_result survive gradient compression", () => { + // Store a distillation so gradient produces a prefix (triggers layer 1+) + db() + .query( + `INSERT INTO distillations (id, project_id, session_id, narrative, facts, observations, source_ids, generation, token_count, archived, created_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`, + ) + .run( + "dist-e2e-424", + projectId, + SESSION_E2E, + "", + "[]", + "x".repeat(500), + "[]", + 0, + 170, + 0, + Date.now(), + ); + + // Build Anthropic-format messages mimicking inflated eval: + // Alternating user text + assistant tool_use + user tool_result + assistant text + // (same pattern as inflate.ts filler templates). + // Each turn generates ~1500 tokens (~4400 chars / 3) to overflow the 8K usable budget. + const gatewayMessages: GatewayMessage[] = []; + let toolCounter = 0; + + for (let i = 0; i < 15; i++) { + const toolId = `toolu_eval_${(++toolCounter).toString(36).padStart(6, "0")}`; + + // User asks to do something (~300 tokens) + gatewayMessages.push({ + role: "user", + content: [{ type: "text", text: `Implement feature ${i}: ${"x".repeat(900)}` }], + }); + + // Assistant calls a tool (~700 tokens: text + tool input) + gatewayMessages.push({ + role: "assistant", + content: [ + { type: "text", text: `I'll implement feature ${i}. ${"z".repeat(600)}` }, + { type: "tool_use", id: toolId, name: "write", input: { path: `src/feature${i}.ts`, content: "x".repeat(1200) } }, + ], + }); + + // User provides tool result (~300 tokens) + gatewayMessages.push({ + role: "user", + content: [ + { type: "tool_result", toolUseId: toolId, content: `Wrote src/feature${i}.ts successfully. ${"w".repeat(800)}` }, + ], + }); + + // Assistant summarizes (~300 tokens) + gatewayMessages.push({ + role: "assistant", + content: [{ type: "text", text: `Feature ${i} implemented successfully. 
${"y".repeat(900)}` }], + }); + } + + // Final user message (current turn) + gatewayMessages.push({ + role: "user", + content: [{ type: "text", text: "Now summarize everything we did." }], + }); + + // --- Full pipeline path (same as pipeline.ts step 7) --- + // 1. Convert to Lore format + const loreMessages = gatewayMessagesToLore(gatewayMessages, SESSION_E2E); + + // 2. Resolve tool results (merges into assistant parts, strips from user) + resolveToolResults(loreMessages); + + // 3. Gradient transform (will compress at layer 1+ due to small context) + const result = transform({ + messages: loreMessages, + projectPath: PID_E2E, + sessionID: SESSION_E2E, + }); + + // Must be at gradient layer 1+ (compressed with prefix) + expect(result.layer).toBeGreaterThanOrEqual(1); + // Messages were evicted (not all 61 input messages survive) + expect(result.messages.length).toBeLessThan(loreMessages.length); + + // 4. Convert back to gateway format + const transformedMessages = loreMessagesToGateway(result.messages); + + // 5. Safety net + removeOrphanedToolResults(transformedMessages); + + // --- Anthropic API compliance validation --- + + // A. No back-to-back same-role messages + for (let i = 1; i < transformedMessages.length; i++) { + expect(transformedMessages[i].role).not.toBe(transformedMessages[i - 1].role); + } + + // B. First message must be user + expect(transformedMessages[0].role).toBe("user"); + + // C. 
Every tool_use on an assistant has a matching tool_result on the + // immediately following user message + for (let i = 0; i < transformedMessages.length; i++) { + const msg = transformedMessages[i]; + if (msg.role !== "assistant") continue; + const toolUseIds = msg.content + .filter((b) => b.type === "tool_use") + .map((b) => (b as { id: string }).id); + + if (toolUseIds.length === 0) continue; + + // Must have a following user message + const next = transformedMessages[i + 1]; + expect(next).toBeDefined(); + expect(next.role).toBe("user"); + + // Every tool_use ID must have a matching tool_result + const toolResultIds = new Set( + next.content + .filter((b) => b.type === "tool_result") + .map((b) => (b as { toolUseId: string }).toolUseId), + ); + + for (const id of toolUseIds) { + expect(toolResultIds.has(id)).toBe(true); + } + } + + // D. Every tool_result on a user references a tool_use on the preceding assistant + for (let i = 0; i < transformedMessages.length; i++) { + const msg = transformedMessages[i]; + if (msg.role !== "user") continue; + const toolResultIds = msg.content + .filter((b) => b.type === "tool_result") + .map((b) => (b as { toolUseId: string }).toolUseId); + + if (toolResultIds.length === 0) continue; + + const prev = transformedMessages[i - 1]; + expect(prev).toBeDefined(); + expect(prev.role).toBe("assistant"); + + const toolUseIdSet = new Set( + prev.content + .filter((b) => b.type === "tool_use") + .map((b) => (b as { id: string }).id), + ); + + for (const id of toolResultIds) { + expect(toolUseIdSet.has(id)).toBe(true); + } + } + + // E. 
No empty content arrays + for (const msg of transformedMessages) { + expect(msg.content.length).toBeGreaterThan(0); + } + }); +}); diff --git a/packages/gateway/test/temporal-adapter.test.ts b/packages/gateway/test/temporal-adapter.test.ts index 651851d..f6ffa22 100644 --- a/packages/gateway/test/temporal-adapter.test.ts +++ b/packages/gateway/test/temporal-adapter.test.ts @@ -71,17 +71,18 @@ describe("resolveToolResults", () => { expect(resultParts).toHaveLength(0); }); - test("user message with only tool_result parts gets placeholder text after stripping", () => { + test("user message with only tool_result parts gets placeholder text with recall ID after stripping", () => { const messages = gatewayMessagesToLore(makeToolConversation(), "sess-2"); resolveToolResults(messages); // The user message that was tool_result-only should now have a placeholder + // with a recall-able reference to the original message: (t:xxx) const toolResultUser = messages[2]!; expect(toolResultUser.parts).toHaveLength(1); expect(toolResultUser.parts[0]!.type).toBe("text"); - expect((toolResultUser.parts[0] as any).text).toBe( - "[tool results provided]", - ); + const text = (toolResultUser.parts[0] as any).text as string; + expect(text).toStartWith("[tool results provided] (t:"); + expect(text).toEndWith(")"); }); test("user message with text + tool_result preserves text, strips tool_result", () => { @@ -146,13 +147,13 @@ describe("resolveToolResults", () => { const messages = gatewayMessagesToLore(gwMessages, "sess-4"); resolveToolResults(messages); - // Orphaned tool_result should be stripped, replaced with placeholder + // Orphaned tool_result should be stripped, replaced with placeholder + recall ID const userMsg = messages[1]!; expect(userMsg.parts).toHaveLength(1); expect(userMsg.parts[0]!.type).toBe("text"); - expect((userMsg.parts[0] as any).text).toBe( - "[tool results provided]", - ); + const text = (userMsg.parts[0] as any).text as string; + expect(text).toStartWith("[tool 
results provided] (t:"); + expect(text).toEndWith(")"); }); test("multiple tool calls in one assistant message: all tool_result parts stripped", () => {