fix: prevent tool_use/tool_result mismatch at gradient prefix/raw boundary (#424) #428
Open
BYK wants to merge 1 commit into
462f650 to b21b5ba
…ism (#424)

Core recall improvements:
- Raise temporal SOURCE_WEIGHT 0.5→0.8 (parity with distillation for display budget)
- Raise charBudget 8K→12K chars for recall results (more room for specific details)
- Raise MAX_RRF_LISTS 10→14 (accommodate distillation recency list)
- Lower vectorBoostMinTerms 3→2 (activate vector boost for 2-term queries)
- Add distillation recency RRF list (structural parity with temporal)
- Show [lossy] hint on distillation recall results when r_compression < 1.0

Distillation transparency:
- formatDistillations() now renders compression signal + IDs + source count so the model can see how lossy each distillation is and drill into details
- gradient.ts loads r_compression, c_norm, source_ids from DB for distillations
- Recall tool description updated with drill-down guidance for d:xxx/t:xxx IDs

Eval realism:
- compactionBaseline() now iterates (2-4 passes) matching real Claude Code auto-compact behavior at 83.5% of context window threshold
- QA_SYSTEM prompt made baseline-agnostic: no recall-specific coaching
- buildQAPrompt() preamble neutralized across all baselines

Closes #424
Summary
Fixes the Anthropic API 400 errors ("tool_use ids were found without tool_result blocks immediately after") that occur when running the inflated eval (--inflate 400000) through the gateway. The root cause was back-to-back assistant messages, produced when the gradient context manager's distilled prefix (ending with an assistant message) was concatenated with a raw window that started with an assistant message after the budget-driven cutoff.
Closes #424
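The failure mode described above can be reproduced in a few lines. The sketch below is illustrative only: the Msg type, assemble(), and firstRoleCollision() are hypothetical names rather than the actual code in gradient.ts, and messages are reduced to a role, some text, and an optional list of tool_use ids. The user message at the start of the prefix is also an assumption made to keep the example readable.

```typescript
// Hypothetical, simplified message shape; the real Lore/gateway types are richer.
type Role = "user" | "assistant";

interface Msg {
  role: Role;
  text: string;
  toolUseIds?: string[]; // ids of tool_use blocks carried by an assistant message
}

// Mirrors the [...distilledPrefix, ...rawWindow] assembly from the PR description.
function assemble(distilledPrefix: Msg[], rawWindow: Msg[]): Msg[] {
  return [...distilledPrefix, ...rawWindow];
}

// Returns the index of the first message whose role equals its predecessor's,
// or -1 when roles alternate throughout.
function firstRoleCollision(messages: Msg[]): number {
  for (let i = 1; i < messages.length; i++) {
    if (messages[i].role === messages[i - 1].role) return i;
  }
  return -1;
}

// The distilled prefix ends with a text-only assistant message.
const distilledPrefix: Msg[] = [
  { role: "user", text: "(earlier context)" },
  { role: "assistant", text: "(distilled summary)" },
];

// A budget-driven cutoff that happens to land on an assistant message.
const rawWindow: Msg[] = [
  { role: "assistant", text: "running a tool", toolUseIds: ["toolu_1"] },
  { role: "user", text: "(tool_result for toolu_1)" },
  { role: "assistant", text: "done" },
];

const collisionIndex = firstRoleCollision(assemble(distilledPrefix, rawWindow));
console.log(collisionIndex); // 2: the raw window's first message follows another assistant
```

A check of this kind is also a convenient assertion for tests that want to confirm no consecutive same-role messages reach the API.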
Root Cause
tryFit() assembles its output as [...distilledPrefix, ...rawWindow]. The prefix always ends with a text-only assistant message. The raw window cutoff is purely token-budget-driven and can land on a message of any role. When the raw window starts with an assistant message containing tool_use blocks, the output has consecutive assistant messages. The Anthropic API requires every tool_use to have a matching tool_result on the immediately following user message, and back-to-back assistant messages violate this invariant.
Changes
Primary fix: role alternation at prefix/raw boundary (gradient.ts)
- tryFit(): After computing the budget-driven cutoff, advances past leading assistant messages when a prefix is present
- tryFitStable(): Adjusts the pinned index forward past leading assistants before slicing the pinned window
Safety net: bidirectional tool validation (pipeline.ts)
- removeOrphanedToolResults() gains a Pass 2 that validates the tool_use→tool_result direction (every tool_use on an assistant message has a matching tool_result on the next user message). Previously it only validated the tool_result→tool_use direction.
Tests
- gradient.test.ts: New test suite building a 30-message conversation that overflows context, triggers layer 4 with a distilled prefix, and asserts no consecutive same-role messages
- pipeline-tools.test.ts: 5 new Pass 2 tests covering orphaned tool_use scenarios (no following user, back-to-back assistants, partial matching)
- pipeline-tools.test.ts: End-to-end integration test simulating the eval's full pipeline path (gateway format → Lore format → resolveToolResults → gradient transform → loreMessagesToGateway → removeOrphanedToolResults) with 40+ Anthropic API compliance assertions
Verification
- bun test packages/core/test/gradient.test.ts: 104 pass
- bun test packages/gateway/test/pipeline-tools.test.ts: 23 pass
- bun run typecheck: all 4 packages pass
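For readers who want a concrete picture of the two changes, here is a minimal sketch under stated assumptions. The Block and Msg types and the function names advancePastLeadingAssistants() and dropUnansweredToolUse() are hypothetical and do not come from gradient.ts or pipeline.ts. In particular, stripping unmatched tool_use blocks is only one plausible way to handle orphans; the actual Pass 2 may behave differently.

```typescript
// Hypothetical, simplified shapes; the real types in gradient.ts and pipeline.ts differ.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };

interface Msg {
  role: "user" | "assistant";
  content: Block[];
}

// Primary fix, sketched: when a distilled prefix is present (so the output already
// ends with an assistant message), move the cutoff past leading assistant messages.
function advancePastLeadingAssistants(messages: Msg[], cutoff: number, hasPrefix: boolean): number {
  if (!hasPrefix) return cutoff;
  let i = cutoff;
  while (i < messages.length && messages[i].role === "assistant") i++;
  return i;
}

// Safety net, sketched: strip any tool_use block whose id has no tool_result on the
// immediately following user message. A message left with empty content would need
// separate handling, which this sketch omits.
function dropUnansweredToolUse(messages: Msg[]): Msg[] {
  return messages.map((msg, i) => {
    if (msg.role !== "assistant") return msg;
    const next = messages[i + 1];
    const answered = new Set<string>();
    if (next && next.role === "user") {
      for (const b of next.content) {
        if (b.type === "tool_result") answered.add(b.tool_use_id);
      }
    }
    const content = msg.content.filter((b) => b.type !== "tool_use" || answered.has(b.id));
    return { ...msg, content };
  });
}

const raw: Msg[] = [
  { role: "user", content: [{ type: "text", text: "q1" }] },
  { role: "assistant", content: [{ type: "text", text: "a1" }] },
  { role: "user", content: [{ type: "text", text: "q2" }] },
  { role: "assistant", content: [{ type: "tool_use", id: "toolu_1" }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_1" }] },
];

// The budget-driven cutoff landed on the assistant at index 1; with a prefix it moves to 2.
console.log(advancePastLeadingAssistants(raw, 1, true)); // 2

// Back-to-back assistants: the first tool_use has no tool_result on the next message.
const broken: Msg[] = [
  { role: "assistant", content: [{ type: "text", text: "hi" }, { type: "tool_use", id: "toolu_9" }] },
  { role: "assistant", content: [{ type: "tool_use", id: "toolu_1" }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_1" }] },
];
console.log(dropUnansweredToolUse(broken)[0].content.length); // 1: only the text block remains
```

Advancing the cutoff trims slightly more history than the budget strictly requires, which seems a reasonable trade for a valid role sequence; the second function acts only as a backstop if a mismatch still gets through.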