
fix: prevent tool_use/tool_result mismatch at gradient prefix/raw boundary (#424) #428

Open
BYK wants to merge 1 commit into main from fix-tool-pairing-424

BYK commented May 20, 2026

Summary

Fixes the Anthropic API 400 errors ("tool_use ids were found without tool_result blocks immediately after") that occur when running the inflated eval (--inflate 400000) through the gateway. The root cause was back-to-back assistant messages, produced when the gradient context manager's distilled prefix (which ends with an assistant message) was concatenated with a raw window that began with an assistant message after the budget-driven cutoff.

Closes #424

Root Cause

tryFit() assembles its output as [...distilledPrefix, ...rawWindow]. The prefix always ends with a text-only assistant message. The raw-window cutoff is purely token-budget-driven, so it can land on a message of either role. When the raw window ends up starting with an assistant message containing tool_use blocks, the output contains two consecutive assistant messages. The Anthropic API requires every tool_use to have a matching tool_result in the immediately following user message; with back-to-back assistant messages, this invariant is violated.
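To make the failure mode concrete, here is a minimal TypeScript sketch of a role-blind cutoff. The Msg shape, the token counts, and the budgetCutoff/assemble/firstSameRolePair helpers are hypothetical simplifications, not the actual gradient.ts code; only the prefix-then-raw assembly order comes from the description above.

```typescript
// Hypothetical, simplified message shape: only role and token count matter here.
type Role = "user" | "assistant";
interface Msg {
  role: Role;
  tokens: number;
}

// Budget-driven cutoff: walk back from the newest message, keeping messages
// while they fit in `budget`. Returns the index of the oldest kept message.
// Roles are never consulted, so the result can point at an assistant.
function budgetCutoff(raw: Msg[], budget: number): number {
  let used = 0;
  let i = raw.length;
  while (i > 0 && used + raw[i - 1].tokens <= budget) {
    used += raw[i - 1].tokens;
    i--;
  }
  return i;
}

// Assembly as described for tryFit(): distilled prefix, then the raw tail.
function assemble(prefix: Msg[], raw: Msg[], budget: number): Msg[] {
  return [...prefix, ...raw.slice(budgetCutoff(raw, budget))];
}

// Index of the second message in the first adjacent same-role pair, or -1.
function firstSameRolePair(msgs: Msg[]): number {
  for (let i = 1; i < msgs.length; i++) {
    if (msgs[i].role === msgs[i - 1].role) return i;
  }
  return -1;
}

const prefix: Msg[] = [
  { role: "user", tokens: 50 },
  { role: "assistant", tokens: 200 }, // distilled summary, text-only
];
const raw: Msg[] = [
  { role: "user", tokens: 300 },
  { role: "assistant", tokens: 120 }, // would carry tool_use blocks
  { role: "user", tokens: 80 }, // would carry the matching tool_result
  { role: "assistant", tokens: 100 },
];

const out = assemble(prefix, raw, 300);
console.log(out.map((m) => m.role).join(" → ")); // user → assistant → assistant → user → assistant
console.log(firstSameRolePair(out)); // 2
```

With a 300-token budget the cutoff keeps the last three raw messages, so the window opens on the assistant at index 1 and sits directly after the prefix's assistant summary.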

Changes

Primary fix: role alternation at prefix/raw boundary (gradient.ts)

  • tryFit(): After computing the budget-driven cutoff, advances past leading assistant messages when a prefix is present
  • tryFitStable(): Advances the pinned index past leading assistant messages before slicing the pinned window
  • Emergency layer 4: Drops leading assistant messages from the older tail before merging it with the current turn
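The boundary adjustment can be sketched as a single helper. This is a hypothetical illustration, assuming the same simplified Msg shape as a stand-in for the real gradient.ts types; alignRawStart is not a function named in this PR.

```typescript
// Hypothetical, simplified message shape (not the real gradient.ts types).
type Role = "user" | "assistant";
interface Msg {
  role: Role;
  tokens: number;
}

// Given a budget-driven cutoff into the raw window, step past any leading
// assistant messages when a distilled prefix (ending on assistant) precedes
// the window. Returns the adjusted start index; it may equal raw.length.
function alignRawStart(raw: Msg[], cutoff: number, hasPrefix: boolean): number {
  if (!hasPrefix) return cutoff;
  let i = cutoff;
  while (i < raw.length && raw[i].role === "assistant") i++;
  return i;
}

const raw: Msg[] = [
  { role: "user", tokens: 300 },
  { role: "assistant", tokens: 120 },
  { role: "user", tokens: 80 },
  { role: "assistant", tokens: 100 },
];

console.log(alignRawStart(raw, 1, true)); // 2: window now opens on a user turn
console.log(alignRawStart(raw, 1, false)); // 1: no prefix, cutoff unchanged
```

Skipping the assistant also drops its tool_use blocks, so the user message that now opens the window may carry tool_result blocks with no matching tool_use; presumably the existing tool_result→tool_use validation in removeOrphanedToolResults() cleans those up. The same skip-leading-assistants step fits the pinned index in tryFitStable() and the older tail in layer 4.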

Safety net: bidirectional tool validation (pipeline.ts)

  • removeOrphanedToolResults() gains a Pass 2 that validates the tool_use→tool_result direction (every tool_use on an assistant message must have a matching tool_result on the next user message). Previously it only validated the tool_result→tool_use direction.
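A minimal sketch of what a tool_use→tool_result pass could look like. The content-block shapes follow the Anthropic Messages API (tool_use blocks carry an id, tool_result blocks carry a tool_use_id); the function name removeOrphanedToolUses and the choice to strip unmatched blocks rather than whole messages are assumptions, since the PR text does not show pipeline.ts.

```typescript
// Content blocks modeled on the Anthropic Messages API shapes.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Hypothetical "Pass 2": drop tool_use blocks that are not answered by a
// tool_result in the immediately following user message. An assistant
// message left with no content is dropped entirely (an assumption).
function removeOrphanedToolUses(messages: Message[]): Message[] {
  const out: Message[] = [];
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i];
    if (msg.role !== "assistant") {
      out.push(msg);
      continue;
    }
    // Only tool_results on the very next message count, and only if it is a user turn.
    const next = messages[i + 1];
    const answered = new Set<string>();
    if (next && next.role === "user") {
      for (const b of next.content) {
        if (b.type === "tool_result") answered.add(b.tool_use_id);
      }
    }
    const kept = msg.content.filter((b) => b.type !== "tool_use" || answered.has(b.id));
    if (kept.length > 0) out.push({ ...msg, content: kept });
  }
  return out;
}

// Back-to-back assistants: tu_1 has no answer on the next message, tu_2 does.
const convo: Message[] = [
  { role: "user", content: [{ type: "text", text: "list files" }] },
  {
    role: "assistant",
    content: [
      { type: "text", text: "Checking." },
      { type: "tool_use", id: "tu_1", name: "ls", input: {} },
    ],
  },
  { role: "assistant", content: [{ type: "tool_use", id: "tu_2", name: "ls", input: {} }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "tu_2", content: "a.txt" }] },
];

const cleaned = removeOrphanedToolUses(convo);
console.log(cleaned.map((m) => m.content.map((b) => b.type).join("+")));
```

In this sketch the result can still contain back-to-back assistant messages; Pass 2 only repairs tool pairing, and role alternation is left to the gradient.ts fix.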

Tests

  • gradient.test.ts: New test suite that builds a 30-message conversation overflowing the context window, triggers layer 4 with a distilled prefix, and asserts that no two consecutive messages share a role
  • pipeline-tools.test.ts: 5 new Pass 2 tests covering orphaned tool_use scenarios (no following user, back-to-back assistants, partial matching)
  • pipeline-tools.test.ts: End-to-end integration test simulating the eval's full pipeline path (gateway format → Lore format → resolveToolResults → gradient transform → loreMessagesToGateway → removeOrphanedToolResults) with 40+ Anthropic API compliance assertions

Verification

  • bun test packages/core/test/gradient.test.ts — 104 pass
  • bun test packages/gateway/test/pipeline-tools.test.ts — 23 pass
  • bun run typecheck — all 4 packages pass

BYK self-assigned this May 20, 2026
BYK force-pushed the fix-tool-pairing-424 branch 3 times, most recently from 462f650 to b21b5ba, May 20, 2026 16:02
…ism (#424)

Core recall improvements:
- Raise temporal SOURCE_WEIGHT 0.5→0.8 (parity with distillation for display budget)
- Raise charBudget 8K→12K chars for recall results (more room for specific details)
- Raise MAX_RRF_LISTS 10→14 (accommodate distillation recency list)
- Lower vectorBoostMinTerms 3→2 (activate vector boost for 2-term queries)
- Add distillation recency RRF list (structural parity with temporal)
- Show [lossy] hint on distillation recall results when r_compression < 1.0

Distillation transparency:
- formatDistillations() now renders compression signal + IDs + source count
  so the model can see how lossy each distillation is and drill into details
- gradient.ts loads r_compression, c_norm, source_ids from DB for distillations
- Recall tool description updated with drill-down guidance for d:xxx/t:xxx IDs

Eval realism:
- compactionBaseline() now iterates (2-4 passes), matching real Claude Code
  auto-compact behavior at an 83.5% context-window threshold
- QA_SYSTEM prompt made baseline-agnostic: no recall-specific coaching
- buildQAPrompt() preamble neutralized across all baselines

Closes #424
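The recall changes in this commit tune a Reciprocal Rank Fusion (RRF) over several ranked source lists. As a point of reference, here is a minimal weighted RRF sketch in which each list contributes weight / (k + rank) to an item's score. The RankedList shape, the weightedRrf name, k = 60 (the constant from the original RRF paper), and the assumptions that SOURCE_WEIGHT scales a list's contribution and MAX_RRF_LISTS truncates extra lists are illustrative, not confirmed by the commit.

```typescript
// Hypothetical shape: one ranked list of result ids from a single source.
interface RankedList {
  source: string; // e.g. "temporal", "distillation"
  weight: number; // per-source multiplier (assumed role of SOURCE_WEIGHT)
  ids: string[]; // best result first
}

// Weighted RRF: score(d) = sum over lists of weight / (k + rank), rank from 1.
// Lists beyond maxLists are ignored (assumed role of MAX_RRF_LISTS).
function weightedRrf(lists: RankedList[], maxLists: number, k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of lists.slice(0, maxLists)) {
    list.ids.forEach((id, idx) => {
      scores.set(id, (scores.get(id) ?? 0) + list.weight / (k + idx + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

// Illustrative weights: 0.8 matches the new temporal value; 1.0 is made up.
const fused = weightedRrf(
  [
    { source: "distillation", weight: 1.0, ids: ["d:a", "d:b"] },
    { source: "temporal", weight: 0.8, ids: ["t:x", "d:b"] },
  ],
  14,
);
console.log(fused.map(([id]) => id)); // [ 'd:b', 'd:a', 't:x' ]
```

Under this formula, raising a source's weight from 0.5 to 0.8 increases each of its items' contributions by 60%, and an item found by two sources (d:b above) can outrank each source's top hit.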
BYK force-pushed the fix-tool-pairing-424 branch from b21b5ba to 1acbe94, May 20, 2026 16:23
Development

Successfully merging this pull request may close these issues.

bug: inflated eval produces tool_use/tool_result mismatch errors through gateway
