feat(research): inject goal anchor every N iterations to prevent drift #260
Open
Meur3ault wants to merge 1 commit into
Conversation
Add N-turn fact injection to the research sub-agent loop. Every 10 iterations a `[SYSTEM: GOAL ANCHOR]` user message is appended containing the original task verbatim, plus any interim thinking text the model emitted alongside tool calls (capped at 500 chars). This re-states the task near the tail of the message list, where it has full attention weight, rather than leaving it buried at position 1 under N rounds of tool calls.

New module-private helpers:
- `_should_inject_fact(iteration) -> bool`
- `_build_fact_anchor(task, thinking_text) -> str`

New constants:
- `_RESEARCH_FACT_INTERVAL = 10`
- `_RESEARCH_FACT_SUMMARY_MAX = 500`

Unit tests (16): helper boundary conditions, periodic repeat at 2N, doom-loop coexistence, and a structural test verifying the anchor is the most recent user message before the LLM call, with N tool-result messages between it and the original task statement.

Eval (`tests/integration/test_research_anchor_eval.py`): live A/B test against `meta-llama/Llama-3.1-8B-Instruct` via the HF router. Injects 15 off-topic time-series tool results to induce drift, then compares summaries with and without the anchor. Verified: `score_a=-3` (drift occurs), `score_b=+4` (anchor corrects it, `delta=+7`). Skipped in CI; enable with `ML_INTERN_LIVE_LLM_TESTS=1 HF_TOKEN=...`
Re-anchor research agent to original task every N iterations
Problem
The research sub-agent runs up to 60 iterations. After a long chain of tool
calls the original task ends up at position 1 in the message list, buried under
dozens of assistant/tool message pairs. In practice the model starts
summarising whatever it found most recently rather than what it was asked to
find.
This is specific to the research sub-agent: the task is a single fixed string
set once at the start, there are no follow-up user messages to re-anchor it,
and the loop runs fully autonomously. The main agent doesn't have this problem
because each user turn naturally resets focus.
Fix
Every 10 iterations, append a `[SYSTEM: GOAL ANCHOR]` user message containing:
- the original task, verbatim
- any interim thinking text the model emitted alongside tool calls (capped at
  500 chars, used as a cheap "progress so far" without an extra LLM call)
The anchor fires after doom-loop detection and before the context-budget check,
so all three nag mechanisms stay independent.
Testing
Unit tests cover the injection boundaries, periodic repeat at 2N, doom-loop
coexistence, and a structural test that verifies the anchor is the most recent
user message before the LLM call with exactly N tool-result messages between it
and the original task statement.
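The structural test could look roughly like this, assuming a plain list-of-dicts message format with `user`/`assistant`/`tool` roles; the actual test uses the project's fixtures and message types.

```python
# Sketch of the structural check: after N tool rounds, the anchor must be
# the most recent user message, with exactly N tool results between it and
# the original task. Message shape is an assumption.

def test_anchor_is_most_recent_user_message():
    task = "Find papers on time-series drift."
    messages = [{"role": "user", "content": task}]
    for i in range(10):  # N = 10 tool-call / tool-result rounds
        messages.append({"role": "assistant", "content": f"call {i}"})
        messages.append({"role": "tool", "content": f"result {i}"})
    messages.append({"role": "user",
                     "content": f"[SYSTEM: GOAL ANCHOR]\n{task}"})

    user_idxs = [i for i, m in enumerate(messages) if m["role"] == "user"]
    # anchor is the last message, i.e. the most recent user turn
    assert user_idxs[-1] == len(messages) - 1
    # exactly N tool results sit between the task and the anchor
    between = messages[user_idxs[0] + 1:user_idxs[-1]]
    assert sum(m["role"] == "tool" for m in between) == 10
```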
For the actual capability claim I ran a live A/B test
(`tests/integration/test_research_anchor_eval.py`) against Llama-3.1-8B via the
HF router: injected 15 off-topic time-series tool results to simulate drift,
then compared summaries with and without the anchor.
The eval asserts both that drift occurs without the anchor and that it's
corrected with it. Enable with
ML_INTERN_LIVE_LLM_TESTS=1 HF_TOKEN=....