
feat(research): inject goal anchor every N iterations to prevent drift #260

Open
Meur3ault wants to merge 1 commit into huggingface:main from Meur3ault:feat/research-fact-injection

Conversation

@Meur3ault

Re-anchor research agent to original task every N iterations

Problem

The research sub-agent runs up to 60 iterations. After a long chain of tool
calls the original task ends up at position 1 in the message list, buried under
dozens of assistant/tool message pairs. In practice the model starts
summarising whatever it found most recently rather than what it was asked to
find.

This is specific to the research sub-agent: the task is a single fixed string
set once at the start, there are no follow-up user messages to re-anchor it,
and the loop runs fully autonomously. The main agent doesn't have this problem
because each user turn naturally resets focus.

Fix

Every 10 iterations, append a [SYSTEM: GOAL ANCHOR] user message containing:

  • the original task verbatim
  • a snippet of any thinking text the model produced alongside the previous tool
    calls (capped at 500 chars, used as a cheap "progress so far" without an
    extra LLM call)

The anchor fires after doom-loop detection and before the context-budget check,
so all three nag mechanisms stay independent.
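The ordering described above can be sketched roughly as follows. This is an illustration, not the PR's code: only the [SYSTEM: GOAL ANCHOR] tag and the N=10 interval come from the description; `run_iteration`, `doom_loop_detected`, and `over_context_budget` are hypothetical names.

```python
# Illustrative sketch of the nag ordering: doom-loop detection first,
# then the periodic goal anchor, then the context-budget check.
_RESEARCH_FACT_INTERVAL = 10  # N, from the PR description

def run_iteration(iteration, messages, task, thinking_text,
                  doom_loop_detected, over_context_budget):
    """One loop step; each nag mechanism fires independently of the others."""
    if doom_loop_detected(messages):
        messages.append({"role": "user", "content": "[SYSTEM: DOOM LOOP] ..."})
    # The anchor runs on its own schedule, between the other two nags.
    if iteration > 0 and iteration % _RESEARCH_FACT_INTERVAL == 0:
        snippet = (thinking_text or "")[:500]
        anchor = ("[SYSTEM: GOAL ANCHOR]\nOriginal task:\n" + task
                  + ("\n\nProgress so far:\n" + snippet if snippet else ""))
        messages.append({"role": "user", "content": anchor})
    if over_context_budget(messages):
        messages.append({"role": "user",
                         "content": "[SYSTEM: CONTEXT BUDGET] ..."})
    return messages
```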

Testing

Unit tests cover the injection boundaries, periodic repeat at 2N, doom-loop
coexistence, and a structural test that verifies the anchor is the most recent
user message before the LLM call with exactly N tool-result messages between it
and the original task statement.
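A hedged sketch of what that structural assertion might look like (the message shapes and the helper name are assumptions, not the test suite's actual code):

```python
def assert_anchor_structure(messages, n):
    """Check that the most recent user message is the goal anchor, with
    exactly n tool-result messages between it and the original task."""
    user_idxs = [i for i, m in enumerate(messages) if m["role"] == "user"]
    first_user, last_user = user_idxs[0], user_idxs[-1]
    assert messages[last_user]["content"].startswith("[SYSTEM: GOAL ANCHOR]")
    tools_between = [m for m in messages[first_user + 1:last_user]
                     if m["role"] == "tool"]
    assert len(tools_between) == n
```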

For the actual capability claim I ran a live A/B test (tests/integration/test_research_anchor_eval.py) against Llama-3.1-8B via HF router: injected 15
off-topic time-series tool results to simulate drift, then compared summaries
with and without the anchor.

A (no anchor):  score = -3   off-task keywords dominated
B (with anchor): score = +4   model re-engaged with LoRA fine-tuning task

The eval asserts both that drift occurs without the anchor and that it's
corrected with it. Enable with ML_INTERN_LIVE_LLM_TESTS=1 HF_TOKEN=...
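The signed scores above suggest a keyword-based metric. A minimal sketch of how such scoring could work (the function name and keyword lists are illustrative assumptions, not the eval's actual implementation):

```python
def keyword_score(summary, on_topic, off_topic):
    """Score a summary as on-topic keyword hits minus off-topic hits."""
    text = summary.lower()
    hits = sum(1 for kw in on_topic if kw.lower() in text)
    misses = sum(1 for kw in off_topic if kw.lower() in text)
    return hits - misses
```

Under this metric, a drifted summary dominated by time-series terms scores negative, while one that re-engages with the LoRA task scores positive.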

Add N-turn fact injection to the research sub-agent loop. Every 10
iterations a [SYSTEM: GOAL ANCHOR] user message is appended containing
the original task verbatim, and any interim thinking text the model
emitted alongside tool calls (capped at 500 chars). This re-states the
task near the tail of the message list — where it has full attention
weight — rather than leaving it buried at position 1 under N rounds of
tool calls.

New module-private helpers:
  _should_inject_fact(iteration) -> bool
  _build_fact_anchor(task, thinking_text) -> str

New constants:
  _RESEARCH_FACT_INTERVAL = 10
  _RESEARCH_FACT_SUMMARY_MAX = 500

Unit tests (16): helper boundary conditions, periodic repeat at 2N,
doom-loop coexistence, and a structural test verifying the anchor is the
most recent user message before the LLM call with N tool-result messages
between it and the original task statement.

Eval (tests/integration/test_research_anchor_eval.py): live A/B test
against meta-llama/Llama-3.1-8B-Instruct via HF router. Injects 15
off-topic time-series tool results to induce drift, then compares
summaries with and without the anchor. Verified: score_a=-3 (drift
occurs), score_b=+4 (anchor corrects it, delta=+7). Skipped in CI;
enable with ML_INTERN_LIVE_LLM_TESTS=1 HF_TOKEN=...
