
feat(research): inject goal anchor every N iterations to prevent drift #260

Open
Meur3ault wants to merge 1 commit into huggingface:main from Meur3ault:feat/research-fact-injection

Conversation

@Meur3ault

Re-anchor research agent to original task every N iterations

Problem

The research sub-agent runs up to 60 iterations. After a long chain of tool
calls the original task ends up at position 1 in the message list, buried under
dozens of assistant/tool message pairs. In practice the model starts
summarising whatever it found most recently rather than what it was asked to
find.

This is specific to the research sub-agent: the task is a single fixed string
set once at the start, there are no follow-up user messages to re-anchor it,
and the loop runs fully autonomously. The main agent doesn't have this problem
because each user turn naturally resets focus.

Fix

Every 10 iterations, append a [SYSTEM: GOAL ANCHOR] user message containing:

  • the original task verbatim
  • a snippet of any thinking text the model produced alongside the previous tool
    calls (capped at 500 chars, used as a cheap "progress so far" without an
    extra LLM call)

The anchor fires after doom-loop detection and before the context-budget check,
so all three nag mechanisms stay independent.
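The ordering described above can be sketched roughly as follows. This is an illustration, not the PR's code: only the [SYSTEM: GOAL ANCHOR] tag and the N=10 interval come from the description; `run_iteration`, `doom_loop_detected`, and `over_context_budget` are hypothetical names.

```python
# Illustrative sketch of the nag ordering: doom-loop detection first,
# then the periodic goal anchor, then the context-budget check.
_RESEARCH_FACT_INTERVAL = 10  # N, from the PR description

def run_iteration(iteration, messages, task, thinking_text,
                  doom_loop_detected, over_context_budget):
    """One loop step; each nag mechanism fires independently of the others."""
    if doom_loop_detected(messages):
        messages.append({"role": "user", "content": "[SYSTEM: DOOM LOOP] ..."})
    # The anchor runs on its own schedule, between the other two nags.
    if iteration > 0 and iteration % _RESEARCH_FACT_INTERVAL == 0:
        snippet = (thinking_text or "")[:500]
        anchor = ("[SYSTEM: GOAL ANCHOR]\nOriginal task:\n" + task
                  + ("\n\nProgress so far:\n" + snippet if snippet else ""))
        messages.append({"role": "user", "content": anchor})
    if over_context_budget(messages):
        messages.append({"role": "user",
                         "content": "[SYSTEM: CONTEXT BUDGET] ..."})
    return messages
```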

Testing

Unit tests cover the injection boundaries, periodic repeat at 2N, doom-loop
coexistence, and a structural test that verifies the anchor is the most recent
user message before the LLM call with exactly N tool-result messages between it
and the original task statement.
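A hedged sketch of what that structural assertion might look like (the message shapes and the helper name are assumptions, not the test suite's actual code):

```python
def assert_anchor_structure(messages, n):
    """Check that the most recent user message is the goal anchor, with
    exactly n tool-result messages between it and the original task."""
    user_idxs = [i for i, m in enumerate(messages) if m["role"] == "user"]
    first_user, last_user = user_idxs[0], user_idxs[-1]
    assert messages[last_user]["content"].startswith("[SYSTEM: GOAL ANCHOR]")
    tools_between = [m for m in messages[first_user + 1:last_user]
                     if m["role"] == "tool"]
    assert len(tools_between) == n
```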

For the actual capability claim I ran a live A/B test (tests/integration/test_research_anchor_eval.py) against Llama-3.1-8B via HF router: injected 15
off-topic time-series tool results to simulate drift, then compared summaries
with and without the anchor.

A (no anchor):  score = -3   off-task keywords dominated
B (with anchor): score = +4   model re-engaged with LoRA fine-tuning task

The eval asserts both that drift occurs without the anchor and that it's
corrected with it. Enable with ML_INTERN_LIVE_LLM_TESTS=1 HF_TOKEN=...
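The signed scores above suggest a keyword-based metric. A minimal sketch of how such scoring could work (the function name and keyword lists are illustrative assumptions, not the eval's actual implementation):

```python
def keyword_score(summary, on_topic, off_topic):
    """Score a summary as on-topic keyword hits minus off-topic hits."""
    text = summary.lower()
    hits = sum(1 for kw in on_topic if kw.lower() in text)
    misses = sum(1 for kw in off_topic if kw.lower() in text)
    return hits - misses
```

Under this metric, a drifted summary dominated by time-series terms scores negative, while one that re-engages with the LoRA task scores positive.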

Add N-turn fact injection to the research sub-agent loop. Every 10
iterations a [SYSTEM: GOAL ANCHOR] user message is appended containing
the original task verbatim, and any interim thinking text the model
emitted alongside tool calls (capped at 500 chars). This re-states the
task near the tail of the message list — where it has full attention
weight — rather than leaving it buried at position 1 under N rounds of
tool calls.

New module-private helpers:
  _should_inject_fact(iteration) -> bool
  _build_fact_anchor(task, thinking_text) -> str

New constants:
  _RESEARCH_FACT_INTERVAL = 10
  _RESEARCH_FACT_SUMMARY_MAX = 500

Unit tests (16): helper boundary conditions, periodic repeat at 2N,
doom-loop coexistence, and a structural test verifying the anchor is the
most recent user message before the LLM call with N tool-result messages
between it and the original task statement.

Eval (tests/integration/test_research_anchor_eval.py): live A/B test
against meta-llama/Llama-3.1-8B-Instruct via HF router. Injects 15
off-topic time-series tool results to induce drift, then compares
summaries with and without the anchor. Verified: score_a=-3 (drift
occurs), score_b=+4 (anchor corrects it, delta=+7). Skipped in CI;
enable with ML_INTERN_LIVE_LLM_TESTS=1 HF_TOKEN=...
