Skip to content

docs(agents): sync README with codebase, add echo_agent and final_answer log test#53

Open
thisisvk45 wants to merge 1 commit intoMercor-Intelligence:mainfrom
thisisvk45:docs/agents-readme-sync-and-echo-agent
Open

docs(agents): sync README with codebase, add echo_agent and final_answer log test#53
thisisvk45 wants to merge 1 commit intoMercor-Intelligence:mainfrom
thisisvk45:docs/agents-readme-sync-and-echo-agent

Conversation

@thisisvk45
Copy link
Copy Markdown

Summary

The agents/README.md had drifted from the actual codebase. The "Available agent IDs" section listed loop_agent, toolbelt_agent, and singleshot_agent, but runner/agents/registry.py only registers loop_agent and react_toolbelt_agent. The "Creating a New Agent" snippet used input for the run parameter (actual code uses run_input) and a truncated AGENT_REGISTRY snippet that implied overwriting existing entries rather than adding to them. The README also referenced tests/test_final_answer_log.py as enforcing the agent contract, but that file did not exist in the repo.

This PR resyncs the README with code, adds a minimal echo_agent as a canonical reference implementation, creates the missing test file (tight scope, see below), and codifies the agent contract that the README claimed was enforced.

Why this matters

agents/README.md is the primary entry point for anyone adding a new agent to the registry. Doc drift here makes the contributor floor higher than it needs to be, and the existing snippet produced non-functional code on first try. The missing test file meant the contract was claimed but not actually checked.

Changes

1. agents/README.md resynced

Symbol names match registry.py and models.py. References to phantom agents removed. The "Creating a New Agent" snippet now uses run_input and shows registration via dict update rather than a truncated literal. Stale model string anthropic/claude-3-5-sonnet-20241022 updated to anthropic/claude-opus-4-5 to match the model used in the paper's Table 9. Added explicit "Agent contract" section listing the three guarantees every registered agent must satisfy.

2. runner/agents/echo_agent/ added

About 60 lines. Does not call any LLM and does not connect to MCP. Reads the last user message, echoes it back as an assistant message, emits the final_answer log via logger.bind(message_type="final_answer").info(answer), returns a valid AgentTrajectoryOutput with status=COMPLETED. Intended as the simplest possible reference implementation for the contract. Also serves as the only agent that can be exercised end-to-end in tests without mocking LiteLLM.

3. tests/test_final_answer_log.py added (tight scope)

Three tests:

  • Every AgentConfigIds enum value has a corresponding AGENT_REGISTRY entry
  • Every registered agent_impl is an async callable
  • echo_agent end-to-end run emits exactly one final_answer log with the correct message_type binding

The third test is the only one that exercises an agent end-to-end. A follow-up PR can add mocked LiteLLM coverage so loop_agent and react_toolbelt_agent get the same end-to-end check. Tight scope here keeps this PR focused on closing the doc-drift gap rather than introducing a test framework.

4. agents/CONTRIBUTING-AGENTS.md added

One-page checklist for adding a new agent: enum entry, run signature, registry entry, final_answer log requirement, verification step. Mirrors the contract documented in the README so contributors have a single page to consult.

Testing

  • All three new tests in tests/test_final_answer_log.py pass
  • All previously passing tests still pass (no regressions)
  • echo_agent runs end-to-end in examples/simple_task style invocation

Out of scope

Files changed

  • agents/README.md (rewrite)
  • agents/CONTRIBUTING-AGENTS.md (new)
  • agents/runner/agents/echo_agent/__init__.py (new, empty)
  • agents/runner/agents/echo_agent/main.py (new, ~60 lines)
  • agents/runner/agents/models.py (1 line: enum entry)
  • agents/runner/agents/registry.py (3 lines: import plus AgentDefn entry)
  • agents/tests/__init__.py (new, empty)
  • agents/tests/test_final_answer_log.py (new, 3 tests)

…wer log test

The agents/README.md had drifted from the codebase. The 'Available
agent IDs' section listed loop_agent, toolbelt_agent, and
singleshot_agent, but runner/agents/registry.py only registers
loop_agent and react_toolbelt_agent. The 'Creating a New Agent'
snippet used the wrong parameter name (input vs run_input) and a
truncated registry snippet that implied overwriting existing entries
rather than adding to them. The README also referenced
tests/test_final_answer_log.py as enforcing the agent contract, but
that file did not exist.

This PR:

1. Resyncs agents/README.md with current symbol names. Removes
   references to phantom agents. Fixes the 'Creating a New Agent'
   snippet to use run_input and the dict-update registration style.
   Updates the stale anthropic/claude-3-5-sonnet-20241022 model
   string to anthropic/claude-opus-4-5 to match paper Table 9.

2. Adds runner/agents/echo_agent as a 60-line reference
   implementation. It does not call any LLM or connect to MCP and
   is intended as the canonical hello-world for new contributors.

3. Adds tests/test_final_answer_log.py with three tight tests:
   every AgentConfigIds value has an AGENT_REGISTRY entry, every
   registered agent_impl is an async callable, and echo_agent
   emits exactly one final_answer log when run end-to-end. A
   follow-up PR can mock LiteLLM to extend end-to-end coverage to
   loop_agent and react_toolbelt_agent.

4. Adds CONTRIBUTING-AGENTS.md codifying the agent contract as a
   one-page checklist.

Out of scope: typed final_answer field on AgentTrajectoryOutput,
RunManifest replay harness for reproducibility issues Mercor-Intelligence#4 and Mercor-Intelligence#8.
The grading-side fence-stripping fix is in Mercor-Intelligence#52.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant