docs(agents): sync README with codebase, add echo_agent and final_answer log test by thisisvk45 · Pull Request #53 · Mercor-Intelligence/archipelago

thisisvk45 · 2026-04-27T19:47:30Z

Summary

The agents/README.md had drifted from the actual codebase. The "Available agent IDs" section listed loop_agent, toolbelt_agent, and singleshot_agent, but runner/agents/registry.py only registers loop_agent and react_toolbelt_agent. The "Creating a New Agent" snippet used input for the run parameter (actual code uses run_input) and a truncated AGENT_REGISTRY snippet that implied overwriting existing entries rather than adding to them. The README also referenced tests/test_final_answer_log.py as enforcing the agent contract, but that file did not exist in the repo.

This PR resyncs the README with code, adds a minimal echo_agent as a canonical reference implementation, creates the missing test file (tight scope, see below), and codifies the agent contract that the README claimed was enforced.

Why this matters

agents/README.md is the primary entry point for anyone adding a new agent to the registry. Doc drift here makes the contributor floor higher than it needs to be, and the existing snippet produced non-functional code on first try. The missing test file meant the contract was claimed but not actually checked.

Changes

1. `agents/README.md` resynced

Symbol names match registry.py and models.py. References to phantom agents removed. The "Creating a New Agent" snippet now uses run_input and shows registration via dict update rather than a truncated literal. Stale model string anthropic/claude-3-5-sonnet-20241022 updated to anthropic/claude-opus-4-5 to match the model used in the paper's Table 9. Added explicit "Agent contract" section listing the three guarantees every registered agent must satisfy.

2. `runner/agents/echo_agent/` added

About 60 lines. Does not call any LLM and does not connect to MCP. Reads the last user message, echoes it back as an assistant message, emits the final_answer log via logger.bind(message_type="final_answer").info(answer), returns a valid AgentTrajectoryOutput with status=COMPLETED. Intended as the simplest possible reference implementation for the contract. Also serves as the only agent that can be exercised end-to-end in tests without mocking LiteLLM.

3. `tests/test_final_answer_log.py` added (tight scope)

Three tests:

Every AgentConfigIds enum value has a corresponding AGENT_REGISTRY entry
Every registered agent_impl is an async callable
echo_agent end-to-end run emits exactly one final_answer log with the correct message_type binding

The third test is the only one that exercises an agent end-to-end. A follow-up PR can add mocked LiteLLM coverage so loop_agent and react_toolbelt_agent get the same end-to-end check. Tight scope here keeps this PR focused on closing the doc-drift gap rather than introducing a test framework.

4. `agents/CONTRIBUTING-AGENTS.md` added

One-page checklist for adding a new agent: enum entry, run signature, registry entry, final_answer log requirement, verification step. Mirrors the contract documented in the README so contributors have a single page to consult.

Testing

All three new tests in tests/test_final_answer_log.py pass
All previously passing tests still pass (no regressions)
echo_agent runs end-to-end in examples/simple_task style invocation

Out of scope

Typed final_answer field on AgentTrajectoryOutput (currently a log-channel convention, worth promoting to a return field but a larger refactor)
Mocked end-to-end test coverage for loop_agent and react_toolbelt_agent (follow-up PR; this PR keeps test scope tight)
RunManifest and replay subcommand for reproducibility (separate PR, will reference open issues Reproduction discrepancy: Kimi K2.5 Thinking scores lower than reported on Law domain #4 and Inconsistency in GLM4.7 Official Reported Mean Score #8)
The grading-side JSON fence-stripping fix is already in fix(grading): strip markdown code fences from judge JSON responses #52

Files changed

agents/README.md (rewrite)
agents/CONTRIBUTING-AGENTS.md (new)
agents/runner/agents/echo_agent/__init__.py (new, empty)
agents/runner/agents/echo_agent/main.py (new, ~60 lines)
agents/runner/agents/models.py (1 line: enum entry)
agents/runner/agents/registry.py (3 lines: import plus AgentDefn entry)
agents/tests/__init__.py (new, empty)
agents/tests/test_final_answer_log.py (new, 3 tests)

…wer log test The agents/README.md had drifted from the codebase. The 'Available agent IDs' section listed loop_agent, toolbelt_agent, and singleshot_agent, but runner/agents/registry.py only registers loop_agent and react_toolbelt_agent. The 'Creating a New Agent' snippet used the wrong parameter name (input vs run_input) and a truncated registry snippet that implied overwriting existing entries rather than adding to them. The README also referenced tests/test_final_answer_log.py as enforcing the agent contract, but that file did not exist. This PR: 1. Resyncs agents/README.md with current symbol names. Removes references to phantom agents. Fixes the 'Creating a New Agent' snippet to use run_input and the dict-update registration style. Updates the stale anthropic/claude-3-5-sonnet-20241022 model string to anthropic/claude-opus-4-5 to match paper Table 9. 2. Adds runner/agents/echo_agent as a 60-line reference implementation. It does not call any LLM or connect to MCP and is intended as the canonical hello-world for new contributors. 3. Adds tests/test_final_answer_log.py with three tight tests: every AgentConfigIds value has an AGENT_REGISTRY entry, every registered agent_impl is an async callable, and echo_agent emits exactly one final_answer log when run end-to-end. A follow-up PR can mock LiteLLM to extend end-to-end coverage to loop_agent and react_toolbelt_agent. 4. Adds CONTRIBUTING-AGENTS.md codifying the agent contract as a one-page checklist. Out of scope: typed final_answer field on AgentTrajectoryOutput, RunManifest replay harness for reproducibility issues Mercor-Intelligence#4 and Mercor-Intelligence#8. The grading-side fence-stripping fix is in Mercor-Intelligence#52.

thisisvk45 force-pushed the docs/agents-readme-sync-and-echo-agent branch from 7dfc387 to 6845c96 Compare April 27, 2026 19:56

thisisvk45 mentioned this pull request Apr 27, 2026

feat(agents): add RunManifest, --seed, --deterministic, and replay stub #54

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(agents): sync README with codebase, add echo_agent and final_answer log test#53

docs(agents): sync README with codebase, add echo_agent and final_answer log test#53
thisisvk45 wants to merge 1 commit intoMercor-Intelligence:mainfrom
thisisvk45:docs/agents-readme-sync-and-echo-agent

thisisvk45 commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thisisvk45 commented Apr 27, 2026

Summary

Why this matters

Changes

1. agents/README.md resynced

2. runner/agents/echo_agent/ added

3. tests/test_final_answer_log.py added (tight scope)

4. agents/CONTRIBUTING-AGENTS.md added

Testing

Out of scope

Files changed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. `agents/README.md` resynced

2. `runner/agents/echo_agent/` added

3. `tests/test_final_answer_log.py` added (tight scope)

4. `agents/CONTRIBUTING-AGENTS.md` added