AutoResearch: wire up PromptOptimizer API with real scorers and benchmark #3208

@mrveiss

Description

Summary

The prompt optimizer API endpoint and singleton have placeholder/empty wiring that makes the API non-functional:

  1. Empty scorers dict — `_get_optimizer` initializes with `scorers={}`, so any scorer chain reference (e.g., `"val_bpb"`) hits "scorer not found, skipping" and produces unscored variants
  2. Placeholder benchmark — the `start_optimization` endpoint uses `async def _benchmark(prompt): return prompt`, which simply echoes each prompt back unchanged and yields meaningless optimization results
  3. No agent registration path — the `PromptOptTarget` is hardcoded for `autoresearch_hypothesis` only, with no mechanism for other agents to register
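One way to address item 3 is a small target registry instead of a hardcoded `PromptOptTarget`. The sketch below is illustrative only — the registry name, the `register_target`/`get_target` helpers, and the `PromptOptTarget` fields are assumptions about a possible shape, not existing code:

```python
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class PromptOptTarget:
    # Assumed fields: a unique name plus the agent's async benchmark function
    name: str
    benchmark: Callable[[str], Awaitable[str]]


# Module-level registry so agents other than autoresearch_hypothesis can opt in
_TARGETS: dict[str, PromptOptTarget] = {}


def register_target(target: PromptOptTarget) -> None:
    _TARGETS[target.name] = target


def get_target(name: str):
    # Returns None for unknown targets, letting the API endpoint validate
    # the target before starting an optimization run
    return _TARGETS.get(name)
```

With this shape, the `start_optimization` endpoint could reject requests whose target is not registered rather than silently echoing prompts.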

Expected Behavior

  • Default scorers (ValBpbScorer, LLMJudgeScorer, HumanReviewScorer) registered at optimizer init
  • AutoResearchAgent's hypothesis generation wired as the real benchmark function
  • API start endpoint validates target exists and uses real benchmark

Files

  • `autobot-backend/services/autoresearch/routes.py` (lines 240, 306-307)

Origin

Discovered during code review of PR #3206 (Issue #3200)
