AutoResearch: wire up PromptOptimizer API with real scorers and benchmark #3208
Open
Description
Summary
The prompt optimizer API endpoint and singleton have placeholder/empty wiring that makes the API non-functional:
- Empty scorers dict — `_get_optimizer` initializes with `scorers={}`, so any scorer chain reference (e.g., `"val_bpb"`) hits "scorer not found, skipping" and produces unscored variants
- Placeholder benchmark — `start_optimization` endpoint uses `async def _benchmark(prompt): return prompt` which just echoes prompts back, producing meaningless optimization results
- No agent registration path — the `PromptOptTarget` is hardcoded for `autoresearch_hypothesis` only, with no mechanism for other agents to register
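For illustration, the current placeholder wiring described above can be sketched as follows. Only `_get_optimizer`, `scorers={}`, `"val_bpb"`, and the "scorer not found, skipping" message come from the issue; the class shape and method names are assumptions.

```python
# Hypothetical minimal sketch of the broken wiring; names beyond those
# quoted in the issue are assumptions, not the real routes.py code.
from typing import Callable, Dict


class PromptOptimizer:
    def __init__(self, scorers: Dict[str, Callable[[str], float]]):
        self.scorers = scorers

    def score(self, chain, variant):
        results = {}
        for name in chain:
            scorer = self.scorers.get(name)
            if scorer is None:
                # mirrors the "scorer not found, skipping" behavior
                print(f"scorer {name!r} not found, skipping")
                continue
            results[name] = scorer(variant)
        return results


# _get_optimizer initializes with an empty dict, so every lookup misses
optimizer = PromptOptimizer(scorers={})
print(optimizer.score(["val_bpb"], "some prompt"))  # {} -> unscored variant
```

Because the scorer chain resolves against an empty dict, every variant comes back unscored, so the optimization loop has nothing to rank on.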
Expected Behavior
- Default scorers (ValBpbScorer, LLMJudgeScorer, HumanReviewScorer) registered at optimizer init
- AutoResearchAgent's hypothesis generation wired as the real benchmark function
- API start endpoint validates target exists and uses real benchmark
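The expected wiring above could look roughly like the sketch below. The scorer class name `ValBpbScorer` and the target name `autoresearch_hypothesis` come from the issue; the registry shape, function names, and stand-in scoring logic are assumptions.

```python
# Hypothetical sketch of the expected wiring, assuming a simple
# name -> async benchmark registry; not the actual routes.py API.
import asyncio
from typing import Awaitable, Callable, Dict


class ValBpbScorer:
    """Stand-in for the real val_bpb scorer registered at optimizer init."""
    name = "val_bpb"

    def __call__(self, text: str) -> float:
        return float(len(text))  # placeholder scoring logic


# Registration path so agents beyond autoresearch_hypothesis can opt in
_TARGETS: Dict[str, Callable[[str], Awaitable[str]]] = {}


def register_target(name: str, benchmark: Callable[[str], Awaitable[str]]) -> None:
    _TARGETS[name] = benchmark


async def generate_hypothesis(prompt: str) -> str:
    # stand-in for AutoResearchAgent's real hypothesis generation
    return f"hypothesis for: {prompt}"


register_target("autoresearch_hypothesis", generate_hypothesis)


async def start_optimization(target: str, prompt: str) -> str:
    if target not in _TARGETS:  # endpoint validates the target exists
        raise ValueError(f"unknown target: {target}")
    return await _TARGETS[target](prompt)  # real benchmark, not an echo


result = asyncio.run(start_optimization("autoresearch_hypothesis", "p1"))
print(result)  # no longer just echoes "p1" back
```

The key design point is that the benchmark runs the agent's actual generation path, so scorer output reflects real behavior rather than the identity function.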
Files
- `autobot-backend/services/autoresearch/routes.py` (lines 240, 306-307)