AutoResearch M3: missing test coverage for edge cases and new endpoints #3211
Open
Description
Summary
Code review of M3 PRs identified several test coverage gaps:
Backend
- No test for multi-scorer chains (final_score averaging across >1 scorer)
- No route-level tests for the 10 new API endpoints (only unit/integration tests)
- No test for `LLMJudgeScorer._parse_rating` with completely unparseable input
- No test for `ValBpbScorer` when `ExperimentRunner.run_experiment` raises an exception
- No test for `KnowledgeSynthesizer.synthesize_session` with LLM failure or empty session
- No test for enriched `_build_document` when `val_bpb=None` but `baseline_val_bpb` is set
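As a starting point for the first and fourth backend gaps, here is a minimal sketch of what such tests could look like. It does not use the real AutoResearch classes: `StubScorer`, `final_score`, and `score_with_runner` are hypothetical stand-ins, and the assumptions that `final_score` is a plain arithmetic mean and that `run_experiment` is a synchronous call that lets exceptions propagate would need to be checked against the actual scorer code.

```python
import math
from unittest.mock import MagicMock


class StubScorer:
    """Hypothetical stand-in for a real scorer; always returns a fixed score."""

    def __init__(self, score):
        self._score = score

    def score(self, hypothesis):
        return self._score


def final_score(scorers, hypothesis):
    """Assumed chain behavior: arithmetic mean of all scorer outputs."""
    results = [s.score(hypothesis) for s in scorers]
    return sum(results) / len(results)


def score_with_runner(runner, hypothesis):
    """Hypothetical scorer body: delegates to the runner, no error handling."""
    return runner.run_experiment(hypothesis)


def test_final_score_averages_across_multiple_scorers():
    chain = [StubScorer(0.2), StubScorer(0.6), StubScorer(1.0)]
    # Compare with a tolerance because the mean is a float.
    assert math.isclose(final_score(chain, "h"), 0.6)


def test_runner_exception_surfaces():
    # Inject the failure through a mock instead of running an experiment.
    runner = MagicMock()
    runner.run_experiment.side_effect = RuntimeError("experiment crashed")
    try:
        score_with_runner(runner, "h")
    except RuntimeError as exc:
        assert "crashed" in str(exc)
    else:
        raise AssertionError("expected RuntimeError")
    runner.run_experiment.assert_called_once_with("h")
```

If `ValBpbScorer` is meant to catch the exception and report a failed result rather than re-raise, the second test should assert on that result instead. The same `side_effect` injection pattern would apply to the LLM-failure case for `KnowledgeSynthesizer.synthesize_session`.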
Frontend
- Accessibility: buttons lack `aria-label`, status dots use color alone, search input has no ``
- No test for `ExperimentDashboard` component mount/rendering
Files
- `autobot-backend/services/autoresearch/scorers_test.py`
- `autobot-backend/services/autoresearch/prompt_optimizer_test.py`
- `autobot-backend/services/autoresearch/knowledge_synthesizer_test.py`
- `autobot-backend/services/autoresearch/routes_test.py`
- `autobot-frontend/src/components/autoresearch/`
Origin
Discovered during code review of PRs #3202, #3203, #3206, #3207