feat(agent-comparison): add autoresearch optimization loop #204

Merged: notque merged 1 commit into main from feat/agent-comparison-autoresearch-loop on Mar 29, 2026
@notque (Owner) commented Mar 29, 2026

Summary

  • add an autoresearch optimization loop for description/routing evaluation in agent-comparison
  • add optimization docs, sample task data, and viewer support for iteration results
  • harden the loop against missing frontmatter, missing protected blocks, oversized CLI payloads, and unrelated results.json files
  • shorten the comprehensive-review description to stay under the 1024-character limit
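
The frontmatter and length guards mentioned above could look roughly like the following. This is a minimal sketch, not the code from this PR: the helper names `extract_description` and `description_within_limit` are hypothetical, and it assumes the description is a single-line `description:` field inside a `---` delimited frontmatter block. Only the 1024-character limit comes from the summary.

```python
import re
from typing import Optional

MAX_DESCRIPTION_CHARS = 1024  # limit quoted in the PR summary

# Assumption: frontmatter is a leading block delimited by '---' lines.
_FRONTMATTER = re.compile(r"\A---\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL)


def extract_description(skill_md: str) -> Optional[str]:
    """Return the frontmatter description, or None if it cannot be found.

    Hypothetical helper: handles only a single-line `description:` value.
    """
    match = _FRONTMATTER.match(skill_md)
    if match is None:
        return None  # missing frontmatter
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip().strip("'\"")
    return None  # frontmatter present, but no description field


def description_within_limit(skill_md: str, limit: int = MAX_DESCRIPTION_CHARS) -> bool:
    """True only when a description exists and fits within the limit."""
    description = extract_description(skill_md)
    return description is not None and len(description) <= limit
```

Returning `None` for both failure modes keeps the caller's check simple: a file without usable frontmatter is skipped rather than crashing the loop.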

Verification

  • python3 -m py_compile skills/agent-comparison/scripts/generate_variant.py skills/agent-comparison/scripts/optimize_loop.py skills/skill-creator/scripts/eval_compare.py scripts/tests/test_agent_comparison_optimize_loop.py scripts/tests/test_eval_compare_optimization.py
  • pytest -q scripts/tests/test_agent_comparison_optimize_loop.py scripts/tests/test_eval_compare_optimization.py
  • python3 skills/agent-comparison/scripts/optimize_loop.py --target skills/go-testing/SKILL.md --goal 'improve routing precision without losing recall' --benchmark-tasks skills/agent-comparison/references/optimization-tasks.example.json --max-iterations 2 --min-gain 0.02 --train-split 0.6 --model claude-sonnet-4-20250514 --dry-run --output-dir /tmp/agent-comparison-opt-test --report /tmp/agent-comparison-opt-test/report.html --verbose
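
For readers unfamiliar with the flags in the last command, here is one plausible reading of how `--train-split`, `--max-iterations`, and `--min-gain` might interact. It is a sketch under stated assumptions, not the logic of `optimize_loop.py`: the functions `split_tasks` and `optimize`, the `propose` and `score` callables, and the accept-only-if-gain-reaches-threshold rule are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_tasks(tasks: Sequence[str], train_split: float, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle deterministically, then split into (train, holdout)."""
    shuffled = list(tasks)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_split)
    return shuffled[:cut], shuffled[cut:]


def optimize(
    baseline: str,
    propose: Callable[[str], str],               # hypothetical variant generator
    score: Callable[[str, Sequence[str]], float],  # hypothetical evaluator
    train_tasks: Sequence[str],
    max_iterations: int,
    min_gain: float,
) -> Tuple[str, float, List[Tuple[str, float]]]:
    """Keep a candidate only when it beats the current best by at least min_gain."""
    best, best_score = baseline, score(baseline, train_tasks)
    history = [(best, best_score)]
    for _ in range(max_iterations):
        candidate = propose(best)
        candidate_score = score(candidate, train_tasks)
        history.append((candidate, candidate_score))
        # Assumed acceptance rule: small or negative gains are discarded.
        if candidate_score - best_score >= min_gain:
            best, best_score = candidate, candidate_score
    return best, best_score, history
```

Under this reading, the holdout portion of the split would be scored separately to check that an accepted variant did not simply overfit the training tasks.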

Notes

  • the branch was reviewed in multiple fix loops before publishing
  • worktree is clean after push

@notque merged commit 3bdf3cd into main on Mar 29, 2026
3 of 4 checks passed
@notque deleted the feat/agent-comparison-autoresearch-loop branch on Mar 29, 2026 at 02:03
@notque added a commit that referenced this pull request on Mar 29, 2026:
PR #204 was merged to main while this branch was being developed.
All conflicts resolved in favor of the clean rework versions (ours):
- SKILL.md: review/export approach over cherry-pick
- optimization-guide.md: snapshot review terminology
- eval_viewer.html: radio selection, setActivePage helper, optimization-only mode
- eval_compare.py: standalone is_optimization_data() validator
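
The idea behind a standalone validator that rejects unrelated results.json files can be illustrated as follows. The real `is_optimization_data()` checks are not shown on this page, so everything here is an assumption: the function names, the `iterations` key, and the per-iteration numeric `score` field are hypothetical placeholders for whatever schema the project actually uses.

```python
import json
from pathlib import Path
from typing import Any, Optional


def looks_like_optimization_data(data: Any) -> bool:
    """Shape check only. Hypothetical schema: {"iterations": [{"score": <number>}, ...]}."""
    if not isinstance(data, dict):
        return False
    iterations = data.get("iterations")
    if not isinstance(iterations, list) or not iterations:
        return False
    return all(
        isinstance(item, dict)
        and isinstance(item.get("score"), (int, float))
        and not isinstance(item.get("score"), bool)  # bool is an int subclass
        for item in iterations
    )


def load_optimization_results(path: str) -> Optional[dict]:
    """Return parsed data only when the file exists, parses, and has the expected shape."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # unreadable file, bad encoding, or invalid JSON
        return None
    return data if looks_like_optimization_data(data) else None
```

Keeping the shape check separate from file loading means a viewer or comparison script can reuse it on data that is already in memory.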