Summary
The eval module currently lives at src/sre_agent/eval/, which means it gets bundled into the distributed wheel alongside production code (cli, config, core). Evaluation/benchmark code is a dev-time concern and should not ship with the package.
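As a rough way to confirm this, the sketch below lists the members of a built wheel that fall under the eval package. The helper name eval_files_in_wheel and the dist/ output directory are assumptions for illustration, not part of the project. The prefix relies on the usual src-layout behaviour, where src/sre_agent/eval/ ends up as sre_agent/eval/ inside the wheel.

```python
import zipfile
from pathlib import Path


def eval_files_in_wheel(wheel_path, prefix="sre_agent/eval/"):
    """Return the wheel's archive members that live under the eval package."""
    # A wheel is a zip archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(wheel_path) as wheel:
        return [name for name in wheel.namelist() if name.startswith(prefix)]


if __name__ == "__main__":
    # Assumption: a wheel has already been built into dist/.
    for wheel_path in sorted(Path("dist").glob("*.whl")):
        bundled = eval_files_in_wheel(wheel_path)
        print(f"{wheel_path.name}: {len(bundled)} eval file(s) bundled")
```

After the move, the same check should report zero bundled eval files, which makes it a candidate for a CI guard.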
Proposed Changes
- Move src/sre_agent/eval/ → evals/ at the project root (alongside tests/, docs/, etc.)
- Remove the eval entry points from [project.scripts] in pyproject.toml:
  - sre-agent-run-tool-call-eval
  - sre-agent-run-diagnosis-quality-eval
- Update invocation to run evals as standalone scripts (e.g. python -m evals.tool_call.run) or via a Makefile/task runner target.
- Update any CI/docs that reference the old paths or entry points.
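To illustrate the standalone-script invocation, here is a minimal sketch of what evals/tool_call/run.py could look like once the entry point is gone. The --limit flag, the printed message, and the function names are hypothetical placeholders. The actual eval logic currently under src/sre_agent/eval/ is not shown here.

```python
"""Hypothetical evals/tool_call/run.py, runnable as `python -m evals.tool_call.run`."""
import argparse
import sys


def build_parser():
    """Build the command-line parser for the eval runner."""
    parser = argparse.ArgumentParser(
        prog="python -m evals.tool_call.run",
        description="Run the tool-call eval as a standalone script.",
    )
    # Hypothetical option: cap the number of eval cases to run.
    parser.add_argument("--limit", type=int, default=None)
    return parser


def main(argv=None):
    """Parse arguments and run the eval; returns a process exit code."""
    args = build_parser().parse_args(argv)
    # Placeholder: call the eval logic moved out of src/sre_agent/eval/ here.
    print(f"running tool-call eval (limit={args.limit})")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Run from the project root, python -m evals.tool_call.run resolves evals/ from the current directory, so the module works without being installed. A Makefile or task runner target can wrap the same command.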
Motivation
- Smaller production package: eval code, fixtures, and eval-only dependencies don't belong in the shipped wheel.
- Separation of concerns: keeps src/sre_agent/ focused on the agent itself.
- Convention: matches the common pattern of top-level tests/, evals/, benchmarks/ directories.