If you discover a security vulnerability, please report it through GitHub Security Advisories.
Do not open a public issue for security vulnerabilities.
The following are in scope:
- The benchmark harness CLI and orchestrator (`src/`)
- Agent session execution and sandboxing (`src/runner/`)
- Condition setup (plugin installation, MCP server config)
- Score computation and data collection
- The analysis package (`analysis/`)
Response timeline:
- Acknowledgment: within 48 hours
- Resolution target: within 14 days for confirmed vulnerabilities
- Disclosure: coordinated disclosure after a fix is available
This harness executes AI agent sessions that run code. Key safeguards:
- API Keys: never committed. Use the `ANTHROPIC_API_KEY` env var or `claude auth login`.
- Plugin Isolation: `--setting-sources ''` prevents user plugins from leaking across conditions.
- Agent Sandboxing: sessions run in temporary git worktrees, cleaned up after each iteration.
- Cost Controls: the `--budget` flag enforces dollar limits.
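The worktree sandboxing described above, shown as an added shell example that is not the harness's own code: each iteration runs on a detached-HEAD worktree in a temporary directory, so anything the agent writes never lands in the main checkout, and the worktree is removed even when the iteration fails. The sample repository, the function names and the `agent_output.txt` file are constructed for this illustration.

```shell
#!/bin/sh
# Added example (not from the harness): isolate one agent iteration in a
# throwaway git worktree and guarantee cleanup afterwards.
set -eu

# Build a disposable repository to stand in for the benchmark checkout.
# Constructed sample content; the real harness uses its own repo.
make_sample_repo() {
  repo=$(mktemp -d)
  git -C "$repo" init -q
  git -C "$repo" -c user.name=example -c user.email=example@example.invalid \
    commit -q --allow-empty -m "base commit"
  echo "$repo"
}

# run_in_worktree REPO CMD...: create a temporary worktree on a detached HEAD,
# run CMD inside it, then remove the worktree. Returns CMD's exit status, so a
# failing iteration is still cleaned up and the failure is still reported.
run_in_worktree() {
  repo=$1
  shift
  wt=$(mktemp -d)
  rmdir "$wt"                       # let git create the directory itself
  git -C "$repo" worktree add -q --detach "$wt" HEAD
  status=0
  (cd "$wt" && "$@") || status=$?
  git -C "$repo" worktree remove --force "$wt"   # drop files and registration
  git -C "$repo" worktree prune
  return "$status"
}

# Number of registered worktrees (1 means only the main checkout remains).
worktree_count() {
  git -C "$1" worktree list | wc -l | tr -d ' '
}
```

Usage: `repo=$(make_sample_repo); run_in_worktree "$repo" sh -c 'echo scratch > agent_output.txt'` runs the command in a fresh worktree; afterwards `agent_output.txt` does not exist under `$repo`, and `worktree_count "$repo"` is back to 1.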
| Version | Supported |
|---|---|
| Latest | Yes |
| Older | No |