Problem
No benchmarks exist for collection throughput, API rate-limit compliance, or large-dataset import performance. The system may process thousands of GitHub commits, Slack messages, or Discord events per run, yet the CI pipeline has no regression check for performance. Degradation would surface only as increased production latency or, worse, as silent timeouts that cause partial data collection. The 90% coverage gate verifies correctness but not performance characteristics.
Acceptance Criteria
- Add a `benchmarks/` directory with at least two benchmark scenarios: (a) the GitHub collector processing N mock commits; (b) a service-layer bulk insert of N records
- Use `pytest-benchmark` or a timing harness that emits machine-readable results (JSON)
- Establish baseline numbers in a checked-in reference file or CI artifact
- Add a CI step (can be manual/nightly, not necessarily on every PR) that runs benchmarks and flags regressions >25%
- Document how to run benchmarks locally in `CONTRIBUTING.md`
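The baseline and regression-flagging criteria above could be met with a small script run after the nightly benchmark job. This is an illustrative sketch, not an existing tool in the repository: the file name `check_regressions.py` and the policy of comparing mean runtimes are assumptions, and it assumes both input files were written by pytest-benchmark's `--benchmark-json` option (a top-level `benchmarks` list whose entries carry `fullname` and `stats`).

```python
import json
import sys
from typing import Dict, List, Tuple

# Assumed policy: flag a benchmark whose mean runtime grows by more than 25%
# relative to the checked-in baseline.
REGRESSION_THRESHOLD = 0.25


def load_means(report: dict) -> Dict[str, float]:
    """Map each benchmark's fullname to its mean runtime in seconds."""
    return {b["fullname"]: b["stats"]["mean"] for b in report["benchmarks"]}


def find_regressions(
    baseline: dict, current: dict, threshold: float = REGRESSION_THRESHOLD
) -> List[Tuple[str, float, float]]:
    """Return (name, baseline_mean, current_mean) for each benchmark that slowed down."""
    base = load_means(baseline)
    regressions = []
    for name, mean in sorted(load_means(current).items()):
        old = base.get(name)
        # Benchmarks absent from the baseline are new, not regressions.
        if old is not None and old > 0 and (mean - old) / old > threshold:
            regressions.append((name, old, mean))
    return regressions


def main(baseline_path: str, current_path: str) -> int:
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    regressions = find_regressions(baseline, current)
    for name, old, new in regressions:
        print(f"REGRESSION {name}: {old:.6f}s -> {new:.6f}s (+{(new - old) / old:.0%})")
    # A non-zero exit code makes the CI step fail.
    return 1 if regressions else 0


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_regressions.py BASELINE.json CURRENT.json
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

In CI this might run as `python check_regressions.py benchmarks/baseline.json results.json` (both paths hypothetical). pytest-benchmark also ships `--benchmark-compare` and `--benchmark-compare-fail` (for example `mean:25%`), which compare against its own saved runs in the `.benchmarks` storage directory and may be sufficient without a custom script.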
Implementation Notes
Start with the highest-volume collector (likely `github_activity_tracker`). Mock its API responses and measure records per second through the service layer, memory high-water mark, and database write throughput. Use `pytest-benchmark` with `--benchmark-json` output. The mock fixtures from the existing test suite can be extended for this purpose. Tag benchmarks with a dedicated pytest marker (`@pytest.mark.benchmark`) so they can be excluded from normal CI runs and don't slow them down.
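As a starting point for the three measurements listed above, here is a stdlib-only timing harness (the acceptance criteria allow one in place of pytest-benchmark) that reports them as JSON. It is a sketch under assumptions: `make_mock_commits`, `ingest_commits`, and the `commits` table are hypothetical stand-ins for the mocked `github_activity_tracker` responses and the real service layer, and an in-memory SQLite database replaces the production store.

```python
import json
import sqlite3
import time
import tracemalloc
from typing import Dict, List


def make_mock_commits(n: int) -> List[Dict[str, str]]:
    """Hypothetical stand-in for parsed, mocked GitHub commit payloads."""
    return [
        {"sha": f"{i:040x}", "author": f"user{i % 50}", "message": f"commit {i}"}
        for i in range(n)
    ]


def ingest_commits(conn: sqlite3.Connection, commits: List[Dict[str, str]]) -> int:
    """Hypothetical service-layer bulk insert; returns the number of rows written."""
    with conn:  # commits the whole batch as one transaction
        conn.executemany(
            "INSERT INTO commits (sha, author, message) VALUES (:sha, :author, :message)",
            commits,
        )
    return len(commits)


def run_benchmark(n: int) -> Dict[str, float]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE commits (sha TEXT PRIMARY KEY, author TEXT, message TEXT)")
    tracemalloc.start()
    t0 = time.perf_counter()
    commits = make_mock_commits(n)           # simulated collection step
    t1 = time.perf_counter()
    written = ingest_commits(conn, commits)  # database write step
    t2 = time.perf_counter()
    _, peak = tracemalloc.get_traced_memory()  # peak Python heap since start()
    tracemalloc.stop()
    conn.close()
    return {
        "records": written,
        "records_per_second": written / max(t2 - t0, 1e-9),
        "db_rows_per_second": written / max(t2 - t1, 1e-9),
        "peak_memory_bytes": peak,
    }


if __name__ == "__main__":
    # Machine-readable output that a CI job can archive as an artifact.
    print(json.dumps({"github_collector_n10000": run_benchmark(10_000)}, indent=2))
```

`tracemalloc` only sees allocations made through Python's allocator, so SQLite's internal buffers are not counted in `peak_memory_bytes`. To run the same workload under pytest-benchmark instead, each round would need a fresh table (the hypothetical `sha` primary key rejects duplicate inserts), which `benchmark.pedantic(..., setup=...)` can provide.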