Add bencher.ci module for CI regression gating #941
Draft
blooop wants to merge 1 commit into
Generic CI integration utilities for benchmark regression workflows:
- write_performance_summary(): serialize a RegressionReport to a pipe-delimited text file parseable by CI workflows (GitHub Actions, Jenkins, etc.), with optional per-metric filtering and thresholds
- render_regression_plots(): render diagnostic PNGs for regressed metrics, suitable for embedding in PR comments
- warn_on_regressions(): convenience wrapper that writes the summary, renders plots, and emits pytest warnings in one call
- parse_performance_summary(): parse the summary file back into structured dicts
- generate_regression_comment(): produce a GitHub-flavored Markdown PR comment with status icons, regression table, plot links, and actionable next steps; this replaces fragile shell-script comment generation in CI workflows

All functions are exported from the top-level bencher package. No new dependencies required.
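To make the comment-generation idea concrete, here is a minimal sketch of turning parsed summary rows into a GitHub-flavored Markdown comment. It is not the code in this PR: the function name regression_comment_sketch, the row keys (name, status, change_pct, threshold_pct), the icons, and the next-steps wording are all assumptions chosen for illustration.

```python
def regression_comment_sketch(rows, plot_urls=None):
    """Build a Markdown PR comment from parsed summary rows.

    Hypothetical helper: the row keys and the output layout are assumptions,
    not the format produced by bencher.ci.generate_regression_comment().
    """
    plot_urls = plot_urls or {}
    regressed = [r for r in rows if r["status"] == "REGRESSED"]

    # Status icon in the heading: cross when anything regressed, tick otherwise.
    icon = "❌" if regressed else "✅"
    title = f"## {icon} Benchmark regression check"
    if not regressed:
        return title + "\n\nNo regressions detected."

    # Regression table: one row per regressed metric only.
    out = [title, "", "| Metric | Change (%) | Threshold (%) |", "|---|---|---|"]
    for r in regressed:
        out.append(
            f"| `{r['name']}` | {r['change_pct']:+.1f} | {r['threshold_pct']:.1f} |"
        )

    # Optional plot links, embedded as Markdown images.
    for r in regressed:
        url = plot_urls.get(r["name"])
        if url:
            out += ["", f"![{r['name']}]({url})"]

    # Placeholder guidance text (assumed wording).
    out += ["", "**Next steps:** reproduce locally, then fix the regression or update the baseline."]
    return "\n".join(out)
```

Keeping this logic in a plain function means it can be unit-tested with a handful of dicts, which is the main advantage over assembling the comment in shell.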
Reviewer's Guide
Introduces a new bencher.ci module that centralizes CI-facing regression utilities (summary serialization/parsing, plot rendering, warnings, and Markdown comment generation) and adds comprehensive tests covering the new functionality.
Sequence diagram for warn_on_regressions CI gating flow
sequenceDiagram
actor PytestTest
participant warn_on_regressions
participant write_performance_summary
participant render_regression_plots
participant RegressionReport as report
participant warnings
PytestTest->>warn_on_regressions: warn_on_regressions(report, summary_path, plot_dir, metrics_filter, bench_name)
alt summary_path is not None
warn_on_regressions->>write_performance_summary: write_performance_summary(report, summary_path, metrics_filter, bench_name)
write_performance_summary-->>warn_on_regressions: lines
end
alt plot_dir is not None
warn_on_regressions->>render_regression_plots: render_regression_plots(report, plot_dir, metrics_filter, bench_name)
render_regression_plots-->>warn_on_regressions: rendered
end
warn_on_regressions->>RegressionReport: has_regressions
alt report.has_regressions
warn_on_regressions->>RegressionReport: summary()
RegressionReport-->>warn_on_regressions: summary
warn_on_regressions->>warnings: warn("Benchmark regression detected:\n" + summary)
end
warn_on_regressions-->>PytestTest: return
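The flow in the diagram can be sketched in plain Python. This is a reading of the diagram only, not the PR's implementation: FakeReport and the two *_stub helpers are hypothetical stand-ins so the sketch runs without bencher installed, and the default values of the optional arguments are assumed.

```python
import warnings
from dataclasses import dataclass, field

calls = []  # records which helpers ran, purely for illustration


@dataclass
class FakeReport:
    """Hypothetical stand-in for bencher's RegressionReport."""

    regressed: list = field(default_factory=list)

    @property
    def has_regressions(self) -> bool:
        return bool(self.regressed)

    def summary(self) -> str:
        return "\n".join(f"{name}: regressed" for name in self.regressed)


def write_performance_summary_stub(report, summary_path, metrics_filter=None, bench_name=None):
    # Stand-in: the real function writes the pipe-delimited summary file.
    calls.append(("summary", str(summary_path)))
    return []


def render_regression_plots_stub(report, plot_dir, metrics_filter=None, bench_name=None):
    # Stand-in: the real function renders diagnostic PNGs.
    calls.append(("plots", str(plot_dir)))
    return []


def warn_on_regressions_sketch(report, summary_path=None, plot_dir=None,
                               metrics_filter=None, bench_name=None):
    # Each output is optional and only produced when its path is given.
    if summary_path is not None:
        write_performance_summary_stub(report, summary_path, metrics_filter, bench_name)
    if plot_dir is not None:
        render_regression_plots_stub(report, plot_dir, metrics_filter, bench_name)
    # A warning, not an exception: the test still passes, but pytest reports it.
    if report.has_regressions:
        warnings.warn("Benchmark regression detected:\n" + report.summary())
```

Emitting a warning rather than raising keeps the benchmark test green while still surfacing the regression in the pytest warnings summary, which matches the "gating" being done later by CI from the summary file.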
Performance Report for
| Metric | Value |
|---|---|
| Total tests | 1423 |
| Total time | 111.34s |
| Mean | 0.0782s |
| Median | 0.0010s |
Top 10 slowest tests
| Test | Time (s) |
|---|---|
| test.test_bench_examples.TestBenchExamples::test_example_meta | 18.045 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 5.238 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 4.332 |
| test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] | 3.151 |
| test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] | 2.710 |
| test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py] | 2.621 |
| test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py] | 2.611 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 2.448 |
| test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py] | 2.257 |
| test.test_time_event_curve.TestTimeEventCurvePlot::test_curve_with_string_time_src_and_cat | 1.090 |
Updated by Performance Tracking workflow
Summary
Extracts generic CI integration utilities into a new bencher.ci module. These functions bridge the gap between bencher's regression detection engine and CI workflow plumbing (GitHub Actions, Jenkins, etc.):
- write_performance_summary(): serialize a RegressionReport to a pipe-delimited text file that CI can parse, with optional per-metric filtering and custom thresholds
- render_regression_plots(): render diagnostic PNGs for regressed metrics (for embedding in PR comments)
- warn_on_regressions(): one-call convenience wrapper that writes the summary, renders plots, and emits pytest warnings
- parse_performance_summary(): parse the summary file back into structured dicts
- generate_regression_comment(): produce a GitHub-flavored Markdown PR comment with status icons, table, plot links, and next-steps guidance
Motivation
Projects using bencher for benchmark regression detection end up reimplementing the same CI glue: writing structured summaries, generating PR comments with regression tables, rendering plot PNGs. This was ~150 lines of fragile bash in CI workflows plus ~100 lines of Python in downstream projects. Moving it upstream means:
- method_cells() stays the single source of truth for how each method is displayed
Design choices
- metrics_filter is optional: without it, every result in the report is written. With it, only matching metrics appear and per-metric thresholds override the report-level ones
- bench_name prefix: allows qualified names like "bench_planning/planning_time" so identically-named metrics from different benchmarks are distinguishable
- append=True default: multiple benchmark tests write to the same summary file during a CI run
- bash+awk+sed in CI YAML, testable and maintainable
Example usage
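The design choices above (optional metrics_filter, bench_name prefix, append=True) can be illustrated with a self-contained round trip that runs without bencher. This is a sketch under stated assumptions, not the module's code: the column layout name|status|change_pct|threshold_pct, the dict shape of metrics_filter (metric name mapped to a threshold or None), and the *_sketch function names are all hypothetical.

```python
from pathlib import Path


def write_summary_sketch(results, path, metrics_filter=None, bench_name=None, append=True):
    """Write one pipe-delimited line per result and return the lines written.

    results: list of dicts with 'name', 'change_pct', 'threshold_pct' (assumed shape).
    metrics_filter: optional dict of metric name -> threshold override or None (assumed shape).
    """
    lines = []
    for r in results:
        threshold = r["threshold_pct"]
        if metrics_filter is not None:
            if r["name"] not in metrics_filter:
                continue  # only matching metrics appear
            override = metrics_filter[r["name"]]
            if override is not None:
                threshold = override  # per-metric threshold wins over the report-level one
        # Qualify the name so identical metrics from different benchmarks stay distinct.
        name = f"{bench_name}/{r['name']}" if bench_name else r["name"]
        status = "REGRESSED" if r["change_pct"] > threshold else "OK"
        lines.append(f"{name}|{status}|{r['change_pct']}|{threshold}")
    # Append by default so several benchmark tests can share one file in a CI run.
    with open(path, "a" if append else "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
    return lines


def parse_summary_sketch(path):
    """Parse the file written above back into a list of dicts."""
    rows = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        name, status, change, threshold = line.split("|")
        rows.append({
            "name": name,
            "status": status,
            "change_pct": float(change),
            "threshold_pct": float(threshold),
        })
    return rows
```

With a report-level threshold of 20 and a filter of {"planning_time": 10.0}, a 12% change in planning_time is written as REGRESSED, while the same result written without a filter stays OK. A second call on the same path adds lines rather than replacing them.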
Test plan
test/test_ci.py)
pixi run ci)
🤖 Generated with Claude Code
Summary by Sourcery
Introduce a bencher.ci module providing reusable CI utilities for benchmark regression gating, along with comprehensive tests.
New Features:
Tests: