Add bencher.ci module for CI regression gating#941

Draft
blooop wants to merge 1 commit into main from feat/ci-utils-and-report-index

Conversation

@blooop
Owner

@blooop blooop commented May 18, 2026

Summary

Extracts generic CI integration utilities into a new bencher.ci module. These functions bridge the gap between bencher's regression detection engine and CI workflow plumbing (GitHub Actions, Jenkins, etc.):

  • write_performance_summary() — serialize a RegressionReport to a pipe-delimited text file that CI can parse, with optional per-metric filtering and custom thresholds
  • render_regression_plots() — render diagnostic PNGs for regressed metrics (for embedding in PR comments)
  • warn_on_regressions() — one-call convenience wrapper: write summary + render plots + emit pytest warnings
  • parse_performance_summary() — parse the summary file back into structured dicts
  • generate_regression_comment() — produce a GitHub-flavored Markdown PR comment with status icons, table, plot links, and next-steps guidance
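
As a rough illustration of the write/parse round trip described above, here is a minimal sketch. The column layout in `FIELDS` and the helper names `write_summary` and `parse_summary` are assumptions made for this example; the actual format and signatures in bencher.ci are not shown in this description.

```python
from pathlib import Path

# Hypothetical column layout, chosen for illustration only.
FIELDS = ("metric", "method", "change_percent", "threshold", "regressed")


def write_summary(rows, path, append=True):
    """Write one pipe-delimited line per row, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = ["|".join(str(row[field]) for field in FIELDS) for row in rows]
    # Append by default so several benchmark tests can share one file.
    with path.open("a" if append else "w", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")
    return lines


def parse_summary(path):
    """Read the file back into dicts, skipping blank or malformed lines."""
    path = Path(path)
    if not path.exists():
        return []
    rows = []
    for line in path.read_text(encoding="utf-8").splitlines():
        parts = line.strip().split("|")
        if len(parts) != len(FIELDS):
            continue  # blank line or wrong number of columns
        rows.append(dict(zip(FIELDS, parts)))
    return rows
```

All parsed values come back as strings in this sketch; a real parser may well convert numeric columns.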

Motivation

Projects using bencher for benchmark regression detection end up reimplementing the same CI glue: writing structured summaries, generating PR comments with regression tables, and rendering plot PNGs. In practice that glue amounted to ~150 lines of fragile bash in CI workflows plus ~100 lines of Python in downstream projects. Moving it upstream means:

  1. One tested implementation instead of per-project copies
  2. New detection methods (adaptive, delta, absolute) automatically render correctly in comments
  3. method_cells() stays the single source of truth for how each method is displayed

Design choices

  • No new dependencies — uses only stdlib + existing bencher imports
  • metrics_filter is optional — without it, every result in the report is written. With it, only matching metrics appear and per-metric thresholds override the report-level ones
  • bench_name prefix — allows qualified names like "bench_planning/planning_time" so identically-named metrics from different benchmarks are distinguishable
  • append=True default — multiple benchmark tests write to the same summary file during a CI run
  • Comment generation is pure Python — replaces bash + awk + sed in CI YAML, testable and maintainable
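
The filtering and naming rules above can be sketched roughly as follows. This is not the bencher.ci implementation: `qualify` and `select_metrics` are hypothetical helpers, results are reduced to `(metric, threshold)` pairs, and the filter is assumed to be keyed by the qualified name, as the example usage in this description suggests.

```python
def qualify(metric, bench_name=None):
    """Prefix the metric with the benchmark name when one is given."""
    return f"{bench_name}/{metric}" if bench_name else metric


def select_metrics(results, metrics_filter=None, bench_name=None):
    """Return (qualified_name, threshold) pairs that should be written.

    results: iterable of (metric_name, report_level_threshold) pairs.
    metrics_filter: optional dict of qualified name -> threshold override.
    """
    selected = []
    for metric, report_threshold in results:
        name = qualify(metric, bench_name)
        if metrics_filter is None:
            # No filter: keep every result with its report-level threshold.
            selected.append((name, report_threshold))
        elif name in metrics_filter:
            # Filter present: keep only matches and override the threshold.
            selected.append((name, metrics_filter[name]))
    return selected
```

With a filter of `{"bench_robot_planning/planning_time": 20.0}` and `bench_name="bench_robot_planning"`, only `planning_time` survives and its threshold becomes 20.0.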

Example usage

from pathlib import Path

from bencher.ci import warn_on_regressions, generate_regression_comment

# In a pytest benchmark test:
warn_on_regressions(
    bench.results[-1].regression_report,
    summary_path=Path("reports/performance_summary.txt"),
    plot_dir=Path("reports/regression_plots"),
    bench_name="bench_robot_planning",
    metrics_filter={"bench_robot_planning/planning_time": 20.0},
)

# In a CI step (or Python script called from CI):
comment_md = generate_regression_comment(
    "reports/performance_summary.txt",
    report_url="https://reports.example.com/latest/index.html",
    plot_url_prefix="https://reports.example.com/latest/regression_plots",
)
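
To give a feel for what pure-Python comment generation can look like, here is a minimal sketch that turns parsed rows into GitHub-flavored Markdown. The function name `render_comment`, the row keys, the wording, and the plot file naming scheme are all assumptions for illustration; the real `generate_regression_comment()` may differ on every one of them.

```python
def render_comment(rows, report_url=None, plot_url_prefix=None):
    """Build a Markdown PR comment from a list of row dicts."""
    if not rows:
        return "No benchmark results found."

    regressed = [r for r in rows if r["regressed"]]
    icon = "❌" if regressed else "✅"
    title = (
        f"{len(regressed)} regression(s) detected"
        if regressed
        else "No regressions detected"
    )
    lines = [
        f"## {icon} Benchmark report: {title}",
        "",
        "| Metric | Change | Threshold | Status |",
        "| --- | --- | --- | --- |",
    ]
    for r in rows:
        status = "❌ regressed" if r["regressed"] else "✅ ok"
        lines.append(
            f"| {r['metric']} | {r['change_percent']:+.1f}% "
            f"| {r['threshold']:.1f}% | {status} |"
        )
    if plot_url_prefix:
        for r in regressed:
            # Hypothetical naming scheme: slashes in the metric become underscores.
            filename = r["metric"].replace("/", "_") + ".png"
            lines += ["", f"![{r['metric']}]({plot_url_prefix.rstrip('/')}/{filename})"]
    if report_url:
        lines += ["", f"[Full report]({report_url})"]
    return "\n".join(lines)
```

Because the function only builds a string, it can be unit tested without a CI runner, which is the maintainability argument made under Design choices.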

Test plan

  • 26 new tests covering all functions (test/test_ci.py)
  • Existing 103 regression tests still pass
  • Verify linting passes (pixi run ci)

🤖 Generated with Claude Code

Summary by Sourcery

Introduce a bencher.ci module providing reusable CI utilities for benchmark regression gating, along with comprehensive tests.

New Features:

  • Add write_performance_summary to serialize regression reports into CI-friendly pipe-delimited summaries with optional filtering and thresholds.
  • Add render_regression_plots to generate PNG diagnostics for regressed metrics, with optional metric filtering and bench name prefixing.
  • Add warn_on_regressions as a convenience wrapper to emit pytest warnings and optionally produce summaries and plots from a regression report.
  • Add parse_performance_summary to read performance summary files back into structured dictionaries for further processing.
  • Add generate_regression_comment to build GitHub-flavored Markdown PR comments summarizing benchmark regressions and linking reports and plots.

Tests:

  • Add dedicated test suite in test/test_ci.py covering summary writing, plot rendering, regression warnings, summary parsing, and comment generation behavior.

Generic CI integration utilities for benchmark regression workflows:

- write_performance_summary(): serialize RegressionReport to a
  pipe-delimited text file parseable by CI workflows (GitHub Actions,
  Jenkins, etc.), with optional per-metric filtering and thresholds

- render_regression_plots(): render diagnostic PNGs for regressed
  metrics, suitable for embedding in PR comments

- warn_on_regressions(): convenience wrapper that writes the summary,
  renders plots, and emits pytest warnings in one call

- parse_performance_summary(): parse the summary file back into
  structured dicts

- generate_regression_comment(): produce a GitHub-flavored Markdown
  PR comment with status icons, regression table, plot links, and
  actionable next-steps — replaces fragile shell-script comment
  generation in CI workflows

All functions are exported from the top-level bencher package.
No new dependencies required.
@sourcery-ai
Contributor

sourcery-ai Bot commented May 18, 2026

Reviewer's Guide

Introduces a new bencher.ci module that centralizes CI-facing regression utilities (summary serialization/parsing, plot rendering, warnings, and Markdown comment generation) and adds comprehensive tests covering the new functionality.

Sequence diagram for warn_on_regressions CI gating flow

sequenceDiagram
    actor PytestTest
    participant warn_on_regressions
    participant write_performance_summary
    participant render_regression_plots
    participant RegressionReport as report
    participant warnings

    PytestTest->>warn_on_regressions: warn_on_regressions(report, summary_path, plot_dir, metrics_filter, bench_name)

    alt summary_path is not None
        warn_on_regressions->>write_performance_summary: write_performance_summary(report, summary_path, metrics_filter, bench_name)
        write_performance_summary-->>warn_on_regressions: lines
    end

    alt plot_dir is not None
        warn_on_regressions->>render_regression_plots: render_regression_plots(report, plot_dir, metrics_filter, bench_name)
        render_regression_plots-->>warn_on_regressions: rendered
    end

    warn_on_regressions->>RegressionReport: has_regressions
    alt report.has_regressions
        warn_on_regressions->>RegressionReport: summary()
        RegressionReport-->>warn_on_regressions: summary
        warn_on_regressions->>warnings: warn("Benchmark regression detected:\n" + summary)
    end

    warn_on_regressions-->>PytestTest: return
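
The control flow in the diagram can be approximated in a few lines of Python. This is a simplified sketch, not the module's code: `FakeReport` is a stand-in for RegressionReport that exposes only `has_regressions` and `summary()`, and the two optional steps are passed in as callables rather than derived from `summary_path` and `plot_dir`.

```python
import warnings


class FakeReport:
    """Stand-in for RegressionReport with only what the diagram uses."""

    def __init__(self, regressed):
        self._regressed = list(regressed)
        self.has_regressions = bool(self._regressed)

    def summary(self):
        return "\n".join(self._regressed)


def warn_on_regressions_sketch(report, write_summary=None, render_plots=None):
    """Run the optional side effects, then warn if anything regressed."""
    if write_summary is not None:
        write_summary(report)
    if render_plots is not None:
        render_plots(report)
    if report.has_regressions:
        # stacklevel=2 attributes the warning to the calling test.
        warnings.warn(
            "Benchmark regression detected:\n" + report.summary(),
            stacklevel=2,
        )
```

Emitting a warning rather than raising keeps the benchmark test passing while still surfacing the regression in pytest's warnings summary.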

File-Level Changes

Change: Add bencher.ci module providing CI integration helpers around RegressionReport/RegressionResult.
Details:
  • Implement write_performance_summary to emit a pipe-delimited summary format with optional metric filtering, bench-name prefixing, threshold overrides, and append/overwrite control.
  • Implement render_regression_plots to generate PNGs only for regressed (and optionally filtered) metrics, handling errors and returning a mapping from qualified metric name to file path.
  • Implement warn_on_regressions as a convenience wrapper that optionally writes summaries/plots and always emits a pytest-visible warning when the report contains regressions.
  • Implement parse_performance_summary to safely read the summary file into structured rows, skipping blank/malformed lines.
  • Implement generate_regression_comment and helpers to turn summary rows into GitHub-flavored Markdown with status icons, regression categorization, optional plot links, and guidance text.
Files: bencher/ci.py

Change: Add tests validating CI utilities behavior and integration with regression detection.
Details:
  • Add fixtures and helper constructors for RegressionResult/RegressionReport used across tests.
  • Test write_performance_summary behavior including filtering, threshold overrides, append vs overwrite, directory creation, bench-name prefixing, and handling of absolute methods and empty reports.
  • Test render_regression_plots for correct rendering of regressed metrics, skipping non-regressed metrics, and respecting metric filters and bench-name prefixes.
  • Test warn_on_regressions for warning emission, non-emission on clean reports, and side effects of writing summaries and plots when paths are provided.
  • Test parse_performance_summary and generate_regression_comment for correct parsing, table generation, regression/improvement categorization, absolute-method rendering, plot-link generation, and empty-input handling.
Files: test/test_ci.py

@github-actions

Performance Report for 4ab2c9f

| Metric | Value |
| --- | --- |
| Total tests | 1423 |
| Total time | 111.34s |
| Mean | 0.0782s |
| Median | 0.0010s |

Top 10 slowest tests

| Test | Time (s) |
| --- | --- |
| test.test_bench_examples.TestBenchExamples::test_example_meta | 18.045 |
| test.test_over_time_save_perf::test_save_faster_without_aggregated_tab | 5.238 |
| test.test_hash_persistent.TestCrossProcessDeterminism::test_hash_stable_across_two_processes[ResultBool] | 4.332 |
| test.test_generated_examples::test_generated_example[cartesian_animation/example_cartesian_animation.py] | 3.151 |
| test.test_generated_examples::test_generated_example[result_types/result_image/example_result_image_to_video.py] | 2.710 |
| test.test_generated_examples::test_generated_example[regression/example_regression_tuning_noise.py] | 2.621 |
| test.test_generated_examples::test_generated_example[regression/example_regression_tuning_drift.py] | 2.611 |
| test.test_over_time_repeats.TestMaxSliderPoints::test_default_subsampling_caps_at_max | 2.448 |
| test.test_generated_examples::test_generated_example[regression/example_regression_tuning_step.py] | 2.257 |
| test.test_time_event_curve.TestTimeEventCurvePlot::test_curve_with_string_time_src_and_cat | 1.090 |

Full report

Updated by Performance Tracking workflow
