CI: add MoE perf regression check (bench_moe) #3300
Open
zhiding512 wants to merge 3 commits into
Catch MoE kernel performance regressions per-PR by piggybacking on the existing test_moe_2stage.py run in aiter-test:

- test_moe_2stage.py drops a moe_bench.csv (CSV-mode rows only, perf-only)
- standard job uploads the csv alongside latest_test.log
- new bench_moe job (ubuntu, no GPU) downloads the linux-aiter-mi35x-1 shard csv, compares vs the last main baseline (artifact moe-bench-<SHA>, 90d retention), reports to STEP_SUMMARY (warn-only for now)
- main push / workflow_dispatch publishes the next baseline

Warn thresholds default 1.10/1.15 (slow ratio cur/base); --fail-on-regress is off until noise floor is characterized over 2-4 weeks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
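A minimal sketch of how a slow ratio (cur/base) could be mapped onto the two default thresholds. The function name `classify`, the tier labels, and the reading of 1.10 as a "warn" tier and 1.15 as a "regress" tier are assumptions for illustration; the PR description only states the two values, and the real logic lives in scripts/compare_benchmark.py.

```python
def classify(base_us: float, cur_us: float,
             warn: float = 1.10, regress: float = 1.15) -> str:
    """Classify one benchmark row by its slow ratio cur/base.

    Hypothetical helper: the tier names and the meaning of the two
    thresholds are assumptions, not taken from compare_benchmark.py.
    """
    if base_us <= 0:
        # No usable baseline timing, so no ratio can be computed.
        return "skip"
    ratio = cur_us / base_us
    if ratio >= regress:
        return "regress"
    if ratio >= warn:
        return "warn"
    return "ok"


if __name__ == "__main__":
    for base, cur in [(100.0, 105.0), (100.0, 112.0), (100.0, 120.0)]:
        print(base, cur, classify(base, cur))
```

In warn-only mode a caller would print the tier and still exit with status 0; a flag such as --fail-on-regress would presumably turn the top tier into a non-zero exit once the noise floor is known.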
Pull request overview
Adds a per-PR MoE kernel performance regression signal to CI by exporting MoE benchmark results from the existing test_moe_2stage.py run, comparing them against a main-branch baseline artifact, and reporting the comparison in the GitHub Actions step summary.
Changes:
- Export moe_bench.csv from op_tests/test_moe_2stage.py and upload it with existing standard test artifacts.
- Add a CI job (bench_moe) that downloads the MoE CSV from the MI35X shard, fetches a baseline artifact from main, and emits a regression table to GITHUB_STEP_SUMMARY.
- Introduce a benchmark comparison CLI (scripts/compare_benchmark.py) plus a small wrapper script used by CI.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scripts/compare_benchmark.py | New Python CLI to diff baseline vs current MoE benchmark CSVs and classify regressions. |
| op_tests/test_moe_2stage.py | Writes moe_bench.csv from collected perf results (CSV-mode rows). |
| .github/workflows/aiter-test.yaml | Uploads moe_bench.csv from standard tests and adds the bench_moe comparison/publish job. |
| .github/scripts/check_moe_regression.sh | Wrapper to run the Python comparison with consistent labels/thresholds in CI. |
Comment on lines +213 to +215

    baseline, _ = _read_csv(args.baseline_csv)
    current, key_cols = _read_csv(args.current_csv)
Comment on lines +124 to +130

    key_cols = tuple(c for c in reader.fieldnames if c not in NON_KEY)
    for raw in reader:
        # Strip whitespace from every value to avoid silent join misses
        # caused by trailing/leading spaces.
        raw = {k: (v.strip() if isinstance(v, str) else v) for k, v in raw.items()}
        key = tuple(sorted((c, raw.get(c, "")) for c in key_cols))
        rows[key] = raw
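The keying scheme in this hunk can be exercised end to end with a small sketch. It reuses the same strip-then-key logic and then joins a baseline and a current CSV on the resulting keys. The contents of `NON_KEY`, the column names, the `read_rows` and `join` helpers, and the sample data are all assumptions for illustration, not the actual code in scripts/compare_benchmark.py.

```python
import csv
import io

# Assumption: "us" is the only measurement column; every other column is a key.
NON_KEY = {"us"}


def read_rows(text):
    """Parse CSV text into {key: row}, keyed on all non-measurement columns."""
    reader = csv.DictReader(io.StringIO(text))
    key_cols = tuple(c for c in reader.fieldnames if c not in NON_KEY)
    rows = {}
    for raw in reader:
        # Strip values so stray spaces do not cause silent join misses.
        raw = {k: (v.strip() if isinstance(v, str) else v) for k, v in raw.items()}
        key = tuple(sorted((c, raw.get(c, "")) for c in key_cols))
        rows[key] = raw
    return rows, key_cols


def join(baseline, current):
    """Split keys into matched pairs, current-only, and baseline-only."""
    matched = {k: (baseline[k], current[k]) for k in current if k in baseline}
    only_current = [k for k in current if k not in baseline]
    only_baseline = [k for k in baseline if k not in current]
    return matched, only_current, only_baseline


if __name__ == "__main__":
    base_rows, _ = read_rows("token,dtype,us\n128,bf16, 10.0\n256,bf16,20.0\n")
    cur_rows, _ = read_rows("token,dtype,us\n128 ,bf16,11.0\n512,bf16,30.0\n")
    matched, new, gone = join(base_rows, cur_rows)
    print(len(matched), len(new), len(gone))
```

Because the key is built from whichever non-measurement columns each file has, two files with different headers produce keys that never match, so reporting the unmatched rows on both sides makes a schema drift visible instead of silently shrinking the comparison.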
Comment on lines +270 to +272

    # Build display_cols: walk key_cols, drop hidden, splice derived in place
    _derived_sources = {src for sources in DERIVED_TUPLE_COLS.values() for src in sources}
    _derived_first_src = {sources[0]: name for name, sources in DERIVED_TUPLE_COLS.items()}

        csv_df = df[df["model"] != "legacy"].copy()
    else:
        csv_df = df.copy()
    csv_df = csv_df.drop(columns=["logits_diff"], errors="ignore")
Comment on lines +863 to +868

    echo "## MoE Bench (vs baseline)" >> "$GITHUB_STEP_SUMMARY"
    echo '```' >> "$GITHUB_STEP_SUMMARY"
    bash .github/scripts/check_moe_regression.sh \
      "$baseline_csv" /tmp/current.csv \
      | tee -a "$GITHUB_STEP_SUMMARY"
    echo '```' >> "$GITHUB_STEP_SUMMARY"
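For comparison, the same summary step can be sketched in Python: wrap the report text in a code fence under a heading and append it to the file named by the GITHUB_STEP_SUMMARY environment variable. The helper name `append_summary` and its `path` parameter are hypothetical; the workflow in this PR does this in shell.

```python
import os

# Build the fence from single backticks so this listing can itself be fenced.
FENCE = "`" * 3


def append_summary(title, body, path=None):
    """Append a heading plus a fenced block to the step summary file.

    Hypothetical helper: falls back to $GITHUB_STEP_SUMMARY when no path
    is given, and only returns the text if neither is available.
    """
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    block = f"## {title}\n{FENCE}\n{body.rstrip()}\n{FENCE}\n"
    if path:
        # Append mode, so earlier summary content from the same step survives.
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(block)
    return block


if __name__ == "__main__":
    print(append_summary("MoE Bench (vs baseline)", "token=128 ratio=1.12 warn\n"))
```

Building the whole block in one string means the closing fence is always written together with the opening one, whereas a multi-command shell sequence can leave the fence open if a middle command aborts the step.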
Co-authored-by: Cursor <cursoragent@cursor.com>