CI: add MoE perf regression check (bench_moe) by zhiding512 · Pull Request #3300 · ROCm/aiter

zhiding512 · 2026-05-21T08:53:49Z

Catch MoE kernel performance regressions per-PR by piggybacking on the existing test_moe_2stage.py run in aiter-test:

- test_moe_2stage.py drops a moe_bench.csv (CSV-mode rows only, perf-only)
- the standard job uploads the CSV alongside latest_test.log
- a new bench_moe job (ubuntu, no GPU) downloads the linux-aiter-mi35x-1 shard CSV, compares it against the last main baseline (artifact moe-bench-<SHA>, 90d retention), and reports to STEP_SUMMARY (warn-only for now)
- a main push / workflow_dispatch publishes the next baseline

Warn thresholds default 1.10/1.15 (slow ratio cur/base); --fail-on-regress is off until noise floor is characterized over 2-4 weeks.
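To make the threshold semantics concrete, here is a minimal sketch of how a slow ratio could be bucketed. It assumes the two defaults are a lower and an upper tier (1.10 and 1.15) and that a ratio at or above a tier triggers it; the function name, the tier labels, and the `>=` boundary are hypothetical and not taken from scripts/compare_benchmark.py.

```python
def classify_ratio(cur_us, base_us, warn=1.10, regress=1.15):
    """Bucket one benchmark row by its slow ratio cur/base.

    Hypothetical helper: the tier names and the >= boundary are
    assumptions, only the 1.10/1.15 defaults come from the PR text.
    """
    if base_us <= 0:
        # A non-positive baseline time cannot yield a meaningful ratio.
        return "invalid"
    ratio = cur_us / base_us
    if ratio >= regress:
        return "regress"
    if ratio >= warn:
        return "warn"
    return "ok"


# Example: a kernel that went from 100 us to 112 us is 12% slower.
status = classify_ratio(112.0, 100.0)
```

In warn-only mode the result would only be reported, never turned into a failing exit code.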

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Catch MoE kernel performance regressions per-PR by piggybacking on the existing test_moe_2stage.py run in aiter-test:

- test_moe_2stage.py drops a moe_bench.csv (CSV-mode rows only, perf-only)
- standard job uploads the csv alongside latest_test.log
- new bench_moe job (ubuntu, no GPU) downloads the linux-aiter-mi35x-1 shard csv, compares vs the last main baseline (artifact moe-bench-<SHA>, 90d retention), reports to STEP_SUMMARY (warn-only for now)
- main push / workflow_dispatch publishes the next baseline

Warn thresholds default 1.10/1.15 (slow ratio cur/base); --fail-on-regress is off until noise floor is characterized over 2-4 weeks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-21T08:54:24Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3300 --add-label <label>`.


Copilot

Pull request overview

Adds a per-PR MoE kernel performance regression signal to CI by exporting MoE benchmark results from the existing test_moe_2stage.py run, comparing them against a main-branch baseline artifact, and reporting the comparison in the GitHub Actions step summary.

Changes:

- Export moe_bench.csv from op_tests/test_moe_2stage.py and upload it with existing standard test artifacts.
- Add a CI job (bench_moe) that downloads the MoE CSV from the MI35X shard, fetches a baseline artifact from main, and emits a regression table to GITHUB_STEP_SUMMARY.
- Introduce a benchmark comparison CLI (scripts/compare_benchmark.py) plus a small wrapper script used by CI.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `scripts/compare_benchmark.py` | New Python CLI to diff baseline vs current MoE benchmark CSVs and classify regressions. |
| `op_tests/test_moe_2stage.py` | Writes `moe_bench.csv` from collected perf results (CSV-mode rows). |
| `.github/workflows/aiter-test.yaml` | Uploads `moe_bench.csv` from standard tests and adds the `bench_moe` comparison/publish job. |
| `.github/scripts/check_moe_regression.sh` | Wrapper to run the Python comparison with consistent labels/thresholds in CI. |


+    baseline, _ = _read_csv(args.baseline_csv)
+    current, key_cols = _read_csv(args.current_csv)
+


+        key_cols = tuple(c for c in reader.fieldnames if c not in NON_KEY)
+        for raw in reader:
+            # Strip whitespace from every value to avoid silent join misses
+            # caused by trailing/leading spaces.
+            raw = {k: (v.strip() if isinstance(v, str) else v) for k, v in raw.items()}
+            key = tuple(sorted((c, raw.get(c, "")) for c in key_cols))
+            rows[key] = raw
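The keyed read in the hunk above is what lets baseline and current rows be matched regardless of column order or stray whitespace. The sketch below reproduces that idea end to end on in-memory CSV text. It is an illustration under assumptions: `NON_KEY`, the `us` metric column, and the helper names `read_rows` and `join_ratios` are hypothetical, and skipping rows with no baseline is one possible policy, not necessarily what scripts/compare_benchmark.py does.

```python
import csv
import io

# Hypothetical metric columns; the real NON_KEY set is not shown in the diff.
NON_KEY = {"us"}


def read_rows(text):
    """Read CSV text into {key: row}, keyed on every non-metric column."""
    reader = csv.DictReader(io.StringIO(text))
    key_cols = tuple(c for c in reader.fieldnames if c not in NON_KEY)
    rows = {}
    for raw in reader:
        # Strip whitespace so "modelA " and "modelA" join to the same key.
        raw = {k: (v.strip() if isinstance(v, str) else v) for k, v in raw.items()}
        key = tuple(sorted((c, raw.get(c, "")) for c in key_cols))
        rows[key] = raw
    return rows, key_cols


def join_ratios(baseline_text, current_text, metric="us"):
    """Return {key: cur/base} for rows present in both CSVs."""
    baseline, _ = read_rows(baseline_text)
    current, _ = read_rows(current_text)
    ratios = {}
    for key, cur in current.items():
        base = baseline.get(key)
        if base is None:
            continue  # assumption: configs without a baseline are skipped
        ratios[key] = float(cur[metric]) / float(base[metric])
    return ratios


baseline_csv = "model,token,us\nmodelA,128,100.0\nmodelA,256,200.0\n"
current_csv = "model,token,us\nmodelA ,128,120.0\nmodelB,64,50.0\n"
ratios = join_ratios(baseline_csv, current_csv)
```

Here only the `modelA`/`128` row appears in both inputs, so it is the only key in `ratios`; the trailing space in the current CSV does not break the join.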


+    # Build display_cols: walk key_cols, drop hidden, splice derived in place
+    _derived_sources = {src for sources in DERIVED_TUPLE_COLS.values() for src in sources}
+    _derived_first_src = {sources[0]: name for name, sources in DERIVED_TUPLE_COLS.items()}
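As a reading aid for the comment "walk key_cols, drop hidden, splice derived in place", the following sketch shows one way the two lookup structures could drive the column walk. The loop body is not visible in the hunk, so it is an assumption, as are the contents of `DERIVED_TUPLE_COLS`, the `HIDDEN_COLS` name, and the example column names.

```python
# Hypothetical configuration; the real contents are not shown in the diff.
DERIVED_TUPLE_COLS = {"shape": ("token", "model_dim", "inter_dim")}
HIDDEN_COLS = {"dtype"}


def build_display_cols(key_cols):
    """Replace each group of source columns with its derived column, in place."""
    derived_sources = {src for sources in DERIVED_TUPLE_COLS.values() for src in sources}
    derived_first_src = {sources[0]: name for name, sources in DERIVED_TUPLE_COLS.items()}
    display = []
    for col in key_cols:
        if col in HIDDEN_COLS:
            continue  # hidden columns never reach the table
        if col in derived_first_src:
            # The first source column marks where the derived column goes.
            display.append(derived_first_src[col])
        elif col not in derived_sources:
            display.append(col)
        # Remaining source columns are absorbed by the derived column.
    return display


cols = build_display_cols(("model", "token", "model_dim", "inter_dim", "dtype", "E"))
```

With these assumed inputs, `token`, `model_dim`, and `inter_dim` collapse into a single `shape` column at the position where `token` stood.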


+        csv_df = df[df["model"] != "legacy"].copy()
+    else:
+        csv_df = df.copy()
+    csv_df = csv_df.drop(columns=["logits_diff"], errors="ignore")


+          echo "## MoE Bench (vs baseline)" >> "$GITHUB_STEP_SUMMARY"
+          echo '```' >> "$GITHUB_STEP_SUMMARY"
+          bash .github/scripts/check_moe_regression.sh \
+              "$baseline_csv" /tmp/current.csv \
+              | tee -a "$GITHUB_STEP_SUMMARY"
+          echo '```' >> "$GITHUB_STEP_SUMMARY"



zhiding512 requested review from a team and Copilot May 21, 2026 08:53

Copilot started reviewing on behalf of zhiding512 May 21, 2026 08:54

fix: apply black format to bench_moe files

c6ea826

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI reviewed May 21, 2026


Rename MoE bench CI summary.

14eee5f

Co-authored-by: Cursor <cursoragent@cursor.com>

zhiding512 wants to merge 3 commits into main from zhimding/add_flydsl_moe_benchmark_0521
