[MOE] Add fused dynamic MXFP8 quant + moe_sort HIP path by yadaish · Pull Request #3312 · ROCm/aiter

yadaish · 2026-05-22T08:46:12Z

Adds fused_dynamic_mxfp8_quant_moe_sort_hip that quantizes activations to fp8 with e8m0 group scales and writes the swizzled scale layout consumed by the FlyDSL a8w4 stage1/stage2 GEMM. Wires it into fused_moe_2stages to replace the Triton fused_quant_fp8_sort path, with AITER_MOE_A8W4_BYPASS_QUANT to fall back to the prior unquantized behavior.

For e8m0-scaled fp8 output, the divisor is switched to 1/floor_pow2(max) so row_scale stays a pure power of two and matches the 2^(byte-127) dequant scale encoded in the e8m0 byte.
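The power-of-two argument can be checked with a few lines of host-side Python. This is only a sketch, not the HIP kernel. It assumes that `max` refers to the fp8 format maximum (448 for e4m3), that the group amax in the example is already a power of two, and that the e8m0 encoder truncates to the exponent. The helper names `floor_pow2`, `encode_e8m0`, and `decode_e8m0` are hypothetical and do not come from the PR.

```python
import math

E8M0_BIAS = 127        # e8m0 byte b decodes to 2^(b - 127), as stated in the PR text
FP8_E4M3_MAX = 448.0   # assumption: "max" means the fp8 e4m3 format maximum


def floor_pow2(x: float) -> float:
    """Largest power of two that is <= x, for x > 0."""
    # frexp returns (m, e) with x = m * 2**e and 0.5 <= m < 1
    return 2.0 ** (math.frexp(x)[1] - 1)


def encode_e8m0(scale: float) -> int:
    """Hypothetical encoder: keep only the exponent of a positive scale."""
    exp = math.frexp(scale)[1] - 1
    # clamp to 0..254; byte 255 is reserved for NaN in e8m0
    return max(0, min(254, exp + E8M0_BIAS))


def decode_e8m0(byte: int) -> float:
    """Dequant scale carried by an e8m0 byte."""
    return 2.0 ** (byte - E8M0_BIAS)


amax = 8.0  # illustrative group amax, assumed to be a power of two

# Dividing by the exact format max gives a scale that e8m0 cannot represent.
scale_exact = amax / FP8_E4M3_MAX
# Dividing by floor_pow2(max) keeps the scale a pure power of two.
scale_pow2 = amax / floor_pow2(FP8_E4M3_MAX)

roundtrip_exact = decode_e8m0(encode_e8m0(scale_exact))
roundtrip_pow2 = decode_e8m0(encode_e8m0(scale_pow2))
```

Under these assumptions `roundtrip_pow2` equals `scale_pow2`, while `roundtrip_exact` differs from `scale_exact`: the scale used to quantize and the scale recovered from the byte only agree in the power-of-two case.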

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

github-actions · 2026-05-22T08:47:04Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3312 --add-label <label>`.

yadaish wants to merge 1 commit into main from dev/a8w4_moe_quant
