Zhimding/fix aot moe pad 0522 by coderfeli · Pull Request #3313 · ROCm/aiter

coderfeli · 2026-05-22T15:28:57Z

Zhimding/fix aot moe pad 0522. Enable aot for dim pad

Signed-off-by: root <zhimding@amd.com> Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-05-22T15:29:22Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-300x`	Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
`ci:sglang`	SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy
`ci:atom`	ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B
`ci:atom_full`	ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json`
`ci:vllm`	vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5
`ci:all`	All standard extended tests (excludes `ci:atom_full`)

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3313 --add-label <label>

Co-authored-by: Cursor <cursoragent@cursor.com>

coderfeli

Plumbing looks fine, but the pad column handling and the test-reference patching feel hacky. A few inline comments.

coderfeli · 2026-05-23T13:17:53Z



+def _row_optional_int(row: dict[str, str], name: str, *aliases: str) -> int:
+    for key in (name, *aliases):


Carrying the typo hiddne_pad as a permanent alias bakes the bug into the schema forever. Prefer a one-shot rename of the CSV column + code, and drop the alias. The runtime cache key is (model_dim_pad, inter_dim_pad) ints, so renaming the CSV column does not cause cache misses.

coderfeli · 2026-05-23T13:17:53Z

    shape_str = (
        f"{kernel_name}  "
        f"model_dim={model_dim} inter_dim={inter_dim} "
+        f"hidden_pad={kwargs.get('hidden_pad', 0)} "


hidden_pad / intermediate_pad are already declared as explicit kwargs on _precompile_to_cache; reading them back through kwargs.get(...) here is inconsistent. Either accept them as named params on compile_one_config or pass them through a single dict — picking one keeps the call graph honest.

coderfeli · 2026-05-23T13:17:53Z

+            else:
+                df["hidden_pad"] = 0
+        df["hidden_pad"] = df["hidden_pad"].fillna(0).astype(int)
+        if "intermediate_pad" not in df.columns:


Three places now have to know about the pad columns (_normalize_pad_cols, _FILL_DEFAULTS, dedup keys). Consider a single PAD_COLUMNS = ("hidden_pad", "intermediate_pad") constant and iterate, so future pad fields don't have to be added in N spots.

coderfeli · 2026-05-23T13:17:53Z

            )

-        _FILL_DEFAULTS = {"xbf16": 0, "run_1stage": 0, "ksplit": 0}
+        _FILL_DEFAULTS = {


Same comment as on the AOT side: please don't enshrine hiddne_pad in _FILL_DEFAULTS. Fix the CSV(s) and drop the misspelled key.

coderfeli · 2026-05-23T13:17:53Z

            w1[:, -intermediate_pad:, :] = 0
            w1[:, inter_dim - intermediate_pad : inter_dim, :] = 0
        exp_bias1 = torch.clamp(torch.randn((E, inter_dim * 2), dtype=dtype), -1.0, 1.0)
+        # Dense torch reference still evaluates padded lanes; keep padded


Zeroing padded bias lanes in the reference path is masking a semantic mismatch — the dense torch reference still evaluates padded lanes, while the kernel does not. A cleaner approach: build the reference at logical dims (model_dim - hidden_pad, inter_dim - intermediate_pad), then pad to physical dims only for the kernel input. That way you don't need to patch bias/output to make the compare pass.

coderfeli · 2026-05-23T13:17:53Z

 ):
    if get_gfx() not in ["gfx950"] and qType in [aiter.QuantType.per_1x32]:
        return
+    assert (


assert 0 <= hidden_pad < model_dim allows nonsense values like model_dim - 1. The real constraint is probably (model_dim - hidden_pad) % tile_n == 0 (or similar tile alignment). Consider a stronger check, or at minimum a comment stating which invariant this guards.

coderfeli · 2026-05-23T13:17:53Z

+    valid_model_dim = model_dim - hidden_pad
+    out2_ref_check = out2_ref[:, :valid_model_dim]
+    out2_ck_check = out2_ck[:, :valid_model_dim]
    err = checkAllclose(


Slicing both ref and ck to valid_model_dim here is fine, but combined with the bias zeroing above it's a double patch. If the reference is rebuilt at logical dims (see comment on bias), this slice goes away.

coderfeli · 2026-05-23T13:17:54Z

-256,16384,3072,512,128,4,ActivationType.Swiglu,torch.bfloat16,torch.float4_e2m1fn_x2,torch.float4_e2m1fn_x2,QuantType.per_1x32,1,0,64,0,206.4827,flydsl_moe1_afp4_wfp4_bf16_t64x128x256_w4_bnt0_xcd4,0.0%,285.1907,flydsl_moe2_afp4_wfp4_bf16_t64x256x256_reduce_xcd4,0.6%,491.6734,0,0,1257.9,1535.52,
-256,32768,3072,512,128,4,ActivationType.Swiglu,torch.bfloat16,torch.float4_e2m1fn_x2,torch.float4_e2m1fn_x2,QuantType.per_1x32,1,0,64,0,367.6363,flydsl_moe1_afp4_wfp4_bf16_t64x128x256_w2_bnt0_xcd4,0.0%,517.2156,flydsl_moe2_afp4_wfp4_bf16_t64x256x256_reduce_xcd4,0.6%,884.8519,0,0,1397.92,1023.87,
+cu_num,token,model_dim,inter_dim,expert,topk,act_type,dtype,q_dtype_a,q_dtype_w,q_type,use_g1u1,doweight_stage1,block_m,ksplit,us1,kernelName1,err1,us2,kernelName2,err2,us,run_1stage,xbf16,hidden_pad,intermediate_pad,tflops,bw,_tag
+256,256,3072,3072,128,4,ActivationType.Swiglu,torch.bfloat16,torch.float4_e2m1fn_x2,torch.float4_e2m1fn_x2,QuantType.per_1x32,1,0,32,0,208.4462,flydsl_moe1_afp4_wfp4_bf16_t32x128x256_w2,0.0%,109.1431,flydsl_moe2_afp4_wfp4_bf16_t32x256x256_atomic_bnt2_persist,5.0%,317.5893,0,0,192,192,182.57,11418.01,


All 24 rows use the same 3 constants for the pad columns (192 / {192,96,152}). If pad is a function of model_dim / inter_dim + tile alignment, it should be derived in code rather than duplicated in the tuned CSV — otherwise every retune has to remember to fill these columns consistently.

zhiding512 and others added 2 commits May 22, 2026 11:08

Support MoE pad fields in GPT-OSS configs

853247e

Signed-off-by: root <zhimding@amd.com> Co-authored-by: Cursor <cursoragent@cursor.com>

fix config

6ec4f7a

coderfeli requested a review from a team May 22, 2026 15:28

coderfeli marked this pull request as draft May 22, 2026 15:29

coderfeli and others added 3 commits May 23, 2026 11:08

Fix MoE padded dimension test reference

722ac53

Co-authored-by: Cursor <cursoragent@cursor.com>

Apply Black formatting to MoE pad test

16bd0b4

Co-authored-by: Cursor <cursoragent@cursor.com>

Merge branch 'main' into zhimding/fix_aot_moe_pad_0522

1ecdc23

coderfeli marked this pull request as ready for review May 23, 2026 12:33

coderfeli commented May 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Zhimding/fix aot moe pad 0522#3313

Zhimding/fix aot moe pad 0522#3313
coderfeli wants to merge 5 commits into
mainfrom
zhimding/fix_aot_moe_pad_0522

coderfeli commented May 22, 2026

Uh oh!

github-actions Bot commented May 22, 2026

Uh oh!

coderfeli left a comment

Uh oh!

coderfeli May 23, 2026

Uh oh!

coderfeli May 23, 2026

Uh oh!

coderfeli May 23, 2026

Uh oh!

coderfeli May 23, 2026

Uh oh!

coderfeli May 23, 2026

Uh oh!

coderfeli May 23, 2026

Uh oh!

coderfeli May 23, 2026

Uh oh!

coderfeli May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants



		def _row_optional_int(row: dict[str, str], name: str, *aliases: str) -> int:
		for key in (name, *aliases):

Conversation

coderfeli commented May 22, 2026

Uh oh!

github-actions Bot commented May 22, 2026

🏷️ CI Guide

Uh oh!

coderfeli left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants