[FlyDSL][MOE] Enable a8w8 blockscale moe splitk in flydsl by lalala-sh · Pull Request #3280 · ROCm/aiter

lalala-sh · 2026-05-20T06:09:16Z

Motivation

Enable the FlyDSL backend for a8w8 FP8 blockscale (per_1x128 / per_128x128) MoE in fused_moe, and add the FlyDSL stage1/stage2 blockscale kernels, tuner integration, and tuned configs for the four dsv3 v3 shapes: (model_dim=7168, inter_dim={256,512}) × (E,topk)={(256,8), (257,9)}.
For small-token decode (M ≤ 4), the FlyDSL 2-stage path now consistently beats the ASM 1-stage blockscale kernel on gfx950 (e.g. M=1: 23.6 us vs 26.6 us, ~13% faster). For medium and large M the tuner still picks the ASM 1-stage kernel where it is faster, so the change is strictly opt-in via the tuned CSV.
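
As a quick sanity check of the M=1 figures quoted above, the arithmetic can be sketched as follows. This is only an illustration: `pick_faster`, `speedup_pct`, and `latency_reduction_pct` are hypothetical helpers, not part of the aiter tuner, and the only data used are the two latencies from this description.

```python
# Latencies quoted in the PR description for M=1 on gfx950, in microseconds.
FLYDSL_2STAGE_US = 23.6
ASM_1STAGE_US = 26.6


def pick_faster(candidates):
    """Return the name of the lowest-latency candidate (hypothetical helper)."""
    return min(candidates, key=candidates.get)


def speedup_pct(baseline_us, new_us):
    """Speedup of `new` over `baseline` as a throughput ratio, in percent."""
    return (baseline_us / new_us - 1.0) * 100.0


def latency_reduction_pct(baseline_us, new_us):
    """Fraction of the baseline latency that was removed, in percent."""
    return (1.0 - new_us / baseline_us) * 100.0


timings = {"flydsl_2stage": FLYDSL_2STAGE_US, "asm_1stage": ASM_1STAGE_US}
winner = pick_faster(timings)
speedup = speedup_pct(ASM_1STAGE_US, FLYDSL_2STAGE_US)
reduction = latency_reduction_pct(ASM_1STAGE_US, FLYDSL_2STAGE_US)

print(f"winner={winner} speedup={speedup:.1f}% latency_reduction={reduction:.1f}%")
```

The "~13% faster" figure matches the throughput-style ratio (26.6 / 23.6); measured as a reduction in latency, the same numbers give roughly 11%.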

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

github-actions · 2026-05-20T06:10:19Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-300x` | Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X |
| `ci:sglang` | SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy |
| `ci:atom` | ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B |
| `ci:atom_full` | ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json` |
| `ci:vllm` | vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5 |
| `ci:all` | All standard extended tests (excludes `ci:atom_full`) |

Only add `ci:atom_full` for FlyDSL or Triton upgrades.
Add labels via the sidebar or `gh pr edit 3280 --add-label <label>`.

github-actions · 2026-05-20T06:10:25Z

+    if out_dtype not in ("f16", "bf16"):
+        raise ValueError(f"out_dtype must be 'f16' or 'bf16', got {out_dtype!r}")
+    # NOTE: don't materialize MLIR types outside an active MLIR Context.
+    out_mlir = lambda: (lambda ty: ty() if callable(ty) else ty)(T.f16 if out_dtype == "f16" else T.bf16)


⚠️ [ruff] <E731> (reported by reviewdog 🐶)
Do not assign a lambda expression, use a def

Suggested change

-    out_mlir = lambda: (lambda ty: ty() if callable(ty) else ty)(T.f16 if out_dtype == "f16" else T.bf16)
+    def out_mlir():
+        return (lambda ty: ty() if callable(ty) else ty)(T.f16 if out_dtype == "f16" else T.bf16)
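
For reference, a minimal sketch of what the `def` form could look like once the inner lambda is unrolled as well. It keeps the type lookup deferred until the factory is called, in line with the NOTE about not materializing MLIR types outside an active context. The `_FakeTypes` class is a stand-in for the real `T` namespace and `make_out_type_factory` is a hypothetical wrapper; both are assumptions made so the snippet runs without MLIR.

```python
class _FakeTypes:
    """Stand-in for the real `T` namespace (assumption, for illustration only)."""

    @staticmethod
    def f16():
        # Models a type exposed as a callable that must be invoked lazily.
        return "f16-type"

    # Models a type exposed as an already-materialized attribute.
    bf16 = "bf16-type"


T = _FakeTypes()


def make_out_type_factory(out_dtype):
    """Validate eagerly, but defer the type lookup until the factory is called."""
    if out_dtype not in ("f16", "bf16"):
        raise ValueError(f"out_dtype must be 'f16' or 'bf16', got {out_dtype!r}")

    def out_mlir():
        # Nothing on T is touched until this runs (i.e. inside the context).
        ty = T.f16 if out_dtype == "f16" else T.bf16
        return ty() if callable(ty) else ty

    return out_mlir


out_mlir = make_out_type_factory("f16")
print(out_mlir())
```

A named function also shows up under its own name in tracebacks, which is the main practical reason ruff flags E731.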

github-actions · 2026-05-20T06:10:26Z

+        elem_type_tag = "bf16"
+    else:
+        raise ValueError(f"Unsupported dtype: {dtype_str}")
+    compute_type = lambda: T.f32


⚠️ [ruff] <E731> (reported by reviewdog 🐶)
Do not assign a lambda expression, use a def

Suggested change

-    compute_type = lambda: T.f32
+    def compute_type():
+        return T.f32

github-actions · 2026-05-20T06:10:26Z

+    else:
+        raise ValueError(f"Unsupported dtype: {dtype_str}")
+    compute_type = lambda: T.f32
+    i8_type = lambda: T.i8


⚠️ [ruff] <E731> (reported by reviewdog 🐶)
Do not assign a lambda expression, use a def

Suggested change

-    i8_type = lambda: T.i8
+    def i8_type():
+        return T.i8

github-actions · 2026-05-20T06:10:26Z

+        a_dtype_str = "fp8"
+


⚠️ [ruff] <F841> (reported by reviewdog 🐶)
Local variable `a_dtype_str` is assigned to but never used

Suggested change

-        a_dtype_str = "fp8"

coderfeli · 2026-05-20T11:45:18Z

@lalala-sh CK or FlyDSL?

flydsl enable

8d7f05d

lalala-sh requested a review from a team May 20, 2026 06:09

github-actions Bot reviewed May 20, 2026

lalala-sh wants to merge 1 commit into main from wjx/a8w8_moe_perf
