mla: add fp8 qh32 seqlen=1 persistent kernel support on gfx950 by alexioslyrakis-amd · Pull Request #3304 · ROCm/aiter

alexioslyrakis-amd · 2026-05-21T13:00:52Z

Summary

Adds the mla_a8w8_qh32_qseqlen1_gqaratio32_ps kernel for gfx950 (MI350X), covering the decode case with gqa_ratio=32, fp8 Q/KV, and seqlen_q=1.

- asm_mla.cu: add seqlen=1 dispatch branch for gqa_ratio=32 fp8/fp8 (sub_Q=32); update error message to list all supported seqlens (1/2/4)
- v1_2_device.cuh: add seqlen=1 to natively_supported conditions for gfx950 nhead=32 fp8
- mla.py: add gfx950/nhead=32/fp8/seqlen=1 to the native-path selector in mla_decode_fwd
- mla_asm.csv: register new .co entry for the qh32 seqlen=1 persistent kernel
- mla_a8w8_qh32_qseqlen1_gqaratio32_ps.co: compiled kernel binary
- test_mla.py, test_mla_persistent.py: enable nhead=32 fp8 seqlen=1 test paths
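
The dispatch change listed above can be pictured roughly as follows. This is a minimal sketch, not the code in asm_mla.cu or mla.py: the function name `select_mla_kernel`, its parameters, and the string dtype tags are hypothetical. Only the seqlen=1 kernel name and the supported seqlens (1/2/4) come from this PR; the kernels for seqlen 2 and 4 are represented by a placeholder because their names are not given here.

```python
# Hypothetical sketch of the selection logic; names are illustrative, not the aiter API.
SUPPORTED_QSEQLENS = (1, 2, 4)  # seqlens listed in the updated error message


def select_mla_kernel(arch, gqa_ratio, q_dtype, kv_dtype, seqlen_q):
    """Return a kernel identifier for the gfx950 / gqa_ratio=32 / fp8 path, or None."""
    if (arch, gqa_ratio, q_dtype, kv_dtype) != ("gfx950", 32, "fp8", "fp8"):
        return None  # other configurations are dispatched elsewhere
    if seqlen_q not in SUPPORTED_QSEQLENS:
        raise ValueError(f"unsupported seqlen_q={seqlen_q}; supported: 1/2/4")
    if seqlen_q == 1:
        # Kernel added by this PR.
        return "mla_a8w8_qh32_qseqlen1_gqaratio32_ps"
    # Placeholder: the seqlen 2/4 kernel names are not stated in this PR.
    return f"<existing qh32 fp8 kernel for seqlen_q={seqlen_q}>"
```

For example, `select_mla_kernel("gfx950", 32, "fp8", "fp8", 1)` returns the new kernel name, while a seqlen of 3 raises an error naming the supported values.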

Test plan

python op_tests/test_mla_persistent.py --nhead 32,1 --dtype fp8 --kv_dtype fp8 --batchSize 512 --ctxLen 4096 on MI350X
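
For orientation, the decode case exercised here (one query token, 32 query heads sharing a single KV head) corresponds to the plain attention computation sketched below. This is an illustrative pure-Python reference under simplifying assumptions, not the reference used in op_tests: the function name `decode_attention_ref` is hypothetical, and fp8 quantization, paging, and batching are omitted.

```python
import math


def decode_attention_ref(q, k, v, scale):
    """Single-token decode attention with one KV head shared by all query heads.

    q: nhead x d_qk (one decode token), k: ctx x d_qk, v: ctx x d_v.
    Returns nhead x d_v.
    """
    out = []
    for q_head in q:
        # Scaled dot-product scores of this head against every cached key.
        scores = [scale * sum(a * b for a, b in zip(q_head, k_row)) for k_row in k]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        denom = sum(weights)
        # Softmax-weighted sum of the value rows.
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v)) / denom
            for j in range(len(v[0]))
        ])
    return out
```

A kernel test for this shape would typically compare the kernel output against such a reference within an fp8-appropriate tolerance.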


github-actions · 2026-05-21T13:01:11Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-300x`	Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
`ci:sglang`	SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy
`ci:atom`	ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B
`ci:atom_full`	ATOM accuracy suite for PR and main models from ATOM `models_accuracy.json`
`ci:vllm`	vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5
`ci:all`	All standard extended tests (excludes `ci:atom_full`)

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 3304 --add-label <label>

alexioslyrakis-amd requested a review from a team May 21, 2026 13:00

alexioslyrakis-amd requested review from JohnNikolay84 and valarLip May 21, 2026 13:44

alexioslyrakis-amd wants to merge 1 commit into main from alyr/mla-a8w8-qh32-seqlen1-ps
