[Feat] Align quant and fused layernorm kernels with aiter/triton by cschenjunlin · Pull Request #549 · ROCm/FlyDSL

cschenjunlin · 2026-05-20T10:58:25Z

Motivation

Align quant and fused layernorm kernels with aiter/triton

Technical Details

Test Plan

Test Result

Tested on MI308 + ROCm 7.1:

Quant layernorm performance comparison:

====================================================================================================
Perf Compare (gpu us): FlyDSL vs AIter
====================================================================================================
op           shape              dtype  FlyDSL(gpu us)  AIter(gpu us)    speedup
layernorm_dq 64x256             f32              29.2           37.3      1.28x
layernorm_dq 128x1024           f32              28.9           37.7      1.30x
layernorm_dq 32x128             f16              28.8           36.8      1.28x
layernorm_dq 64x2000            f32              29.7           39.3      1.32x
layernorm_dq 16x512             bf16             29.7           39.0      1.32x
layernorm_dq 1024x8192          bf16             65.8           55.0      0.84x
layernorm_dq 32768x8192         bf16          1,934.4        1,483.2      0.77x
layernorm_sq 64x256             f32              32.6           40.8      1.25x
layernorm_sq 128x1024           f32              33.2           42.2      1.27x
layernorm_sq 32x128             f16              33.2           40.3      1.21x
layernorm_sq 64x2000            f32              32.6           42.7      1.31x
layernorm_sq 16x512             bf16             32.1           42.3      1.32x
layernorm_sq 1024x8192          bf16             70.1           58.2      0.83x
layernorm_sq 32768x8192         bf16          2,113.5        1,559.7      0.74x
====================================================================================================
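
The speedup column appears to be the AIter time divided by the FlyDSL time, so values above 1.00x mean FlyDSL is faster. A minimal sketch that recomputes it for a few rows copied from the table follows. The helper name `speedup` is hypothetical, and because the printed times are already rounded, a recomputed ratio can differ from the printed one in the last digit.

```python
# Recompute the speedup column from the printed GPU times (microseconds).
# Assumption: speedup = AIter time / FlyDSL time, inferred from the table values.


def speedup(flydsl_us: float, aiter_us: float) -> float:
    """Return how many times faster FlyDSL is than AIter (>1 means FlyDSL is faster)."""
    return aiter_us / flydsl_us


# (op, shape, dtype, FlyDSL us, AIter us, printed speedup), copied from the table.
ROWS = [
    ("layernorm_dq", "64x256", "f32", 29.2, 37.3, 1.28),
    ("layernorm_dq", "1024x8192", "bf16", 65.8, 55.0, 0.84),
    ("layernorm_dq", "32768x8192", "bf16", 1934.4, 1483.2, 0.77),
    ("layernorm_sq", "64x256", "f32", 32.6, 40.8, 1.25),
    ("layernorm_sq", "32768x8192", "bf16", 2113.5, 1559.7, 0.74),
]

for op, shape, dtype, fly, aiter, printed in ROWS:
    ratio = speedup(fly, aiter)
    print(f"{op:<13}{shape:<12}{dtype:<5}{ratio:.2f}x (table: {printed:.2f}x)")
```

Read this way, FlyDSL leads on the small shapes (roughly 1.2x to 1.3x) and trails AIter on the 8192-wide bf16 shapes (0.74x to 0.84x).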

Submission Checklist

fused_add_layernorm_kernel
quant_layernorm_kernel
quant_fused_add_layernorm_kernel
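
As a rough reference for what the kernel names above suggest, here is a plain-Python sketch of a residual add followed by layernorm and a per-row quantization step. The PR text does not describe the quantization scheme, so the symmetric per-row int8 scaling, the `eps` default, and every function name below are assumptions for illustration only, not the FlyDSL or AIter implementation.

```python
import math


def fused_add_layernorm(x, residual, gamma, beta, eps=1e-5):
    """Add the residual to one row, then layer-normalize it.

    Returns (normalized_row, summed_row).
    """
    summed = [a + b for a, b in zip(x, residual)]
    mean = sum(summed) / len(summed)
    var = sum((v - mean) ** 2 for v in summed) / len(summed)
    inv_std = 1.0 / math.sqrt(var + eps)
    normed = [(v - mean) * inv_std * g + b for v, g, b in zip(summed, gamma, beta)]
    return normed, summed


def quantize_row_int8(row):
    """Assumed scheme: symmetric per-row int8. Returns (int_values, scale)."""
    amax = max(abs(v) for v in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in row]
    return q, scale


x = [1.0, 2.0, 3.0, 4.0]
residual = [0.5, 0.5, 0.5, 0.5]
gamma = [1.0] * 4
beta = [0.0] * 4

normed, summed = fused_add_layernorm(x, residual, gamma, beta)
q, scale = quantize_row_int8(normed)
print(q, scale)
```

A reference like this is mainly useful as a correctness oracle in tests: dequantizing with `q[i] * scale` should reproduce the normalized row to within half a quantization step.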

sjfeng1999 · 2026-05-20T11:30:38Z

+    RED_SLOTS = max(1, (BLOCK_THREADS + WARP_SIZE - 1) // WARP_SIZE)
+    elem_bits = 32 if dtype_str == "f32" else 16
+
+    allocator = SmemAllocator(None, arch=arch)


Could you update this to use SharedAllocator? The old SmemAllocator interface may be deprecated in the future.

I have replaced SmemAllocator with SharedAllocator in all the variant kernels.
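
For reference, the `RED_SLOTS` expression in the quoted diff is an integer ceiling division: one slot per warp, rounded up for a partial warp, with a floor of one (presumably one reduction scratch slot per warp, though the diff excerpt does not say). A small sketch in plain Python follows; the function wrappers, the example thread counts, and the warp size of 64 are illustrative assumptions, not values taken from the PR.

```python
def red_slots(block_threads: int, warp_size: int) -> int:
    """Number of warps needed to cover the block, rounded up, never below one."""
    return max(1, (block_threads + warp_size - 1) // warp_size)


def elem_bits(dtype_str: str) -> int:
    """Element width as in the diff: 32 bits for f32, 16 for anything else."""
    return 32 if dtype_str == "f32" else 16


print(red_slots(256, 64))   # 4: four full warps
print(red_slots(100, 64))   # 2: one full warp plus a partial one
print(red_slots(0, 64))     # 1: the max(1, ...) floor
print(elem_bits("bf16"))    # 16
```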

cschenjunlin added 3 commits May 15, 2026 16:03

[Enh] Add fused-add, quant kernels and tests

76d97f7

Align layernorm quant paths with current register API

394339c

Use the current fx.* numeric and register helper style in layernorm quant variants so they stay consistent with main's RMSNorm cleanup.

Add layernorm variant kernels and tests

fec4f9d

sjfeng1999 reviewed May 20, 2026

cschenjunlin added 3 commits May 21, 2026 18:56

Migrate layernorm variants to SharedAllocator

efc477c

Align layernorm kernels with struct-level SharedAllocator access

cef8eef

Merge branch 'main' into cjl/fused_quant_layernorm

37a6b29

cschenjunlin wants to merge 6 commits into main from cjl/fused_quant_layernorm
