[Fix] Align buffer-resource num_records with logical tensor sizes by coderfeli · Pull Request #555 · ROCm/FlyDSL

coderfeli · 2026-05-23T15:19:36Z

Summary

Several MoE / GEMM kernels call buffer_ops.create_buffer_resource(arg, max_size=False) without supplying num_records_bytes. With max_size=False, the descriptor size is inferred from the memref type at trace time, so the descriptor is silently too small when the kernel is reused for tensors with larger logical extents. Accesses past the truncated size are then treated as out-of-bounds: reads return 0 and writes are dropped.

This mirrors aiter PR ROCm/aiter#3314 and extends the same fix to two files aiter does not cover (moe_blockscale_2stage.py and the w_rsrc/bias_rsrc sites in mixed_moe_gemm_2stage.py).
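The failure mode described in this summary can be illustrated with a toy model in plain Python. This is only a sketch and does not call FlyDSL: `buffer_load`, `buffer_store`, and the sizes are hypothetical names chosen for illustration, and `num_records` is counted in elements here rather than bytes to keep the example short.

```python
# Toy model of a bounds-checked buffer descriptor (not the FlyDSL API).
# Assumption: accesses at or beyond num_records are out-of-bounds;
# loads return 0 and stores are silently dropped, as the summary describes.

def buffer_load(data, num_records, index):
    """Return data[index] if in range of the descriptor, else 0."""
    if index < num_records:
        return data[index]
    return 0


def buffer_store(data, num_records, index, value):
    """Write value if in range of the descriptor; report whether it landed."""
    if index < num_records:
        data[index] = value
        return True
    return False


# Descriptor size inferred at trace time from a small tensor (hypothetical).
traced_extent = 8
# The same kernel is later reused for a tensor with a larger logical extent.
logical = list(range(1, 17))

# Stale descriptor: the upper half of the tensor reads back as zeros.
stale = [buffer_load(logical, traced_extent, i) for i in range(len(logical))]
# Explicit size matching the logical extent: every element is visible.
correct = [buffer_load(logical, len(logical), i) for i in range(len(logical))]

# A store past the stale extent is dropped without any error.
out = [0] * len(logical)
landed = buffer_store(out, traced_extent, 12, 99)
```

In this model nothing raises when the descriptor is too small, which is why the problem shows up as wrong results rather than as a crash.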

What changes

For each affected site, compute num_records from compile-time constants (experts, model_dim, inter_dim, num_groups, …) and pass it explicitly to create_buffer_resource. Pre-existing dead expressions that lacked the * elem_bytes multiplier are also corrected.
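As a rough sketch of the size computation described above, the byte count is the product of the logical dimensions times the element width. The helper name `num_records_bytes_for`, the shape `(experts, inter_dim, model_dim)`, and the concrete values are assumptions for illustration; the actual per-site formulas in the kernels may differ.

```python
import math


def num_records_bytes_for(dims, elem_bytes):
    """Byte size of a dense tensor: product of its dimensions times element width."""
    return math.prod(dims) * elem_bytes


# Hypothetical compile-time constants, not taken from the PR.
experts, model_dim, inter_dim = 8, 4096, 1024
elem_bytes = 2  # e.g. a 16-bit element type

# Element count alone; using this as a byte size is the kind of bug the PR
# describes as an expression lacking the `* elem_bytes` multiplier.
w_elems = math.prod((experts, inter_dim, model_dim))

# Byte size that would be passed explicitly as the descriptor's record count.
w_bytes = num_records_bytes_for((experts, inter_dim, model_dim), elem_bytes)
```

With these values the element count covers only half of the buffer when it is mistaken for a byte count, so the second half of the weights would fall outside the descriptor.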

Files (18 buffer-resource sites total)

File	Sites
`kernels/moe_gemm_2stage.py`	stage1: `w_rsrc`, `sw_rsrc`, `sorted_rsrc`, `sorted_w_rsrc`; stage2: `w_rsrc`, `sw_rsrc`
`kernels/moe_blockscale_2stage.py`	stage1: `w_rsrc`, `sw_rsrc`, `sorted_rsrc`, `sorted_w_rsrc`; stage2: `w_rsrc`, `sw_rsrc`
`kernels/mixed_moe_gemm_2stage.py`	stage1: `w_rsrc`, `bias_rsrc`, `sorted_scale_rsrc`; stage2: `w_rsrc`, `bias_rsrc`
`kernels/preshuffle_gemm.py`	`scale_a_rsrc` (with fp4 path)

Test plan

tests/kernels/test_preshuffle_gemm.py — 103 passed
tests/kernels/test_moe_gemm.py — 349 passed
tests/kernels/test_moe_blockscale.py — 4 passed
bash scripts/check_python_style.sh — clean

Commit message:

Several MoE / GEMM kernels call ``buffer_ops.create_buffer_resource(arg, max_size=False)`` without supplying ``num_records_bytes``. With ``max_size=False`` the descriptor size is inferred from the memref *type* at trace time, which silently truncates when the kernel is reused for tensors with larger logical extents, producing out-of-bounds reads (which return 0) or dropped writes.

This mirrors aiter PR ROCm/aiter#3314 and extends the same fix to two files aiter does not cover (``moe_blockscale_2stage.py`` and the ``w_rsrc``/``bias_rsrc`` sites in ``mixed_moe_gemm_2stage.py``).

Compute ``num_records`` from compile-time constants (experts, model_dim, inter_dim, num_groups, …) and pass it explicitly to ``create_buffer_resource``. Also fixes pre-existing dead expressions that lacked the ``* elem_bytes`` multiplier.

Files touched (18 buffer-resource sites total):
- kernels/moe_gemm_2stage.py: stage1: w/sw/sorted/sorted_w; stage2: w/sw
- kernels/moe_blockscale_2stage.py: stage1: w/sw/sorted/sorted_w; stage2: w/sw
- kernels/mixed_moe_gemm_2stage.py: stage1: w/bias/sorted_scale; stage2: w/bias
- kernels/preshuffle_gemm.py: scale_a (with fp4 path)

Test plan:
- tests/kernels/test_preshuffle_gemm.py (103 passed)
- tests/kernels/test_moe_gemm.py (349 passed)
- tests/kernels/test_moe_blockscale.py (4 passed)
- bash scripts/check_python_style.sh (clean)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderfeli closed this May 24, 2026

coderfeli wants to merge 1 commit into main from fix/buffer-resource-num-records