Problem Description
Summary
PR #2958 ("add fused_qk_rmsnorm_per_token_quant kernel", merged May 9, 2026) renamed the public function fused_qk_rmsnorm to the private _fused_qk_rmsnorm and changed its signature by prepending q_out and k_out parameters. This is a breaking change for downstream consumers.
Impact
vLLM's MLA dual RMS norm fusion pass (fuse_mla_dual_rms_norm) imports fused_qk_rmsnorm by name and calls it with the original 6-parameter signature. Against an AITER build that includes PR #2958, the import fails with an ImportError at runtime, which breaks the fusion pass for all MLA models (DeepSeek-V3, Kimi-K2, etc.).
We have worked around this on the vLLM side (vllm-project/vllm#42606) by probing for both names with hasattr dispatch, but this approach requires maintaining two code paths indefinitely.
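For readers unfamiliar with the pattern, the name-probing workaround looks roughly like the sketch below. It is an illustration only, not the code from vllm-project/vllm#42606: resolve_fused_qk_rmsnorm is a hypothetical helper, and the two SimpleNamespace objects are stand-ins for an old and a new AITER build so that the snippet runs without AITER installed. The real kernel arguments are not reproduced here.

```python
from types import SimpleNamespace


def resolve_fused_qk_rmsnorm(module):
    """Return (kernel, needs_out_buffers) for whichever name `module` exposes.

    Hypothetical helper: probes for the old public name first, then the
    new private name, and reports whether the caller must pass q_out/k_out.
    """
    if hasattr(module, "fused_qk_rmsnorm"):
        # Old API: original signature, no output buffers prepended.
        return module.fused_qk_rmsnorm, False
    if hasattr(module, "_fused_qk_rmsnorm"):
        # New API: q_out and k_out are prepended to the original arguments.
        return module._fused_qk_rmsnorm, True
    raise ImportError("no fused QK RMSNorm kernel found")


# Stand-ins for the two library versions (assumptions, not the real kernels).
old_api = SimpleNamespace(fused_qk_rmsnorm=lambda *args: ("old", len(args)))
new_api = SimpleNamespace(
    _fused_qk_rmsnorm=lambda q_out, k_out, *args: ("new", len(args))
)

kernel, needs_out = resolve_fused_qk_rmsnorm(new_api)
call_args = (None, None, 1, 2, 3, 4, 5, 6) if needs_out else (1, 2, 3, 4, 5, 6)
result = kernel(*call_args)
```

Every call site has to branch on needs_out_buffers, which is the second code path the workaround forces downstream projects to keep.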
Request
- Provide a stable public name for the fused QK RMSNorm kernel. The current _fused_qk_rmsnorm (leading underscore) signals a private/internal API, which makes it risky for downstream projects to depend on. Either:
  - Re-export a public fused_qk_rmsnorm wrapper (can delegate to _fused_qk_rmsnorm internally), or
  - Document _fused_qk_rmsnorm as the intended public entry point and commit to its stability.
- Preserve backward-compatible signatures when possible, or use a deprecation period (e.g., keep the old name as an alias for one release cycle) so downstream consumers have time to adapt.
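One possible shape for the first option, combined with a deprecation period, is sketched below. It is only a suggestion under stated assumptions: the _fused_qk_rmsnorm here is a stand-in that echoes its arguments, not the AITER kernel, and the sketch assumes the private kernel can accept None for q_out and k_out (for example by allocating outputs itself), which the maintainers would need to confirm.

```python
import warnings


def _fused_qk_rmsnorm(q_out, k_out, *args):
    # Stand-in for the private kernel (assumption): echoes its inputs so
    # the delegation can be observed without AITER installed.
    return (q_out, k_out, args)


def fused_qk_rmsnorm(*args):
    """Public wrapper that keeps the old name and positional arguments.

    Emits a DeprecationWarning so callers learn about the new calling
    convention, then delegates with no preallocated output buffers.
    """
    warnings.warn(
        "fused_qk_rmsnorm without q_out/k_out is deprecated; "
        "pass output buffers explicitly in a future release",
        DeprecationWarning,
        stacklevel=2,
    )
    return _fused_qk_rmsnorm(None, None, *args)
```

With a wrapper like this, existing callers keep working for a release cycle while the warning gives them time to migrate to the new signature.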
Context
Operating System
Ubuntu 22.04.5 LTS (Jammy Jellyfish)
CPU
AMD EPYC 9575F 64-Core Processor
GPU
AMD Instinct MI355X
ROCm Version
ROCm 7.2.2
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
AITER commit
b8aacd39a (not yet in a tagged release; 0.1.13 still has the old API)