
Added hipblaslt bias fused kernels into autotune backend for addmm #3012

Open
guangzlu wants to merge 1 commit into release/2.9 from release_/2.9_aten_addmm_opt

@guangzlu guangzlu commented Feb 28, 2026

Motivation

We found that for some GEMM-with-bias cases, performance with torch compile and max autotune was worse than without it. The root cause is that, when max autotune is turned on, inductor cannot call the hipblaslt bias fused kernels. Without the fused kernel, it falls back to either a triton fused kernel or a separate Aten solution (a single GEMM kernel + an elementwise kernel), and the separate Aten solution performs worse than the hipblaslt fused kernels.

Technical Details

In the current code, inductor uses inp_expanded as the bias argument in the kernel inputs of the addmm lowering. inp_expanded is the bias argument after it has been expanded from 1D to 2D during argument processing. The hipblaslt bias fused kernel does not support a 2D bias input, so inductor currently cannot call it.
This PR uses the original 1D bias argument as the kernel input for aten addmm, which enables the hipblaslt bias fused kernel in inductor.
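
To illustrate why passing the 1D bias is safe, here is a minimal pure-Python sketch of the two views of addmm. It is not inductor or hipblaslt code; the helper names (`matmul`, `addmm_bias_1d`, `expand_bias`, `addmm_bias_2d`) are hypothetical and exist only for this example. It shows that adding a 1D bias to every output row gives the same result as adding the row-expanded 2D bias in a separate elementwise step.

```python
# Semantics-only sketch: plain Python lists stand in for tensors.

def matmul(a, b):
    """Row-major matrix product of a (M x K) and b (K x N)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]

def addmm_bias_1d(bias, a, b):
    """Fused view: the 1D bias (length N) is added to every output row."""
    return [[v + bias[j] for j, v in enumerate(row)] for row in matmul(a, b)]

def expand_bias(bias, m):
    """Hypothetical stand-in for inp_expanded: the 1D bias repeated over M rows."""
    return [list(bias) for _ in range(m)]

def addmm_bias_2d(bias_2d, a, b):
    """Unfused view: a GEMM followed by a separate elementwise add of a 2D bias."""
    return [[v + bias_2d[i][j] for j, v in enumerate(row)]
            for i, row in enumerate(matmul(a, b))]

a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # M=3, K=2
b = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]     # K=2, N=3
bias = [10.0, 20.0, 30.0]                  # length N

fused = addmm_bias_1d(bias, a, b)
unfused = addmm_bias_2d(expand_bias(bias, len(a)), a, b)
assert fused == unfused  # both views produce the same output
```

The expanded bias carries no extra information beyond the 1D vector, so handing the original 1D argument to the aten addmm choice changes only which kernels are eligible, not the numerical result.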

Test Plan

Here is a simple unit test that measures the performance of Linear with bias, which lowers to addmm in inductor.
run-test.sh
unittest_linear_mi308x_fp16.py

Test Result

In this case, MNK is [4352, 1024, 1024].
Without the PR, inductor runs two kernels: the GEMM takes 81.659us and the elementwise kernel takes 13.619us; the total execution time is 85us.
With the PR, inductor chooses a hipblaslt bias fused kernel, and the execution time is 63.504us.
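
As a rough worked example, the timings above can be turned into a speedup figure. The variable names below are chosen for this sketch only. The two per-kernel times add up to 95.278us, while the quoted total is 85us, so the sketch computes the ratio against both figures rather than assuming which one is intended.

```python
# Timings copied from the test result above (microseconds).
gemm_us = 81.659
elementwise_us = 13.619
reported_total_us = 85.0
fused_us = 63.504

# Sum of the two separate kernels, for comparison with the quoted total.
sum_of_kernels_us = gemm_us + elementwise_us

# Speedup of the fused kernel relative to each baseline figure.
speedup_vs_reported = reported_total_us / fused_us
speedup_vs_sum = sum_of_kernels_us / fused_us

print(f"sum of kernels: {sum_of_kernels_us:.3f}us")
print(f"speedup vs reported total: {speedup_vs_reported:.2f}x")
print(f"speedup vs sum of kernels: {speedup_vs_sum:.2f}x")
```

Against the quoted 85us total the fused kernel is roughly 1.34x faster; against the summed kernel times it is roughly 1.50x faster.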

@guangzlu guangzlu requested a review from jataylo February 28, 2026 06:49
rocm-repo-management-api bot commented Feb 28, 2026

Jenkins build for a844cd4069ac6f83256820788b5dd7ae68a62633 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results
