Added hipblaslt bias fused kernels into autotune backend for addmm#3012
Open
guangzlu wants to merge 1 commit into release/2.9
Conversation
Jenkins build for commit a844cd4069ac6f83256820788b5dd7ae68a62633 finished as FAILURE
Motivation
We found that in some GEMM-with-bias cases, torch compile with max autotune gave poorer performance than running without it. The root cause is that when max autotune is turned on, inductor cannot call the hipblaslt bias fused kernels. When it cannot call a fused kernel, it instead calls either a Triton fused kernel or a separate Aten solution (a single GEMM kernel plus an elementwise kernel), and the separate Aten solution performs worse than the hipblaslt fused kernels.
Technical Details
In the current code, inductor uses inp_expanded as the bias argument for the addmm lowering kernel inputs. inp_expanded is the bias argument expanded from 1D to 2D during argument processing. The hipblaslt bias fused kernel cannot accept a 2D bias input, so inductor currently cannot call the hipblaslt bias fused kernels.
This PR passes the original 1D bias argument as the kernel input for aten addmm, which enables the hipblaslt bias fused kernel in inductor.
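Why the 1D form is safe to pass through: addmm computes `out = bias + A @ B`, and a 1D bias of length N broadcasts identically across all M rows, so the original 1D bias and its 2D expanded copy produce the same result. The sketch below illustrates this equivalence in plain Python (no PyTorch); the helper names `matmul` and `addmm_ref` are illustrative, not inductor internals.

```python
# Illustrative sketch: a 1D bias broadcast over rows gives the same addmm
# result as the 2D expanded bias inductor previously passed to the kernel.

def matmul(a, b):
    """Naive M x K @ K x N matrix multiply."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

def addmm_ref(bias, a, b):
    """bias + a @ b, accepting a 1D (length-N) or 2D (M x N) bias."""
    prod = matmul(a, b)
    if bias and not isinstance(bias[0], list):
        # 1D bias: broadcast the same row of values over every output row.
        return [[prod[i][j] + bias[j] for j in range(len(prod[0]))]
                for i in range(len(prod))]
    return [[prod[i][j] + bias[i][j] for j in range(len(prod[0]))]
            for i in range(len(prod))]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
bias_1d = [0.5, -0.5]
bias_2d = [bias_1d[:] for _ in range(2)]  # the expanded copy inductor built

assert addmm_ref(bias_1d, a, b) == addmm_ref(bias_2d, a, b)
```

Since both forms are mathematically identical, keeping the bias in its original 1D shape loses nothing while satisfying the hipblaslt fused kernel's input requirement.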
Test Plan
Here is a simple unit test that measures the performance of Linear with bias, which inductor lowers into addmm.
run-test.sh
unittest_linear_mi308x_fp16.py
Test Result
In this case, (m, n, k) is [4352, 1024, 1024].
Without the PR, inductor runs two kernels: the GEMM takes 81.659us and the elementwise kernel takes 13.619us, for a total execution time of 85us.
With the PR, inductor chooses a hipblaslt bias fused kernel, and the execution time is 63.504us.
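For a quick sanity check on the reported numbers, the speedup from the fused kernel can be computed directly (values taken from the measurements above):

```python
# Speedup of the hipblaslt fused kernel over the separate Aten path,
# using the totals reported in the test result above.
separate_total_us = 85.0   # GEMM + elementwise path, total execution time
fused_us = 63.504          # hipblaslt bias fused kernel

speedup = separate_total_us / fused_us
print(f"fused kernel speedup: {speedup:.2f}x")  # prints "fused kernel speedup: 1.34x"
```

That is roughly a 1.34x speedup for this shape from avoiding the separate elementwise bias kernel.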
Submission Checklist