Fix TDM GSU offset for all MX datatype and transpose cases #7401

Merged
ssuyuanchang merged 2 commits into develop from users/joschang/tdm_gsu_offset on May 28, 2026

@ssuyuanchang
Contributor

@ssuyuanchang ssuyuanchang commented May 13, 2026

  • Replaced hardcoded strideRef(tc, 3) with dynamic unroll index
  • Supported GSU for MX datatype
  • Added GSU=[2,4,8] coverage to MX test yamls

Motivation

TDM should support GSU MB.

Technical Details

Add GSU offset in TDM mode
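
As a rough illustration of what a GSU offset means, the sketch below splits the K (unroll) dimension into GSU slices and computes the element offset at which each slice starts. It is not the code in TensorDataMover.py: `gsu_slice_offset`, the `dims`/`strides` lists, and the way the unroll index is looked up are all hypothetical, and the sketch assumes K divides evenly by GSU.

```python
def gsu_slice_offset(gsu_index, k, gsu, strides, unroll_index):
    """Element offset of the K-slice owned by `gsu_index`.

    Hypothetical helper: `strides` holds one stride (in elements) per
    tensor dimension, and `unroll_index` selects the K dimension.
    """
    if k % gsu != 0:
        raise ValueError("this sketch assumes K is divisible by GSU")
    if not 0 <= gsu_index < gsu:
        raise ValueError("gsu_index out of range")
    k_per_slice = k // gsu
    return gsu_index * k_per_slice * strides[unroll_index]


# The same 64 x 256 (M x K) matrix stored in two layouts (illustrative only).
# K is the fastest-moving dimension in the first and the slowest in the second.
k_fastest_dims, k_fastest_strides = ["K", "M"], [1, 256]
m_fastest_dims, m_fastest_strides = ["M", "K"], [1, 64]

# Deriving the unroll index from the layout instead of hardcoding it.
k_fastest_unroll = k_fastest_dims.index("K")  # index 0
m_fastest_unroll = m_fastest_dims.index("K")  # index 1

offsets_k_fastest = [
    gsu_slice_offset(i, 256, 4, k_fastest_strides, k_fastest_unroll)
    for i in range(4)
]
offsets_m_fastest = [
    gsu_slice_offset(i, 256, 4, m_fastest_strides, m_fastest_unroll)
    for i in range(4)
]
```

The position of the K stride differs between the two layouts, so a fixed stride index can only be right for one of them. That is the general reason for deriving the index from the layout, under the assumptions stated above.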

Test Plan

hipblaslt-test and the test YAMLs passed.

Test Result

hipblaslt-test and the test YAMLs passed.

Submission Checklist

@codecov-commenter

codecov-commenter commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7401   +/-   ##
========================================
  Coverage    61.87%   61.87%           
========================================
  Files         2086     2086           
  Lines       357038   357035    -3     
  Branches     53806    53806           
========================================
  Hits        220892   220892           
+ Misses      117348   117345    -3     
  Partials     18798    18798           
Flag          Coverage Δ                 *Carryforward flag
TensileLite   25.94% <ø> (+<0.01%) ⬆️    Carried forward from f96e909
hipBLAS       90.65% <ø> (ø)             Carried forward from f96e909
hipBLASLt     41.27% <ø> (ø)
hipCUB        82.21% <ø> (ø)             Carried forward from f96e909
hipDNN        85.87% <ø> (ø)             Carried forward from f96e909
hipFFT        50.00% <ø> (ø)             Carried forward from f96e909
hipRAND       76.12% <ø> (ø)             Carried forward from f96e909
hipSOLVER     69.24% <ø> (ø)             Carried forward from f96e909
hipSPARSE     85.09% <ø> (ø)             Carried forward from f96e909
rocBLAS       48.10% <ø> (ø)             Carried forward from f96e909
rocFFT        52.07% <ø> (ø)             Carried forward from f96e909
rocRAND       57.04% <ø> (ø)             Carried forward from f96e909
rocSOLVER     77.83% <ø> (ø)             Carried forward from f96e909
rocSPARSE     72.68% <ø> (ø)             Carried forward from f96e909

*This pull request uses carry forward flags.

Files with missing lines Coverage Δ
.../tensilelite/Tensile/Components/TensorDataMover.py 11.14% <ø> (+0.08%) ⬆️

Contributor

@AndySu12 AndySu12 left a comment

LGTM, provided this PR fully supports the TLU/non-TLU and wave-separated/non-wave-separated cases. Letting @copilot review twice.

Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates TDM address calculation to correctly account for GlobalSplitU (GSU) offsets across MX datatypes and transpose-related cases, and expands MX test coverage for GSU>1.

Changes:

  • Replace hardcoded TDM stride selection (strideRef(tc, 3)) with a dynamically derived “unroll” index.
  • Add GSU-based address offsetting in TDM (including MX-specific scaling).
  • Expand MX YAML test matrices to include GlobalSplitU: [1,2,4,8].
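
The MX-specific scaling in the list above can be pictured with a small sketch. MX formats store one shared scale per block of elements, so a K-offset into the data corresponds to a smaller offset into the scale tensor, and sub-byte element types shrink the byte offset. This is not the PR's implementation: `mx_gsu_offsets` is a hypothetical helper, and it assumes a block size of 32 elements, a K-contiguous layout, and a K that divides evenly by GSU.

```python
MX_BLOCK_SIZE = 32  # assumption: one shared scale per 32 elements along K


def mx_gsu_offsets(gsu_index, k, gsu, element_bits):
    """Return (data_byte_offset, scale_offset) for one GSU slice.

    Hypothetical helper for a K-contiguous tensor. `element_bits` is the
    width of one data element, e.g. 8 for an FP8 type or 4 for an FP4 type.
    """
    if k % gsu != 0:
        raise ValueError("this sketch assumes K is divisible by GSU")
    k_per_slice = k // gsu
    if k_per_slice % MX_BLOCK_SIZE != 0:
        raise ValueError("each slice must cover whole scale blocks")
    if not 0 <= gsu_index < gsu:
        raise ValueError("gsu_index out of range")

    elem_offset = gsu_index * k_per_slice
    # Sub-byte types pack several elements per byte. The block-size check
    # above guarantees elem_offset is a multiple of 32, so this is exact.
    data_byte_offset = elem_offset * element_bits // 8
    # One scale entry covers MX_BLOCK_SIZE data elements.
    scale_offset = elem_offset // MX_BLOCK_SIZE
    return data_byte_offset, scale_offset


# K = 1024 split across GSU = 4: each slice covers 256 elements.
fp8_offsets = [mx_gsu_offsets(i, 1024, 4, 8) for i in range(4)]
fp4_offsets = [mx_gsu_offsets(i, 1024, 4, 4) for i in range(4)]
```

In this sketch the scale offset depends only on the element count, while the data byte offset also depends on the element width, so the two offsets have to be computed separately for each slice.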

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
projects/hipblaslt/tensilelite/Tensile/Tests/common/gemm/gfx12/mxf8f4ss_gfx1250.yaml Expands MX test coverage for additional GSU values.
projects/hipblaslt/tensilelite/Tensile/Tests/common/gemm/gfx12/mxf8_gfx1250.yaml Expands multiple benchmark problem blocks to cover GSU=[2,4,8].
projects/hipblaslt/tensilelite/Tensile/Components/TensorDataMover.py Computes dynamic unroll stride and applies GSU offset in TDM start-address logic (MX + non-MX).
projects/hipblaslt/tensilelite/Tensile/Components/GSU.py Minor string-formatting modernization for MX size SGPR reference.

Comment thread projects/hipblaslt/tensilelite/Tensile/Components/TensorDataMover.py Outdated
@ssuyuanchang ssuyuanchang force-pushed the users/joschang/tdm_gsu_offset branch 2 times, most recently from 32ca18e to ab11deb Compare May 25, 2026 08:07
@ssuyuanchang ssuyuanchang force-pushed the users/joschang/tdm_gsu_offset branch 2 times, most recently from 9b65b35 to 1b67e1a Compare May 27, 2026 07:40
- Replaced hardcoded strideRef(tc, 3) with dynamic unroll index
- Supported GSU for MX datatype
- Added GSU=[2,4,8] coverage to MX test yamls
@ssuyuanchang
Contributor Author

ssuyuanchang commented May 27, 2026

  • generated xml file: /josh/public/rocm-libraries/projects/hipblaslt/tensilelite/python_tests.xml -
    =========== 82 passed, 211 skipped, 4 warnings in 3990.34s (1:06:30) ===========
    py3: exit 0 (3997.95 seconds) /josh/public/rocm-libraries/projects/hipblaslt/tensilelite> sh -c 'ARCH=$(invoke get-gpu-arch); export TENSILE_ARCHITECTURE=$ARCH; export TENSILELITE_CLIENT_ARGS='"'"' --gpu-targets '"'"'$ARCH; invoke build-client --build-dir /josh/public/rocm-libraries/projects/hipblaslt/tensilelite/build_tmp $TENSILELITE_CLIENT_ARGS || exit $?; pytest -v --basetemp=/josh/public/rocm-libraries/projects/hipblaslt/tensilelite/.tensile-tox/py3/tmp --junit-xml=/josh/public/rocm-libraries/projects/hipblaslt/tensilelite/python_tests.xml --junit-prefix=py3 --color=yes -n $TENSILE_NUM_PYTEST_WORKERS --prebuilt-client=/josh/public/rocm-libraries/projects/hipblaslt/tensilelite/build_tmp/tensilelite/client/tensilelite-client --global-parameters CheckASMCodeSize=True Tensile/Tests/common -k gfx1250; rc=$?; test $rc -eq 5 && exit 0; exit $rc' pid=186364
    py3: OK (4089.68=setup[24.21]+cmd[0.70,49.67,15.95,1.19,3997.95] seconds)
    congratulations :) (4089.71 seconds)

@ssuyuanchang ssuyuanchang merged commit a47f86f into develop May 28, 2026
47 of 49 checks passed
@ssuyuanchang ssuyuanchang deleted the users/joschang/tdm_gsu_offset branch May 28, 2026 00:48