[AMD][GFX950] Add MI355 support and fix some rocm related issues by zhangnju · Pull Request #2025 · tile-ai/tilelang

zhangnju · 2026-04-09T09:54:49Z

Hi,

This PR adds MI355 (gfx950) support to the ROCm Dockerfile and fixes several issues found when running the examples on MI355:

- MFMA ldmatrix does not support pipelined (3D) shared buffer indexing
  - Issue: some tests fail with the error `Buffer A_shared is 3-dimensional, cannot be indexed with the 2-dimensional indices provided.`
  - Root cause: in `MatrixCoreIntrinEmitter.ldmatrix_a/b`, when a shared buffer becomes 3D after the pipeline transformation (e.g., `[num_stages, block_M, block_K]`), the old code extracts base indices only from the last two dimensions (`region[-2].min`, `region[-1].min`) and ignores the leading stage dimension.
- HIP codegen uses function call syntax instead of template syntax for `rasterization2DRow`
  - Issue: hipcc reports `no matching function for call to 'rasterization2DRow'`.
  - Root cause: CUDA codegen generates `tl::rasterization2DRow<10>()` (template instantiation syntax), while HIP codegen incorrectly generates `tl::rasterization2DRow(10)` (function call syntax). Since this is a `template <int panel_width>` function, it cannot accept a runtime argument.
- HIP codegen takes the address of a temporary object in the float32x8 broadcast
  - Issue: hipcc reports `taking the address of a temporary object of type 'float2'`.
  - Root cause: CUDA allows taking the address of the temporary object returned by `make_float2(...)`, but HIP is stricter and disallows it.
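The first root cause above can be illustrated with a minimal sketch in plain Python. It is not the code from `mfma_macro_generator.py`: `Range` and `shared_index` are hypothetical stand-ins introduced here, and only the idea of prepending the leading (stage) minima to the 2-D index is taken from the description.

```python
from collections import namedtuple

# Hypothetical stand-in for a buffer-region range; only `min` is used here.
Range = namedtuple("Range", ["min", "extent"])


def shared_index(region, row, col):
    """Build a full buffer index from a region plus a 2-D (row, col) offset.

    The last two ranges give the 2-D base; any ranges before them
    (e.g. a pipeline stage axis) contribute their `min` as leading indices.
    """
    *leading, r0, r1 = region
    other = tuple(r.min for r in leading)  # empty for plain 2-D buffers
    return other + (r0.min + row, r1.min + col)


# 2-D buffer [block_M, block_K]: behaviour is unchanged.
region_2d = [Range(0, 128), Range(32, 32)]
# 3-D buffer [num_stages, block_M, block_K], reading stage 1.
region_3d = [Range(1, 1), Range(0, 128), Range(32, 32)]

idx_2d = shared_index(region_2d, 3, 5)
idx_3d = shared_index(region_3d, 3, 5)
print(idx_2d)  # (3, 37)
print(idx_3d)  # (1, 3, 37)
```

Using only the last two ranges would yield `(3, 37)` for the 3-D region as well, which is the 2-dimensional index that the error message complains about.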

Summary by CodeRabbit

Release Notes

New Features
- Added support for an additional ROCm GPU architecture (gfx950).
Improvements
- Optimized HIP code generation and matrix core operation code generation for improved performance.

github-actions · 2026-04-09T09:55:00Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-04-09T09:55:04Z

Walkthrough

The PR updates ROCm GPU architecture configurations, refines HIP code generation for swizzle patterns and float2 reinterpretation using templated instantiation and lambda-based union constructs, and extends matrix core intrinsics to handle leading/pipeline-stage buffer dimensions through refined indexing logic.

Changes

| Cohort / File(s) | Summary |
|---|---|
| ROCm GPU Architecture Configuration `docker/Dockerfile.rocm` | Added `gfx950` GPU architecture to `PYTORCH_ROCM_ARCH` environment variable for expanded ROCm target support. |
| HIP Code Generation `src/target/codegen_hip.cc` | Modified threadblock swizzle pattern to use templated instantiation syntax (`tl::<func_name><panel_size>()` instead of function call); refactored float2-to-unsigned-long-long reinterpretation using an immediately-invoked lambda with internal union to avoid temporary address-taking. |
| Matrix Core Intrinsics `tilelang/intrinsics/mfma_macro_generator.py` | Extended `ldmatrix_a` and `ldmatrix_b` methods to extract leading dimensions and incorporate them into shared-buffer indexing via tuple concatenation, enabling support for multi-dimensional buffers with pipeline stages. |
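The swizzle change in the second row is a difference in the emitted source string. The sketch below shows that difference in Python for illustration only; the real emitter lives in the C++ file `src/target/codegen_hip.cc`, and `emit_swizzle_call` and its `use_template` flag are hypothetical names, not part of TileLang.

```python
def emit_swizzle_call(func_name: str, panel_size: int, use_template: bool = True) -> str:
    """Return the device-side call text for a threadblock swizzle helper.

    use_template=True  -> panel size passed as a template argument.
    use_template=False -> panel size passed as a runtime argument, which
                          cannot match a `template <int panel_width>` function.
    """
    if use_template:
        return f"tl::{func_name}<{panel_size}>()"
    return f"tl::{func_name}({panel_size})"


fixed = emit_swizzle_call("rasterization2DRow", 10)
broken = emit_swizzle_call("rasterization2DRow", 10, use_template=False)
print(fixed)   # tl::rasterization2DRow<10>()
print(broken)  # tl::rasterization2DRow(10)
```

The `fixed` string matches what the PR description says the CUDA codegen already produces; the `broken` string is the form hipcc rejected.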

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

[BugFix] Update buffer access in TensorCoreIntrinEmitter to handle variable dimensions correctly #1794: Modifies matrix-intrinsic buffer indexing with leading-dimension indices (A_other/B_other) in ldmatrix loads, directly parallel to the mfma_macro_generator.py changes in this PR.
[AMD] Enable FA2 fwd on AMD MI300X #1406: Modifies docker/Dockerfile.rocm's ROCm environment and PYTORCH_ROCM_ARCH settings.
[AMD] refactor MatrixCoreIntrinEmitter #860: Refactors and extends ldmatrix_a/ldmatrix_b indexing logic in tilelang/intrinsics/mfma_macro_generator.py with preshuffle-aware implementations.

Suggested reviewers

LeiWang1999
Gongen-Ali

Poem

🐰 A GPU's new gfx950 arrives with cheer,
Templates twirl where functions once were near,
Lambdas dance with unions, avoiding that address grab,
Leading dimensions align in buffers' tab—
Matrices march forth through pipeline stages fast! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main changes: adding MI355 (GFX950) support to the ROCm setup and addressing multiple rocm-related issues found during testing. |


coderabbitai

🧹 Nitpick comments (2)

tilelang/intrinsics/mfma_macro_generator.py (2)
331-331: Nitpick: EN DASH in comment.

The comment uses an EN DASH (–) instead of a regular hyphen (-). This is cosmetic but could cause issues with tools expecting ASCII.
-        # Leading dimensions (e.g. pipeline stage axis) – empty for 2-D buffers
+        # Leading dimensions (e.g. pipeline stage axis) - empty for 2-D buffers
(Same on line 375)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tilelang/intrinsics/mfma_macro_generator.py` at line 331, Replace the
non-ASCII EN DASH with a regular hyphen in the inline comment string "Leading
dimensions (e.g. pipeline stage axis) – empty for 2-D buffers" and the similar
comment around line 375 in tilelang/intrinsics/mfma_macro_generator.py; search
for that exact comment text in the file (likely within the generate_mfma_macro
or related function) and change "–" to "-" to use an ASCII hyphen.
331-354: LGTM – correctly handles leading dimensions for pipelined buffers.

The fix properly extracts leading dimension indices (e.g., pipeline stage) and prepends them to the 2D indexing. For standard 2D buffers, A_other is empty, preserving the original behavior.

Optional style improvement per RUF005: consider using iterable unpacking for slightly cleaner syntax:
♻️ Optional: use iterable unpacking
-                        A_local_buf[i * k_pack * local_size_a + local_id] = A_buf[tuple(A_other) + (A_base0 + l + row, A_base1 + r + col)]
+                        A_local_buf[i * k_pack * local_size_a + local_id] = A_buf[(*A_other, A_base0 + l + row, A_base1 + r + col)]
(Same pattern for line 354)
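As a quick check that the RUF005 suggestion is purely stylistic, the following sketch compares the two index-building forms on plain Python values. The helper names `index_concat` and `index_unpack` are hypothetical and exist only for this comparison; integers stand in for the symbolic expressions used in the real code.

```python
def index_concat(other, row, col):
    """Index built by tuple concatenation, as in the PR."""
    return tuple(other) + (row, col)


def index_unpack(other, row, col):
    """Index built by iterable unpacking, as suggested by RUF005."""
    return (*other, row, col)


# Empty leading dims (2-D buffer), one stage axis, and two leading axes.
cases = ([], [2], [1, 0])
results = [(index_concat(o, 4, 7), index_unpack(o, 4, 7)) for o in cases]
for concat, unpack in results:
    print(concat, unpack)
```

Both forms produce the same tuple in every case, including the empty one, so the choice between them does not affect the 2-D behaviour.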
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tilelang/intrinsics/mfma_macro_generator.py` around lines 331 - 354, Replace
the manual slice for leading dims with iterable unpacking to make intent
clearer: unpack A_region.region into *leading_regions, _, _ (e.g.
*leading_regions, _, _ = A_region.region) and then set A_other = [r.min for r in
leading_regions]; this keeps the same semantics used by _warp_ldmatrix_a while
using clearer syntax referencing A_region.region and A_other.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc4a5876-db26-4fb3-b4f0-82b06f757d93

📥 Commits

Reviewing files that changed from the base of the PR and between 86e37b7 and 4503bfa.

📒 Files selected for processing (3)

docker/Dockerfile.rocm
src/target/codegen_hip.cc
tilelang/intrinsics/mfma_macro_generator.py

zhangnju added 3 commits April 9, 2026 02:48

add MI355 support and fix some rocm releated issues

a70cce6

add MI355 support and fix some rocm releated issues

3f8b75f

update

4503bfa

coderabbitai bot reviewed Apr 9, 2026


zhangnju added 2 commits April 10, 2026 15:10

Merge branch 'tile-ai:main' into main

7c192ae

Merge branch 'tile-ai:main' into main

03b0ebc

zhangnju closed this by deleting the head repository Apr 14, 2026

zhangnju wants to merge 5 commits into tile-ai:main from zhangnju:main
