[AMD][GFX950] Add MI355 support and fix some rocm related issues #2025
[AMD][GFX950] Add MI355 support and fix some rocm related issues #2025zhangnju wants to merge 5 commits intotile-ai:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughThe PR updates ROCm GPU architecture configurations, refines HIP code generation for swizzle patterns and float2 reinterpretation using templated instantiation and lambda-based union constructs, and extends matrix core intrinsics to handle leading/pipeline-stage buffer dimensions through refined indexing logic. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tilelang/intrinsics/mfma_macro_generator.py (2)
331-331: Nitpick: EN DASH in comment.The comment uses an EN DASH (
–) instead of a regular hyphen (-). This is cosmetic but could cause issues with tools expecting ASCII.- # Leading dimensions (e.g. pipeline stage axis) – empty for 2-D buffers + # Leading dimensions (e.g. pipeline stage axis) - empty for 2-D buffers(Same on line 375)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tilelang/intrinsics/mfma_macro_generator.py` at line 331, Replace the non-ASCII EN DASH with a regular hyphen in the inline comment string "Leading dimensions (e.g. pipeline stage axis) – empty for 2-D buffers" and the similar comment around line 375 in tilelang/intrinsics/mfma_macro_generator.py; search for that exact comment text in the file (likely within the generate_mfma_macro or related function) and change "–" to "-" to use an ASCII hyphen.
331-354: LGTM – correctly handles leading dimensions for pipelined buffers.The fix properly extracts leading dimension indices (e.g., pipeline stage) and prepends them to the 2D indexing. For standard 2D buffers,
A_otheris empty, preserving the original behavior.Optional style improvement per RUF005: consider using iterable unpacking for slightly cleaner syntax:
♻️ Optional: use iterable unpacking
- A_local_buf[i * k_pack * local_size_a + local_id] = A_buf[tuple(A_other) + (A_base0 + l + row, A_base1 + r + col)] + A_local_buf[i * k_pack * local_size_a + local_id] = A_buf[(*A_other, A_base0 + l + row, A_base1 + r + col)](Same pattern for line 354)
,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tilelang/intrinsics/mfma_macro_generator.py` around lines 331 - 354, Replace the manual slice for leading dims with iterable unpacking to make intent clearer: unpack A_region.region into leading_regions, _, _ (e.g. leading_regions, _, _ = A_region.region) and then set A_other = [r.min for r in leading_regions]; this keeps the same semantics used by _warp_ldmatrix_a while using clearer syntax referencing A_region.region and A_other.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tilelang/intrinsics/mfma_macro_generator.py`:
- Line 331: Replace the non-ASCII EN DASH with a regular hyphen in the inline
comment string "Leading dimensions (e.g. pipeline stage axis) – empty for 2-D
buffers" and the similar comment around line 375 in
tilelang/intrinsics/mfma_macro_generator.py; search for that exact comment text
in the file (likely within the generate_mfma_macro or related function) and
change "–" to "-" to use an ASCII hyphen.
- Around line 331-354: Replace the manual slice for leading dims with iterable
unpacking to make intent clearer: unpack A_region.region into leading_regions,
_, _ (e.g. leading_regions, _, _ = A_region.region) and then set A_other =
[r.min for r in leading_regions]; this keeps the same semantics used by
_warp_ldmatrix_a while using clearer syntax referencing A_region.region and
A_other.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc4a5876-db26-4fb3-b4f0-82b06f757d93
📒 Files selected for processing (3)
docker/Dockerfile.rocmsrc/target/codegen_hip.cctilelang/intrinsics/mfma_macro_generator.py
HI
This PR added MI355 support in the rocm docker file, and fix some issues when running the examples on MI355:
MFMA ldmatrix does not support pipelined (3D) shared buffer indexing:
issue: in some tests, we may meet the error: Buffer A_shared is 3-dimensional, cannot be indexed with the 2-dimensional indices provided.
root cause : In
MatrixCoreIntrinEmitter.ldmatrix_a/b, when a shared buffer becomes 3D after pipeline transformation (e.g.,[num_stages, block_M, block_K]), the old codes only extracts base indices from the last two dimensions (region[-2].min,region[-1].min), ignoring the leading stage dimension.HIP codegen uses function call syntax instead of template syntax for
rasterization2DRowissue: hipcc reports
no matching function for call to 'rasterization2DRow'root cause: CUDA codegen generates
tl::rasterization2DRow<10>()(template instantiation syntax), while HIP codegen incorrectly generatestl::rasterization2DRow(10)(function call syntax). Since this is atemplate <int panel_width>function, it cannot accept a runtime argument.HIP codegen takes address of temporary object in
float32x8broadcastissue: hipcc reports
taking the address of a temporary object of type 'float2'root cause: CUDA allows taking the address of a temporary object returned by
make_float2(...), but HIP is stricter and disallows this.Summary by CodeRabbit
Release Notes
New Features
Improvements