[AMD][gfx950] Add gfx950 support for DeepGeem example#2028
[AMD][gfx950] Add gfx950 support for DeepGeem example#2028LeiWang1999 merged 6 commits intotile-ai:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds support for new HIP accumulator vector types (float32x16, float32x32, bfloat16x16) in HIP codegen, defines a float32x32 alias, adjusts MFMA macro buffer indexing to handle extra leading dimensions, and updates ROCm Dockerfile to include gfx950 in PYTORCH_ROCM_ARCH. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/target/codegen_hip.cc`:
- Around line 1226-1227: The generated template instantiation incorrectly passes
panel_size to tl::<func_name> when the helper templates expect template
parameter panel_width; update the code in src/target/codegen_hip.cc where you
emit "tl::" << func_name << "<" << panel_size << ">()" to instead pass the
panel_width template argument (e.g., use an existing panel_width variable or
compute panel_width = panel_size / gridDim.x before emission) so the emitted
instantiation matches the tl_templates/hip/threadblock_swizzle.h signature
(template<int panel_width>).
In `@tilelang/intrinsics/mfma_macro_generator.py`:
- Around line 331-332: The code currently collapses leading BufferRegion axes by
taking only r.min (e.g. A_other = [r.min for r in A_region.region[:-2]]), which
silently slices away higher-rank data; change these sites to either assert that
every leading axis has extent == 1 or propagate full indexing information:
replace the direct r.min extraction with a validation step (for each r in
A_region.region[:-2] and B_region.region[:-2] assert r.extent == 1 and
raise/abort if not) or build and pass through per-axis index expressions (keep
r.min and r.extent or construct index vectors) so higher-rank regions are
correctly addressed; apply the same fix to all similar variables and spots
(A_other, B_other and the blocks around the noted regions where r.min is used
for prefix axes).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f10421f2-536d-4c0d-a39f-5c79c5d8cdd3
📒 Files selected for processing (4)
docker/Dockerfile.rocmsrc/target/codegen_hip.ccsrc/tl_templates/hip/common.htilelang/intrinsics/mfma_macro_generator.py
| this->stream << "const dim3 blockIdx = tl::" << func_name << "<" | ||
| << panel_size << ">();\n"; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
hdr="$(fd 'threadblock_swizzle\.h$' | head -n1)"
echo "Inspecting: ${hdr}"
sed -n '1,240p' "${hdr}"
echo
echo "--- template / panel_size matches ---"
rg -n -C2 'template\s*<[^>]*panel_size[^>]*>|panel_size' "${hdr}"Repository: tile-ai/tilelang
Length of output: 8112
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Searching for all threadblock_swizzle.h files ==="
fd 'threadblock_swizzle\.h$'
echo
echo "=== Checking HIP-specific version if exists ==="
if [ -f "src/tl_templates/hip/threadblock_swizzle.h" ]; then
echo "Found HIP version:"
sed -n '1,150p' "src/tl_templates/hip/threadblock_swizzle.h"
echo
echo "--- template / panel matches in HIP ---"
rg -n 'template\s*<[^>]*>|panel' "src/tl_templates/hip/threadblock_swizzle.h" | head -20
else
echo "No HIP-specific version found"
fi
echo
echo "=== Checking what func_name could be in codegen_hip.cc ==="
rg -B5 -A5 'threadblock_swizzle|func_name.*swizzle|rasterization2D' src/target/codegen_hip.cc | head -50Repository: tile-ai/tilelang
Length of output: 5040
Template parameter mismatch: panel_size passed but panel_width expected.
Line 1226–1227 generates tl::<func_name><panel_size>(), but the HIP threadblock swizzle helpers in src/tl_templates/hip/threadblock_swizzle.h declare templates with template <int panel_width>. Inside these templates, panel_size is computed as a local variable (e.g., panel_size = panel_width * gridDim.x), not a template parameter. The generated code passes the computed panel_size directly as a template argument, which does not match the declared panel_width parameter. This causes a compile-time error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/target/codegen_hip.cc` around lines 1226 - 1227, The generated template
instantiation incorrectly passes panel_size to tl::<func_name> when the helper
templates expect template parameter panel_width; update the code in
src/target/codegen_hip.cc where you emit "tl::" << func_name << "<" <<
panel_size << ">()" to instead pass the panel_width template argument (e.g., use
an existing panel_width variable or compute panel_width = panel_size / gridDim.x
before emission) so the emitted instantiation matches the
tl_templates/hip/threadblock_swizzle.h signature (template<int panel_width>).
| # Leading dimensions (e.g. pipeline stage axis) – empty for 2-D buffers | ||
| A_other = [r.min for r in A_region.region[:-2]] |
There was a problem hiding this comment.
Make the singleton-prefix assumption explicit.
Line 332 and Line 376 only keep r.min for the leading BufferRegion axes. If any of those extents is > 1, these loads silently collapse the region to its first slice. Please either assert that all prefix extents are singleton here, or plumb the extra indices through so higher-rank regions are actually addressed.
Also applies to: 348-354, 375-376, 395-405
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 331-331: Comment contains ambiguous – (EN DASH). Did you mean - (HYPHEN-MINUS)?
(RUF003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tilelang/intrinsics/mfma_macro_generator.py` around lines 331 - 332, The code
currently collapses leading BufferRegion axes by taking only r.min (e.g. A_other
= [r.min for r in A_region.region[:-2]]), which silently slices away higher-rank
data; change these sites to either assert that every leading axis has extent ==
1 or propagate full indexing information: replace the direct r.min extraction
with a validation step (for each r in A_region.region[:-2] and
B_region.region[:-2] assert r.extent == 1 and raise/abort if not) or build and
pass through per-axis index expressions (keep r.min and r.extent or construct
index vectors) so higher-rank regions are correctly addressed; apply the same
fix to all similar variables and spots (A_other, B_other and the blocks around
the noted regions where r.min is used for prefix axes).
|
approved and some rocm ci related issues was not introduced by this pr and will be fixed in #2033 |
* add MI355 support and fix some rocm releated issues * add MI355 support and fix some rocm releated issues * update * add gfx950 support for deepgemm example * lint fix --------- Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
HI @LeiWang1999
This PR is used to add gfx950 support for Deepgeem example, extends HIP code generation to support wider MFMA accumulator register types required by gfx950 ,and includes the below changes:
Thanks
Summary by CodeRabbit
New Features
Improvements