[TLERaw] Remove redundant copy && Support buffered_tensor with tle_raw.call #479
Merged
sgjzfzzf merged 9 commits into flagos-ai:triton_v3.6.x on Apr 13, 2026
Pull request overview
This PR updates the TLE Raw pipeline to (1) support passing ttg::MemDescType/buffered shared-memory tensors through tle_raw calls and (2) eliminate redundant shared-memory copies by rewriting specific scf.for loop argument patterns.
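As a rough illustration of point (2), the toy Python model below shows the kind of loop-carried rewrite involved. It is a sketch under stated assumptions, not the pass's actual pattern: the IR classes (`Op`, `ForLoop`) and the `raw_call` label are hypothetical, the op names only borrow TritonGPU's `local_alloc`/`local_load` vocabulary, and the model assumes the raw call updates the shared-memory buffer in place. In the original loop, every iteration copies the register-resident iter_arg into shared memory and loads it back before yielding. The rewrite carries the buffer itself, so the two copies happen once outside the loop instead of once per iteration.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """A toy SSA op: `result = name(operands...)` (hypothetical IR)."""
    name: str
    operands: list
    result: str


@dataclass
class ForLoop:
    """A toy scf.for: `iter_arg` is loop-carried, `yielded` feeds the next iteration."""
    init: str                                  # value passed into the loop
    iter_arg: str                              # block argument inside the body
    body: list = field(default_factory=list)
    yielded: str = ""                          # value yielded at the end of the body


def remove_redundant_copy(loop):
    """Rewrite `alloc(iter_arg) ... yield load(buf)` so the loop carries `buf`.

    Assumes the ops between the copies update `buf` in place.
    Returns (pre_ops, new_loop, post_ops), or None if the pattern does not match.
    """
    ops = loop.body
    if len(ops) < 2:
        return None
    first, last = ops[0], ops[-1]
    # The body must start by copying the register-resident iter_arg into smem...
    if first.name != "local_alloc" or first.operands != [loop.iter_arg]:
        return None
    # ...and end by loading that same buffer back into registers for the yield.
    if last.name != "local_load" or last.operands != [first.result]:
        return None
    if loop.yielded != last.result:
        return None
    middle = ops[1:-1]
    # Bail out if anything else still reads the register value of iter_arg.
    if any(loop.iter_arg in op.operands for op in middle):
        return None
    buf = first.result
    pre = [Op("local_alloc", [loop.init], "buf_init")]   # one copy before the loop
    new_loop = ForLoop(init="buf_init", iter_arg=buf, body=middle, yielded=buf)
    post = [Op("local_load", ["loop_result"], "result")]  # one copy after the loop
    return pre, new_loop, post


loop = ForLoop(
    init="acc0",
    iter_arg="acc",
    body=[
        Op("local_alloc", ["acc"], "acc_smem"),
        Op("raw_call", ["acc_smem", "a_smem", "b_smem"], "tok"),
        Op("local_load", ["acc_smem"], "acc_next"),
    ],
    yielded="acc_next",
)
pre, new_loop, post = remove_redundant_copy(loop)
print([op.name for op in new_loop.body])   # ['raw_call']
print(new_loop.iter_arg, new_loop.yielded)  # acc_smem acc_smem
```

The match is deliberately strict (the iter_arg may feed only the initial copy), because reusing a single buffer across iterations is only safe when no other op still needs the register value.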
Changes:
- Extend the TLERaw protocol flatten/pack logic to handle `ttg::MemDescType` and add a `MemDescPattern` to signatures.
- Add and wire a new `tle-remove-redundant-copy` MLIR pass (plus small SCF utility helpers) into the NVIDIA backend compilation pipeline.
- Add Python `tle_raw.call_smem()` and extend `tle_gpu.alloc()` with an `init_value` to support initialized shared-memory buffers; update tutorials accordingly.
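To make the MemDesc signature handling more concrete, here is a small Python model of a flatten/pack round trip. All names (`TensorArg`, `MemDescArg`, `flatten`, `pack`) are hypothetical and are not the `Protocol.cpp` API. The model assumes, as a simplification, that a register tensor flattens to one slot per element owned by the thread, while a memdesc is passed by address as a single base-pointer slot; the real `MemDescPattern` may carry additional fields.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TensorArg:
    """Register-resident tensor: each thread holds `elems_per_thread` values."""
    name: str
    elems_per_thread: int


@dataclass
class MemDescArg:
    """Shared-memory buffer descriptor, modeled here as a single base pointer."""
    name: str


Arg = Union[TensorArg, MemDescArg]


def flatten(sig: List[Arg]) -> List[str]:
    """Flatten a call signature into the scalar slots a raw function receives."""
    slots = []
    for arg in sig:
        if isinstance(arg, TensorArg):
            # One slot per element the current thread owns.
            slots += [f"{arg.name}[{i}]" for i in range(arg.elems_per_thread)]
        elif isinstance(arg, MemDescArg):
            # Pass the buffer by address, so no copy into registers is needed.
            slots.append(f"{arg.name}.base_ptr")
        else:
            raise TypeError(f"unsupported argument: {arg!r}")
    return slots


def pack(sig: List[Arg], slots: List[str]) -> dict:
    """Inverse of flatten: regroup scalar slots by argument name."""
    out, pos = {}, 0
    for arg in sig:
        n = arg.elems_per_thread if isinstance(arg, TensorArg) else 1
        out[arg.name] = slots[pos:pos + n]
        pos += n
    if pos != len(slots):
        raise ValueError("slot count does not match signature")
    return out


sig = [TensorArg("x", 4), MemDescArg("acc_smem")]
slots = flatten(sig)
print(slots)                         # ['x[0]', 'x[1]', 'x[2]', 'x[3]', 'acc_smem.base_ptr']
print(pack(sig, slots)["acc_smem"])  # ['acc_smem.base_ptr']
```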
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| third_party/tle/utils/lib/Protocol.cpp | Add MemDesc signature handling and allow packing memdesc returns in LLVMStructurePattern. |
| third_party/tle/utils/include/Protocol.h | Expose MemDescPattern and include TritonGPU dialect types for memdesc support. |
| third_party/tle/triton_tle.cc | Register the new raw pass wrapper add_tle_remove_redundant_copy. |
| third_party/tle/dialect/lib/Transforms/TleUtility.cpp | Add SCF helper utilities used by multiple transforms. |
| third_party/tle/dialect/lib/Transforms/RemoveRedundantCopy.cpp | New transform pass to rewrite loop-carried args to avoid redundant local copies. |
| third_party/tle/dialect/lib/Transforms/ConvertArgToMemDesc.cpp | Adjust arg-to-memdesc conversion behavior in single-loop/iter-arg cases and barrier insertion conditions. |
| third_party/tle/dialect/lib/Transforms/CMakeLists.txt | Build integration for the new transform sources. |
| third_party/tle/dialect/include/Transforms/TleUtility.h | Header for SCF helper utilities. |
| third_party/tle/dialect/include/Transforms/RemoveRedundantCopy.h | Header for new redundant-copy removal patterns. |
| third_party/tle/dialect/include/Transforms/Passes.td | Add tle-remove-redundant-copy pass definition. |
| third_party/nvidia/backend/compiler.py | Insert the new raw pass into the CUDA TTGIR pipeline. |
| python/triton/experimental/tle/language/raw/core.py | Add call_smem() returning buffered_tensor outputs. |
| python/triton/experimental/tle/language/raw/__init__.py | Export call_smem. |
| python/triton/experimental/tle/language/gpu/core.py | Add init_value support to tle_gpu.alloc() for initialized smem buffers. |
| python/tutorials/tle/raw/mlir/05-topk.py | Switch to initialized smem buffers + call_smem usage for outputs/inputs. |
| python/tutorials/tle/raw/mlir/03-matrix-multiplication-smem.py | New MLIR tutorial example using call_smem and buffered shared memory. |
| python/tutorials/tle/raw/cuda/03-matrix-multiplication-smem.py | New CUDA tutorial example using call_smem and buffered shared memory. |
sgjzfzzf previously approved these changes on Apr 9, 2026
sgjzfzzf approved these changes on Apr 9, 2026
[TLERaw]
- `tle_raw.call()`.
- `tle_raw.call_smem()`.

[Bug]
- With `tle_raw.call_smem()`, the `tle.passes.add_assign_local_pointers_encoding(pm)` statement must be commented out; otherwise, occasional compilation errors may occur. [Fixed]