[FlyDSL] Add FlyDSL MoE sorting kernel #3266
Draft
amd-weisun wants to merge 4 commits into
Contributor
coderfeli
reviewed
May 20, 2026
# =================== MOE_BUF ZEROING (blocks > 0 only) ===============
is_zero_block = bid != c_zero_i32
_if_zero = scf.IfOp(is_zero_block.ir_value())
with _if_then(_if_zero):
Collaborator
Can we use a plain Python if statement directly here?
coderfeli
reviewed
May 20, 2026
mesh_addr = token_id * c_smem_cols + eid
last_mesh_idx = fx.Int32(sub_tokens * smem_cols - 1)
safe_mesh_addr = is_valid.select(mesh_addr, last_mesh_idx)
safe_mesh_ix = ArithValue(safe_mesh_addr).index_cast(T.index)
Collaborator
Use Fx.int64() directly.
coderfeli
reviewed
May 20, 2026
p0v2_allocator.ptr = p0v2_reduce_offset + P0V2_NUM_WAVES * 4

@flyc.kernel(known_block_size=[P0V2_BLOCK, 1, 1])
def p0v2_kernel(
Collaborator
Too much duplicated code.
Motivation
Add FlyDSL MoE sorting kernel.
The kernel implementation is from FlyDSL PR ROCm/FlyDSL#540
This PR adds the integration code.
The FlyDSL MoE sorting kernel can be enabled with AITER_USE_FLYDSL_MOE_SORTING=1, following the style of the CK kernel flag AITER_USE_CK_MOE_SORTING=1. The default kernel is now OPUS.
E2E Benchmark Result
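To compare sorting backends in a benchmark run, the flag has to be present in the environment of the process under test. The sketch below shows one way to do that from Python. Only the two flag names come from this PR; the helper names `env_for_sorting_backend` and `run_with_backend` are hypothetical and not part of AITER. The PR does not state when AITER reads the flags or which one wins if both are set, so the sketch assumes they are read at process start and sets at most one of them.

```python
import os
import subprocess
import sys

# Flag names are taken from the PR description; the mapping is illustrative.
SORTING_FLAGS = {
    "flydsl": "AITER_USE_FLYDSL_MOE_SORTING",
    "ck": "AITER_USE_CK_MOE_SORTING",
}


def env_for_sorting_backend(backend, base_env=None):
    """Hypothetical helper: copy the environment and select one sorting backend.

    "opus" is the default named in the PR, so it sets neither flag.
    """
    if backend != "opus" and backend not in SORTING_FLAGS:
        raise ValueError(f"unknown sorting backend: {backend!r}")
    env = dict(os.environ if base_env is None else base_env)
    # Clear both flags first so that at most one is ever set (assumption:
    # precedence between the two flags is not documented in the PR).
    for flag in SORTING_FLAGS.values():
        env.pop(flag, None)
    if backend in SORTING_FLAGS:
        env[SORTING_FLAGS[backend]] = "1"
    return env


def run_with_backend(backend, argv):
    """Hypothetical helper: run a command with the chosen backend flag set."""
    return subprocess.run(
        argv,
        env=env_for_sorting_backend(backend),
        check=True,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for a real benchmark command: echo the flag seen by the child.
    probe = "import os; print(os.environ.get('AITER_USE_FLYDSL_MOE_SORTING'))"
    result = run_with_backend("flydsl", [sys.executable, "-c", probe])
    print(result.stdout.strip())
```

Passing the environment explicitly to the child process, rather than mutating `os.environ` in the parent, keeps runs with different backends from leaking flags into each other.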
DeepSeek-R1-0528 FP8 TP8
Accuracy
DeepSeek-R1-0528 MXFP4 TP8
Accuracy