fix: fall back illegal A5 auto-sync pairs to barrier#482
fix: fall back illegal A5 auto-sync pairs to barrier#482HecreReed wants to merge 2 commits intohw-native-sys:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a fallback synchronization mechanism for the A5 architecture, replacing specific flag-based synchronization with a global pipe barrier (PIPE_ALL) when low-level pipes are illegal. This change affects both single-buffer and multi-buffer synchronization generation. Feedback indicates that the fallback barrier creation in CreateSetWaitOpForMultiBuffer lacks deduplication logic, which could lead to redundant barrier operations.
| if (shouldUseA5BarrierFallback(func_, sync)) { | ||
| auto pipeAllAttr = getPipeAttr(rewriter, PipelineType::PIPE_ALL); | ||
| if (beforeInsert || op->hasTrait<OpTrait::IsTerminator>()) { | ||
| rewriter.setInsertionPoint(op); | ||
| } else { | ||
| rewriter.setInsertionPointAfter(op); | ||
| } | ||
| rewriter.create<pto::BarrierOp>(op->getLoc(), pipeAllAttr); | ||
| return; | ||
| } |
There was a problem hiding this comment.
The fallback barrier creation in CreateSetWaitOpForMultiBuffer also lacks deduplication logic. While the insertion point logic cannot be easily moved to the top due to the GetBufferSelected call later in the function, the barrier creation itself should still check for existing identical barriers to avoid redundancy.
Codex Review该评论由 review 机器人自动更新。
Summary发现 1 个 P1:PR 改变了 A5 auto-sync 的输出形态(部分 set/wait 改为 PIPE_ALL barrier),但未同步现有 basic FileCheck 断言,极可能导致 CI 失败。 Findings
|
|
/run a5 qwen3_decode_layer_incore_0,qwen3_decode_layer_incore_1,qwen3_decode_layer_incore_2,qwen3_decode_layer_incore_3,qwen3_decode_layer_incore_4,qwen3_decode_layer_incore_5,qwen3_decode_layer_incore_6,qwen3_decode_layer_incore_7,qwen3_decode_layer_incore_8,qwen3_decode_layer_incore_9,qwen3_decode_layer_incore_10,qwen3_decode_layer_incore_11,qwen3_decode_layer_incore_12,qwen3_decode_layer_incore_13,qwen3_decode_layer_incore_14,qwen3_decode_layer_incore_15,qwen3_decode_layer_incore_16,qwen3_decode_layer_incore_17,qwen3_decode_layer_incore_18,qwen3_decode_layer_incore_19 --pto-level=level3 |
A5 板测失败
失败用例
|
A5 板测失败详情:PR #482qwen3_decode_layer_incore_8
qwen3_decode_layer_incore_6
qwen3_decode_layer_incore_5
qwen3_decode_layer_incore_4
qwen3_decode_layer_incore_3
qwen3_decode_layer_incore_17
|
Summary
pto.barrier <PIPE_ALL>when either pipe is not legal for A5 low-levelset_flag/wait_flagRoot Cause
PR426 exposed an existing A5 sync codegen bug rather than introducing it.
For auto-inserted sync, PTOAS could emit low-level pairs like:
PIPE_MTE2 -> PIPE_MTE1PIPE_MTE1 -> PIPE_MPIPE_M -> PIPE_FIXAscend950/A5 rejects those pipe ids in low-level
set_flag/wait_flag, so generated C++ failed in A5 SIM compile.Scope
This patch is intentionally narrow:
SyncCodegenis changedPIPE_ALLbarrierValidation
ptoas --pto-arch a5 --enable-insert-sync test/basic/tmov_acc_mat_pipe_selection.pto | FileCheck ...ptoas --pto-arch a5 --enable-insert-sync test/basic/tinsert_a5_pipe_selection.pto | FileCheck ...ptoas --pto-arch a5 --enable-insert-sync test/basic/tmov_acc_to_vec_mode_a5_emitc.pto | FileCheck --check-prefix=A5 ...ptoas --pto-arch a3 test/basic/tmov_acc_to_vec_mode_a5_emitc.pto | FileCheck --check-prefix=A3 ...ptoas --pto-arch a5 --enable-insert-sync test/basic/textract_a5_scaling_pipe_selection.pto | FileCheck ...qwen3_decode_layer_incore_{3,4,5,8}.ptowith--pto-level=level3 --enable-insert-sync; illegal A5 pairs no longer appear in generated C++