[codex] add a3 test3 validshape tpush/tpop sample by zhangstevenunity · Pull Request #472 · hw-native-sys/PTOAS

zhangstevenunity · 2026-04-13T07:52:18Z

What changed

add a new sample at test/samples/TPushTPop/a3/test3/kernel.pto
keep the cube/vector handoff structure under TPushTPop
update the sample so the accumulator tile uses dynamic valid shape metadata
set %acc_tile to valid shape (8, 16) after pto.tmatmul
pop the result on the vector side as a 4x16 tile and print it

Why

This adds the exact A3 sample variant requested for the tmatmul -> set_validshape -> vec pop flow, but under a3/test3 instead of a3/test2.

Impact

provides a focused regression/sample case for changing acc valid shape before tpush/tpop
demonstrates consuming an 8x16 valid accumulator tile from the vector side in 4x16 chunks

Validation

local file creation and local commit completed successfully
I did not run ptoas validation in this environment because a runnable ptoas binary was not available in the workspace
direct shell push to GitHub was blocked by local network access to github.com:443, so the branch/file/PR were created through the GitHub connector instead

gemini-code-assist

Code Review

This pull request introduces a new PTO kernel file to test TPush and TPop operations between cube and vector cores. However, the implementation contains several critical issues. There is a logical mismatch between the data produced in @cube_func and consumed in @vec_func, resulting in data being discarded prematurely due to incorrect tile sizes and improper use of tfree_from_aic. Additionally, multiple operations—including aic_initialize_pipe, reserve_buffer, tload, tmov, and various pipe operations—violate the PTO dialect definitions specified in PTOOps.td regarding required operands, return types, and attribute formats.

gemini-code-assist · 2026-04-13T07:53:49Z

test/samples/TPushTPop/a3/test3/kernel.pto

+    scf.for %i = %c0 to %c4 step %c1 {
+      %vec_tile = pto.tpop_from_aic ins(%vec_l0c : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>) { split = 0 : index } : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+      %vec_print = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+      pto.tmov ins(%vec_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
+      pto.tprint ins(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
+      pto.tfree_from_aic ins(%vec_l0c : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>)
+    }


There is a logical mismatch between the data produced on the cube side and consumed on the vector side. The cube side pushes 4 tiles of 8x16 (Line 24). The vector side pops 4 times, but each pop is only 4x16 (Line 36). Furthermore, tfree_from_aic is called in every iteration (Line 40), which releases the entire slot. This means half of the data in each pushed slot is discarded. To process all data in 4x16 chunks as described in the PR, the vector loop should run 8 times, and tfree should only be called after every two pops.
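The accounting behind this comment can be checked with a small model. The sketch below is plain Python, not PTO, and it rests on assumptions taken from the comment rather than from the dialect definition: each pushed slot holds 8 valid rows, consecutive pops without a free read successive 4-row chunks of the same slot, and a free releases the whole slot together with any unread rows. The name `simulate` and the constants are hypothetical.

```python
# Hypothetical model of the slot accounting described in the review comment.
# Assumption: pops without an intervening free read successive chunks of the
# same slot, and a free discards whatever part of the slot is still unread.

PUSHED_TILES = 4  # cube-side pushes, one slot each
VALID_ROWS = 8    # valid rows per pushed tile (valid shape 8x16)
POP_ROWS = 4      # rows consumed per vector-side pop (4x16 tile)


def simulate(num_pops, pops_per_free):
    """Return (rows_read, rows_dropped) for a vector-side loop."""
    rows_read = 0
    rows_dropped = 0
    slot = 0     # index of the slot currently being consumed
    offset = 0   # rows already read from the current slot
    for pop in range(1, num_pops + 1):
        if slot == PUSHED_TILES:
            break  # nothing left to pop
        rows_read += POP_ROWS
        offset += POP_ROWS
        if pop % pops_per_free == 0:
            # Freeing the slot drops any rows that were never popped.
            rows_dropped += VALID_ROWS - offset
            slot += 1
            offset = 0
    return rows_read, rows_dropped


if __name__ == "__main__":
    print("as submitted:", simulate(num_pops=4, pops_per_free=1))
    print("as suggested:", simulate(num_pops=8, pops_per_free=2))
```

Under these assumptions the submitted loop (4 pops, free on every iteration) reads 16 of the 32 valid rows and drops the other 16, while the suggested loop (8 pops, free after every second pop) reads all 32. Whether the hardware pipe really behaves this way depends on the tpop/tfree semantics in PTOOps.td, which this sketch does not consult.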

gemini-code-assist · 2026-04-13T07:53:49Z

test/samples/TPushTPop/a3/test3/kernel.pto

+    pto.aic_initialize_pipe 4, 1
+    %left_l1 = pto.reserve_buffer "cube" [1] { slot_size = 256 } : !pto.async_buffer<core="cube", direction="in", slots=1, slot_size=256>
+    %right_l1 = pto.reserve_buffer "cube" [1] { slot_size = 256 } : !pto.async_buffer<core="cube", direction="in", slots=1, slot_size=256>
+    %acc_l0c = pto.reserve_buffer "cube" [4] { slot_size = 1024 } : !pto.async_buffer<core="cube", direction="out", slots=4, slot_size=1024>


The operations pto.aic_initialize_pipe and pto.reserve_buffer do not follow the definitions in PTOOps.td. aic_initialize_pipe is missing the attribute dictionary braces and the required operands (gm_slot_buffer, c2v_consumer_buf, v2c_consumer_buf). reserve_buffer uses an incorrect syntax and returns a !pto.async_buffer type, whereas the ODS specifies it returns an i32 address.

gemini-code-assist · 2026-04-13T07:53:49Z

test/samples/TPushTPop/a3/test3/kernel.pto

+      %left_tile = pto.tload ins(%left[%c0] : memref<64xf32, #pto.address_space<gm>>) : !pto.tile_buf<loc=mat, dtype=f32, rows=16, cols=16, v_row=16, v_col=4, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+      %offset = arith.muli %i, %c64 : index
+      %right_tile = pto.tload ins(%right[%offset] : memref<256xf32, #pto.address_space<gm>>) : !pto.tile_buf<loc=mat, dtype=f32, rows=16, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+      %left_tile_l0a = pto.tmov ins(%left_tile : !pto.tile_buf<loc=mat, dtype=f32, rows=16, cols=16, v_row=16, v_col=4, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(!pto.tile_buf<loc=left, dtype=f32, rows=16, cols=16, v_row=16, v_col=4, blayout=row_major, slayout=row_major, fractal=512, pad=0>)
+      %right_tile_l0b = pto.tmov ins(%right_tile : !pto.tile_buf<loc=mat, dtype=f32, rows=16, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(!pto.tile_buf<loc=right, dtype=f32, rows=16, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=col_major, fractal=512, pad=0>)


Several operations use an incorrect syntax relative to the PTO dialect definition:

pto.tload (Lines 16, 18) is used in a functional style, but it is defined as a Destination-Passing Style (DPS) operation in PTOOps.td requiring an outs operand.

pto.tmov (Lines 19, 20) provides a type in the outs clause instead of a destination operand.

gemini-code-assist · 2026-04-13T07:53:49Z

test/samples/TPushTPop/a3/test3/kernel.pto

+      pto.tpush_to_aiv ins(%acc_tile : !pto.tile_buf<loc=acc, dtype=f32, rows=16, cols=16, v_row=?, v_col=?, blayout=col_major, slayout=row_major, fractal=1024, pad=0>) outs(%acc_l0c : !pto.async_buffer<core="cube", direction="out", slots=4, slot_size=1024>) { split = 0 : index }
+    }
+    return
+  }
+
+  func.func @vec_func() {
+    %c0 = arith.constant 0 : index
+    %c1 = arith.constant 1 : index
+    %c4 = arith.constant 4 : index
+    pto.aiv_initialize_pipe 1, 4
+    %vec_l0c = pto.reserve_buffer "vector" [4] { slot_size = 1024 } : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>
+    scf.for %i = %c0 to %c4 step %c1 {
+      %vec_tile = pto.tpop_from_aic ins(%vec_l0c : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>) { split = 0 : index } : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+      %vec_print = pto.alloc_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+      pto.tmov ins(%vec_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
+      pto.tprint ins(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
+      pto.tfree_from_aic ins(%vec_l0c : !pto.async_buffer<core="vector", direction="in", slots=4, slot_size=1024>)


The frontend pipe operations do not match the ODS:

pto.tpush_to_aiv (Line 24) and pto.tfree_from_aic (Line 40) should not have an outs or ins operand respectively.

pto.tpop_from_aic (Line 36) should not have an ins operand.

The split attribute in these operations should be an i8 integer attribute (e.g., 0 : i8), not an index.

zhangstevenunity · 2026-04-13T07:56:43Z

/run a3 test/samples/TPushTPop/a3/test3/kernel.pto

reedhecre · 2026-04-13T08:19:59Z

A3 board test failed

Trigger: manual
Source commit: f0607284e26e
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260413_161904_manual_pr472.log
Manual command: /run a3 test/samples/TPushTPop/a3/test3/kernel
Triggered by: zhangstevenunity
Specified test case: test/samples/TPushTPop/a3/test3/kernel
Trigger comment: [codex] add a3 test3 validshape tpush/tpop sample #472 (comment)
Failed stage: build-ptoas / exit=1

Log tail

ransforms/PTOToEmitC.cpp: At global scope:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:6743:20: warning: ‘std::string maskPatternTok(mlir::pto::MaskPatternAttr)’ defined but not used [-Wunused-function]
 6743 | static std::string maskPatternTok(mlir::pto::MaskPatternAttr a) {
      |                    ^~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:4190:20: warning: ‘std::string getPipeName(mlir::pto::PIPE)’ defined but not used [-Wunused-function]
 4190 | static std::string getPipeName(pto::PIPE pipe) {
      |                    ^~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2601:13: warning: ‘Role inferSubviewRole(mlir::memref::SubViewOp)’ defined but not used [-Wunused-function]
 2601 | static Role inferSubviewRole(memref::SubViewOp sv) {
      |             ^~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2461:13: warning: ‘void inferTileMNK(mlir::func::FuncOp, int&, int&, int&)’ defined but not used [-Wunused-function]
 2461 | static void inferTileMNK(func::FuncOp f, int &M, int &N, int &K) {
      |             ^~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_161904_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2448:19: warning: ‘KernelKind inferKernelKind(mlir::func::FuncOp)’ defined but not used [-Wunused-function]
 2448 | static KernelKind inferKernelKind(func::FuncOp f) {
      |                   ^~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
===== END STAGE build-ptoas rc=1 @ 2026-04-13 16:19:57 =====

zhangstevenunity · 2026-04-13T08:45:29Z

/run a3 test/samples/TPushTPop/a3/test3/kernel

reedhecre · 2026-04-13T08:47:01Z

A3 board test failed

Trigger: manual
Source commit: f0607284e26e
Result summary: OK 0 / FAIL 0 / SKIP 0
Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260413_164604_manual_pr472.log
Manual command: /run a3 test/samples/TPushTPop/a3/test3/kernel
Triggered by: zhangstevenunity
Specified test case: test/samples/TPushTPop/a3/test3/kernel
Trigger comment: [codex] add a3 test3 validshape tpush/tpop sample #472 (comment)
Failed stage: build-ptoas / exit=1

Log tail

ransforms/PTOToEmitC.cpp: At global scope:
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:6743:20: warning: ‘std::string maskPatternTok(mlir::pto::MaskPatternAttr)’ defined but not used [-Wunused-function]
 6743 | static std::string maskPatternTok(mlir::pto::MaskPatternAttr a) {
      |                    ^~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:4190:20: warning: ‘std::string getPipeName(mlir::pto::PIPE)’ defined but not used [-Wunused-function]
 4190 | static std::string getPipeName(pto::PIPE pipe) {
      |                    ^~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2601:13: warning: ‘Role inferSubviewRole(mlir::memref::SubViewOp)’ defined but not used [-Wunused-function]
 2601 | static Role inferSubviewRole(memref::SubViewOp sv) {
      |             ^~~~~~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2461:13: warning: ‘void inferTileMNK(mlir::func::FuncOp, int&, int&, int&)’ defined but not used [-Wunused-function]
 2461 | static void inferTileMNK(func::FuncOp f, int &M, int &N, int &K) {
      |             ^~~~~~~~~~~~
/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260413_164604_manual_pr472/repo/lib/PTO/Transforms/PTOToEmitC.cpp:2448:19: warning: ‘KernelKind inferKernelKind(mlir::func::FuncOp)’ defined but not used [-Wunused-function]
 2448 | static KernelKind inferKernelKind(func::FuncOp f) {
      |                   ^~~~~~~~~~~~~~~
ninja: build stopped: subcommand failed.
===== END STAGE build-ptoas rc=1 @ 2026-04-13 16:47:00 =====

zhangstevenunity · 2026-04-13T09:06:12Z

Superseded by #475.

The new PR carries the PTO syntax rewrite that compiles with the available local ptoas and keeps the requested set_validshape(8,16) -> vector pop 4x16 behavior.

add a3 test3 validshape tpush tpop sample

4058798

zhangstevenunity wants to merge 1 commit into main from codex/add-a3-test3-validshape-pop
