[codex] add compilable a3 test3 validshape tpush/tpop sample by zhangstevenunity · Pull Request #475 · hw-native-sys/PTOAS

zhangstevenunity · 2026-04-13T09:06:05Z

What changed

recreate test/samples/TPushTPop/a3/test3/kernel.pto using the same PTO syntax/style as the existing a3/test1 and a3/test2 samples
keep the requested semantics: pto.tmatmul writes into an ACC tile, pto.set_validshape changes it to (8, 16), and the vector side pops 4x16 tiles
keep the looped cube->vector handoff structure so the sample stays aligned with the existing TPushTPop/a3/test2 pattern

Why

The first PR carried the intended logic, but the file shape/syntax did not match the current A3 sample style closely enough and failed parsing with the available local ptoas.

Validation

compiled the updated sample with an existing local WSL ptoas 0.23
command used:
- /home/rdp/hw-native-sys/PTOAS/build/tools/ptoas/ptoas --pto-arch=a3 --enable-insert-sync -o /tmp/ptoas-a3-test3/kernel.cpp /mnt/c/Users/rdp/Documents/ptoas/test/samples/TPushTPop/a3/test3/kernel.pto
confirmed generated C++ contains:
- SetValidShape(...)
- Tile<TileType::Vec, float, 4, 16, ...>
- TPOP<..., Tile<TileType::Vec, float, 4, 16, ...>>

Notes

rebuilding the current workspace ptoas under build-wsl-pr443 is presently blocked by an unrelated compile error in lib/PTO/IR/PTO.cpp (verifyTileBufSameShapeAndElem missing), so sample validation used an already-built local ptoas binary instead

gemini-code-assist

Code Review

This pull request introduces a new test case for matrix multiplication using TPush and TPop operations between cube and vector kernels. Several issues were identified regarding the data flow and logic: the vector kernel's loop count and tile shapes do not align with the data produced by the cube kernel, and the use of tfree_from_aic may lead to data loss. Additionally, there are opportunities to optimize the vector kernel by removing redundant tile allocations and move operations. Finally, the placement of set_validshape within the cube kernel loop may cause unintended side effects on subsequent matrix multiplications due to in-place metadata updates.

gemini-code-assist · 2026-04-13T09:07:47Z

test/samples/TPushTPop/a3/test3/kernel.pto

+    scf.for %i = %c0 to %c4 step %c1 {
+      %fifo_tile = pto.tpop_from_aic {split = 0}
+        -> !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>
+      pto.tmov ins(%fifo_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
+      pto.tprint ins(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
+      pto.tfree_from_aic {split = 0}
+    }


There is a significant data mismatch and potential logic error in the vector kernel loop:

Loop Count Mismatch: The cube kernel pushes 4 tiles of shape 8x16 (total 32 rows). The vector kernel loop only runs 4 times (%c4) and pops 4x16 tiles (total 16 rows). To consume all pushed data, the vector loop should run 8 times (%c8).

Shape and Split Mismatch: The cube pushes 8x16 tiles with {split = 0}, while the vector pops 4x16 tiles with {split = 0}. Typically, {split = 0} (no split) requires exact shape matching between the producer and consumer. If splitting is intended, the split attribute should likely be non-zero (e.g., split = 1 for height split) to allow popping sub-tiles from a larger pushed slot.

Premature Slot Release: pto.tfree_from_aic is called in every iteration. If one push (8x16) is intended to be consumed by two pops (4x16), calling tfree after the first pop will release the FIFO slot and discard the remaining data.

gemini-code-assist · 2026-04-13T09:07:47Z

test/samples/TPushTPop/a3/test3/kernel.pto

+      pto.tmov ins(%fifo_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
+      pto.tprint ins(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)


The pto.tmov operation is redundant. %fifo_tile is already a tile buffer in the vec location with the required shape and type. You can pass %fifo_tile directly to pto.tprint and remove this move, as well as the unnecessary %vec_print allocation at line 83.

pto.tprint ins(%fifo_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)

gemini-code-assist · 2026-04-13T09:07:47Z

test/samples/TPushTPop/a3/test3/kernel.pto

+      pto.set_validshape %acc_tile, %c8, %c16
+        : !pto.tile_buf<loc=acc, dtype=f32, rows=16, cols=16, v_row=?, v_col=?, blayout=col_major, slayout=row_major, fractal=1024, pad=0>


The placement of pto.set_validshape inside the loop causes an inconsistency between iterations. %acc_tile is initialized with a 16x16 valid shape (line 42). In the first iteration, pto.tmatmul (line 58) operates on a 16x16 region. However, set_validshape updates the tile handle metadata in-place to 8x16, meaning all subsequent tmatmul calls in the loop will see and potentially respect the restricted 8x16 valid shape. If tmatmul is intended to always compute a full 16x16 result, consider moving set_validshape before the loop or resetting the shape after the push.

fix a3 test3 PTO syntax for validshape tpush tpop sample

3defe6c

zhangstevenunity mentioned this pull request Apr 13, 2026

[codex] add a3 test3 validshape tpush/tpop sample #472

Draft

gemini-code-assist bot reviewed Apr 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] add compilable a3 test3 validshape tpush/tpop sample#475

[codex] add compilable a3 test3 validshape tpush/tpop sample#475
zhangstevenunity wants to merge 1 commit intomainfrom
codex/add-compilable-a3-test3-validshape-pop

zhangstevenunity commented Apr 13, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Apr 13, 2026

Uh oh!

gemini-code-assist bot Apr 13, 2026

Uh oh!

gemini-code-assist bot Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		pto.tmov ins(%fifo_tile : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>) outs(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)
		pto.tprint ins(%vec_print : !pto.tile_buf<loc=vec, dtype=f32, rows=4, cols=16, v_row=4, v_col=16, blayout=row_major, slayout=none_box, fractal=512, pad=0>)

		pto.set_validshape %acc_tile, %c8, %c16
		: !pto.tile_buf<loc=acc, dtype=f32, rows=16, cols=16, v_row=?, v_col=?, blayout=col_major, slayout=row_major, fractal=1024, pad=0>

Conversation

zhangstevenunity commented Apr 13, 2026

What changed

Why

Validation

Notes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant