[codex] add test6 validshape tpush/tpop sample by zhangstevenunity · Pull Request #476 · hw-native-sys/PTOAS

zhangstevenunity · 2026-04-13T09:52:15Z

What changed

add test/samples/TPushTPop/test6/kernel.pto
base the new sample on the existing TPushTPop/test2 A5 case structure
make the cube-side accumulator tile dynamic in valid shape
call pto.set_validshape %acc_tile, %c8, %c16 after pto.tmatmul
pop the result on the vector side as a 4x16 tile with split = 1

Why

This adds a dedicated sample for the flow you requested: after updating the ACC tile valid shape, pop the result from the vector kernel.

Validation

compiled with an existing local WSL ptoas
command used:
- /home/rdp/hw-native-sys/PTOAS/build/tools/ptoas/ptoas --pto-arch=a5 --enable-insert-sync -o /tmp/ptoas-test6/kernel.cpp /mnt/c/Users/rdp/Documents/ptoas/test/samples/TPushTPop/test6/kernel.pto
confirmed generated output contains:
- SetValidShape(...)
- Tile<TileType::Vec, float, 4, 16, ...>
- TPOP<..., Tile<TileType::Vec, float, 4, 16, ...>, TileSplitAxis::TILE_UP_DOWN>

Notes

the local worktree also contains unrelated edits under test/samples/TPushTPop/test2/kernel.pto; those were intentionally excluded from this PR

gemini-code-assist

Code Review

This pull request introduces a new test case for matrix multiplication using TPush and TPop operations between cube and vector kernels. A potential logic error was identified in the cube kernel where the accumulator tile is allocated outside the loop; because pto.set_validshape modifies tile metadata in-place, subsequent loop iterations would incorrectly perform partial multiplications. It is recommended to move the tile allocation inside the loop to ensure each iteration starts with the correct dimensions.

gemini-code-assist · 2026-04-13T09:59:17Z

test/samples/TPushTPop/test6/kernel.pto

+    %acc_tile = pto.alloc_tile valid_row = %c16 valid_col = %c16
+      : !pto.tile_buf<loc=acc, dtype=f32, rows=16, cols=16, v_row=?, v_col=?, blayout=col_major, slayout=row_major, fractal=1024, pad=0>


The accumulator tile %acc_tile is allocated outside the loop, but its valid shape is modified inside the loop using pto.set_validshape (line 54). Since set_validshape updates the tile metadata in-place, subsequent iterations of the loop will start with the modified valid shape (8x16 instead of 16x16). This will likely cause the pto.tmatmul in iterations 2-4 to only compute a partial result (8x16) instead of the full 16x16 multiplication performed in the first iteration. To ensure each iteration computes a full 16x16 result before shrinking the valid shape for pushing, the alloc_tile for %acc_tile should be moved inside the loop, similar to the implementation in test/samples/TPushTPop/test4/kernel.pto.

add test6 validshape tpush tpop sample

40545ef

gemini-code-assist bot reviewed Apr 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[codex] add test6 validshape tpush/tpop sample#476

[codex] add test6 validshape tpush/tpop sample#476
zhangstevenunity wants to merge 1 commit intomainfrom
codex/add-test6-validshape-pop

zhangstevenunity commented Apr 13, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		%acc_tile = pto.alloc_tile valid_row = %c16 valid_col = %c16
		: !pto.tile_buf<loc=acc, dtype=f32, rows=16, cols=16, v_row=?, v_col=?, blayout=col_major, slayout=row_major, fractal=1024, pad=0>

Conversation

zhangstevenunity commented Apr 13, 2026

What changed

Why

Validation

Notes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant