[codex] add test6 validshape tpush/tpop sample #476
zhangstevenunity wants to merge 1 commit into main
Code Review
This pull request introduces a new test case for matrix multiplication using TPush and TPop operations between cube and vector kernels. A potential logic error was identified in the cube kernel where the accumulator tile is allocated outside the loop; because pto.set_validshape modifies tile metadata in-place, subsequent loop iterations would incorrectly perform partial multiplications. It is recommended to move the tile allocation inside the loop to ensure each iteration starts with the correct dimensions.
    %acc_tile = pto.alloc_tile valid_row = %c16 valid_col = %c16
      : !pto.tile_buf<loc=acc, dtype=f32, rows=16, cols=16, v_row=?, v_col=?, blayout=col_major, slayout=row_major, fractal=1024, pad=0>
The accumulator tile %acc_tile is allocated outside the loop, but its valid shape is modified inside the loop using pto.set_validshape (line 54). Since set_validshape updates the tile metadata in-place, subsequent iterations of the loop will start with the modified valid shape (8x16 instead of 16x16). This will likely cause the pto.tmatmul in iterations 2-4 to only compute a partial result (8x16) instead of the full 16x16 multiplication performed in the first iteration. To ensure each iteration computes a full 16x16 result before shrinking the valid shape for pushing, the alloc_tile for %acc_tile should be moved inside the loop, similar to the implementation in test/samples/TPushTPop/test4/kernel.pto.
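The concern above can be illustrated with a small Python model. This is only a sketch, not PTO code: the `Tile` class and the helpers `alloc_tile`, `set_validshape`, and `matmul_shape` are hypothetical stand-ins, and it assumes, as the review comment does, that the matmul only covers the tile's current valid region and that the loop runs four times.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    """Toy stand-in for a tile buffer: physical shape plus a mutable valid shape."""
    rows: int
    cols: int
    v_row: int
    v_col: int


def alloc_tile(valid_row, valid_col, rows=16, cols=16):
    # Hypothetical helper: returns a fresh tile with the requested valid shape.
    return Tile(rows, cols, valid_row, valid_col)


def set_validshape(tile, v_row, v_col):
    # Mutates the tile metadata in place, mirroring the behaviour the review describes.
    tile.v_row, tile.v_col = v_row, v_col


def matmul_shape(acc):
    # Assumption: the matmul writes only the accumulator's current valid region.
    return (acc.v_row, acc.v_col)


def run_alloc_outside(iterations=4):
    acc = alloc_tile(16, 16)  # allocated once, before the loop
    computed = []
    for _ in range(iterations):
        computed.append(matmul_shape(acc))
        set_validshape(acc, 8, 16)  # shrink before pushing; persists into the next iteration
    return computed


def run_alloc_inside(iterations=4):
    computed = []
    for _ in range(iterations):
        acc = alloc_tile(16, 16)  # fresh 16x16 valid shape every iteration
        computed.append(matmul_shape(acc))
        set_validshape(acc, 8, 16)
    return computed


print(run_alloc_outside())
print(run_alloc_inside())
```

In this model, allocating outside the loop yields a full 16x16 result only on the first iteration, while allocating inside the loop yields 16x16 on every iteration, which is the behaviour the suggested change aims for.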
What changed
- `test/samples/TPushTPop/test6/kernel.pto`
- `TPushTPop/test2` A5 case structure
- `pto.set_validshape %acc_tile, %c8, %c16` after `pto.tmatmul`
- `4x16` tile with `split = 1`
Why
This adds a dedicated sample for the flow you requested: after updating the ACC tile valid shape, pop the result from the vector kernel.
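The shape arithmetic behind this flow can be sketched in Python. This is a hedged illustration only: `split_up_down` is a hypothetical helper, and it assumes that the up/down split divides the valid rows of the pushed tile evenly between two vector-side consumers, which would be consistent with an 8x16 valid shape on the ACC tile and a 4x16 tile on the vector side.

```python
def split_up_down(valid_rows, valid_cols, parts=2):
    """Return the per-consumer tile shapes for an even row-wise (up/down) split.

    Assumption: the split divides rows evenly; `parts=2` is illustrative.
    """
    if valid_rows % parts != 0:
        raise ValueError("valid_rows must be divisible by the number of parts")
    return [(valid_rows // parts, valid_cols) for _ in range(parts)]


# ACC tile after set_validshape: 8 valid rows, 16 valid columns.
print(split_up_down(8, 16))
```

Under that assumption, each vector-side pop receives a 4x16 tile.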
Validation
- `ptoas`: `/home/rdp/hw-native-sys/PTOAS/build/tools/ptoas/ptoas --pto-arch=a5 --enable-insert-sync -o /tmp/ptoas-test6/kernel.cpp /mnt/c/Users/rdp/Documents/ptoas/test/samples/TPushTPop/test6/kernel.pto`
- `SetValidShape(...)`
- `Tile<TileType::Vec, float, 4, 16, ...>`
- `TPOP<..., Tile<TileType::Vec, float, 4, 16, ...>, TileSplitAxis::TILE_UP_DOWN>`
Notes
test/samples/TPushTPop/test2/kernel.pto; those were intentionally excluded from this PR