Skip to content

[codex] add test6 validshape tpush/tpop sample#476

Draft
zhangstevenunity wants to merge 1 commit intomainfrom
codex/add-test6-validshape-pop
Draft

[codex] add test6 validshape tpush/tpop sample#476
zhangstevenunity wants to merge 1 commit intomainfrom
codex/add-test6-validshape-pop

Conversation

@zhangstevenunity
Copy link
Copy Markdown
Collaborator

What changed

  • add test/samples/TPushTPop/test6/kernel.pto
  • base the new sample on the existing TPushTPop/test2 A5 case structure
  • make the cube-side accumulator tile dynamic in valid shape
  • call pto.set_validshape %acc_tile, %c8, %c16 after pto.tmatmul
  • pop the result on the vector side as a 4x16 tile with split = 1

Why

This adds a dedicated sample for the flow you requested: after updating the ACC tile valid shape, pop the result from the vector kernel.

Validation

  • compiled with an existing local WSL ptoas
  • command used:
    • /home/rdp/hw-native-sys/PTOAS/build/tools/ptoas/ptoas --pto-arch=a5 --enable-insert-sync -o /tmp/ptoas-test6/kernel.cpp /mnt/c/Users/rdp/Documents/ptoas/test/samples/TPushTPop/test6/kernel.pto
  • confirmed generated output contains:
    • SetValidShape(...)
    • Tile<TileType::Vec, float, 4, 16, ...>
    • TPOP<..., Tile<TileType::Vec, float, 4, 16, ...>, TileSplitAxis::TILE_UP_DOWN>

Notes

  • the local worktree also contains unrelated edits under test/samples/TPushTPop/test2/kernel.pto; those were intentionally excluded from this PR

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new test case for matrix multiplication using TPush and TPop operations between cube and vector kernels. A potential logic error was identified in the cube kernel where the accumulator tile is allocated outside the loop; because pto.set_validshape modifies tile metadata in-place, subsequent loop iterations would incorrectly perform partial multiplications. It is recommended to move the tile allocation inside the loop to ensure each iteration starts with the correct dimensions.

Comment on lines +37 to +38
%acc_tile = pto.alloc_tile valid_row = %c16 valid_col = %c16
: !pto.tile_buf<loc=acc, dtype=f32, rows=16, cols=16, v_row=?, v_col=?, blayout=col_major, slayout=row_major, fractal=1024, pad=0>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The accumulator tile %acc_tile is allocated outside the loop, but its valid shape is modified inside the loop using pto.set_validshape (line 54). Since set_validshape updates the tile metadata in-place, subsequent iterations of the loop will start with the modified valid shape (8x16 instead of 16x16). This will likely cause the pto.tmatmul in iterations 2-4 to only compute a partial result (8x16) instead of the full 16x16 multiplication performed in the first iteration. To ensure each iteration computes a full 16x16 result before shrinking the valid shape for pushing, the alloc_tile for %acc_tile should be moved inside the loop, similar to the implementation in test/samples/TPushTPop/test4/kernel.pto.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant