Open
Description
Bug
In iris/ops/matmul_all_reduce.py, the lock array is pre-allocated with a size based on block dimensions at allocation time. If a subsequent call uses smaller block sizes (producing more tiles), the kernel writes lock entries beyond the lock array bounds. On MI355X this silently corrupts adjacent symmetric heap objects, leading to non-deterministic wrong results on later calls.
Example: Allocating with bm=64 for M=2048 gives 32 * 23 = 736 tiles. A later call with bm=32 needs 64 * 23 = 1472 lock entries, so the kernel writes 736 entries past the end of the array.
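The arithmetic in the example can be checked with a short sketch. It assumes one lock entry per output tile and takes the 23 tiles along N directly from the numbers above, since the issue does not state N or the N block size; `cdiv` and `total_tiles` are hypothetical helper names, not functions from iris.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division: number of blocks of size b needed to cover a."""
    return (a + b - 1) // b


def total_tiles(m: int, bm: int, tiles_n: int) -> int:
    """Tiles along M times tiles along N (assumed: one lock entry per tile)."""
    return cdiv(m, bm) * tiles_n


TILES_N = 23  # taken from the example; N and bn themselves are not given

allocated = total_tiles(2048, 64, TILES_N)  # lock array sized at allocation time
needed = total_tiles(2048, 32, TILES_N)     # lock entries touched by the later call
overrun = needed - allocated                # entries written past the end

print(allocated, needed, overrun)
```

Halving bm doubles the tile count along M, so the lock array would have to be sized for the smallest block sizes that will ever be used.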
Impact
Silent symmetric heap corruption. No error raised. Wrong numerical results on subsequent kernel calls.
Fix
Add a runtime check in matmul_all_reduce():
    if workspace.locks is not None and workspace.locks.numel() < total_tiles:
        raise ValueError(
            f"Lock array too small: have {workspace.locks.numel()} but need {total_tiles}. "
            f"Pre-allocate workspace with the smallest block sizes you intend to use."
        )
Component
iris/ops/matmul_all_reduce.py
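As a rough illustration of the check proposed under Fix, the sketch below exercises the same guard against a stand-in workspace. `FakeWorkspace` and `check_lock_capacity` are hypothetical names, and a plain Python list replaces the real lock tensor (so `len()` stands in for `numel()`); it shows only the intended behavior, not the actual iris code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeWorkspace:
    """Stand-in for the real workspace object; only the lock array is modeled."""
    locks: Optional[List[int]] = None


def check_lock_capacity(workspace: FakeWorkspace, total_tiles: int) -> None:
    """Raise before launch if the pre-allocated lock array cannot cover every tile."""
    if workspace.locks is not None and len(workspace.locks) < total_tiles:
        raise ValueError(
            f"Lock array too small: have {len(workspace.locks)} but need {total_tiles}. "
            "Pre-allocate workspace with the smallest block sizes you intend to use."
        )


ws = FakeWorkspace(locks=[0] * 736)  # sized for bm=64, as in the example

check_lock_capacity(ws, 736)  # same block sizes: passes silently

try:
    check_lock_capacity(ws, 1472)  # bm=32 needs twice as many entries
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```

Failing fast here turns silent heap corruption into an explicit error at the call site, at the cost of one size comparison per call.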