matmul_all_reduce: no validation that lock array is large enough for current tile count #463

@aamarnat

Bug

In iris/ops/matmul_all_reduce.py, the lock array is pre-allocated with a size based on block dimensions at allocation time. If a subsequent call uses smaller block sizes (producing more tiles), the kernel writes lock entries beyond the lock array bounds. On MI355X this silently corrupts adjacent symmetric heap objects, leading to non-deterministic wrong results on later calls.

Example: Allocating with bm=64 for M=2048 gives 32 * 23 = 736 tiles (2048 / 64 = 32 tiles along M). A later call with bm=32 needs 64 * 23 = 1472 lock entries, so the kernel writes 736 entries past the end of the array.
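The arithmetic above can be reproduced with a short sketch. It assumes one lock entry per output tile and ceiling division along each dimension. The issue does not state N or bn, so the values below are hypothetical and chosen only to yield the 23 tiles along the second dimension used in the example; `count_tiles` is an illustrative helper, not a function from iris.

```python
def count_tiles(M, N, bm, bn):
    """Tiles needed to cover an M x N output with bm x bn blocks.

    Assumes one lock entry per tile (illustrative helper, not iris API).
    """
    tiles_m = -(-M // bm)  # ceiling division
    tiles_n = -(-N // bn)
    return tiles_m * tiles_n


# Hypothetical N and bn, picked so that tiles_n == 23 as in the example.
M, N, bn = 2048, 2944, 128

allocated = count_tiles(M, N, bm=64, bn=bn)  # size the locks were allocated for
needed = count_tiles(M, N, bm=32, bn=bn)     # size the later call requires
overflow = needed - allocated                # entries written past the end

print(allocated, needed, overflow)
```

Halving bm doubles the tile count, so a lock array sized for the larger block covers only half of the entries the later call touches.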

Impact

Silent symmetric heap corruption. No error raised. Wrong numerical results on subsequent kernel calls.

Fix

Add a runtime check in matmul_all_reduce():

if workspace.locks is not None and workspace.locks.numel() < total_tiles:
    raise ValueError(
        f"Lock array too small: have {workspace.locks.numel()} but need {total_tiles}. "
        "Pre-allocate workspace with the smallest block sizes you intend to use."
    )
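To illustrate how the proposed check behaves, here is a minimal, self-contained sketch. `FakeLocks`, `check_lock_capacity`, and the `SimpleNamespace` workspace are hypothetical stand-ins: the real workspace object and the place where `total_tiles` is computed inside matmul_all_reduce() may differ. Only the `numel()` call on the lock array is taken from the snippet above.

```python
from types import SimpleNamespace


class FakeLocks:
    """Stand-in for the lock tensor; only numel() is modeled."""

    def __init__(self, n):
        self._n = n

    def numel(self):
        return self._n


def check_lock_capacity(workspace, total_tiles):
    """Raise if a pre-allocated lock array cannot hold total_tiles entries."""
    locks = workspace.locks
    if locks is not None and locks.numel() < total_tiles:
        raise ValueError(
            f"Lock array too small: have {locks.numel()} but need {total_tiles}. "
            "Pre-allocate workspace with the smallest block sizes you intend to use."
        )


# Workspace allocated for bm=64 (736 tiles in the example above).
workspace = SimpleNamespace(locks=FakeLocks(736))

check_lock_capacity(workspace, 736)  # same block size: passes silently

try:
    check_lock_capacity(workspace, 1472)  # bm=32: would overflow the array
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

With a check like this, the bm=32 call fails before the kernel launches instead of overwriting neighbouring symmetric heap objects.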

Component

iris/ops/matmul_all_reduce.py

Labels

bug, iris