Test coverage gaps: recent MIGraphX lowering passes and CI flakiness (Mar 2026) #2297

@cursor

Context

This is an automated test-coverage analysis triggered by PR #2295 (marking large-kernel-no-scavenge.mlir as XFAIL due to intermittent lowering failure). While that PR is a CI workaround, it and the surrounding recent merges expose several meaningful test coverage gaps across new lowering passes and bug fixes. Below are the areas most worth hardening, ordered by blast radius.


1. 🔥 arith.maximumf/minimumf — No behavioral test for the "don't expand" pipeline option

Commit: e40a31807f51 [EXTERNAL] Stop expanding float min/max ops

What changed: Pipelines.cpp now sets includeFloatMinMax = false so that arith.maximumf / arith.minimumf / arith.maxnumf / arith.minnumf are not expanded into compare-and-select sequences, relying instead on the AMDGPU backend's native v_max_*/v_min_* instructions.

Current test coverage: mlir/test/rocmlir-driver/pipelines.mlir only checks that the printed pipeline string contains include-float-min-max=false. There is no test that:

  • Verifies arith.maximumf is preserved (not expanded) when flowing through the rocMLIR full pipeline.
  • Validates the NaN semantics difference: the old expand path propagates NaN from either operand; the native instruction may have different IEEE handling on specific GFX targets.
  • Checks arith.minnumf / arith.maxnumf (which have "num" semantics: when exactly one operand is NaN, the result is the other operand).
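
The NaN distinction these tests need to pin down can be modeled outside MLIR. The Python sketch below is only an illustration of the semantics the arith dialect documents for these ops (maximumf propagates NaN, maxnumf returns the other operand when exactly one is NaN). The function names are hypothetical stand-ins for the ops, and signed-zero ordering is deliberately ignored.

```python
import math


def maximumf(a: float, b: float) -> float:
    """Model of arith.maximumf: NaN in either operand yields NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)  # signed zeros are not distinguished in this sketch


def maxnumf(a: float, b: float) -> float:
    """Model of arith.maxnumf: a single NaN operand is ignored."""
    if math.isnan(a):
        return b  # also returns NaN when both operands are NaN
    if math.isnan(b):
        return a
    return max(a, b)


if __name__ == "__main__":
    for lhs, rhs in [(math.nan, 1.0), (1.0, math.nan), (2.0, 1.0)]:
        print(lhs, rhs, maximumf(lhs, rhs), maxnumf(lhs, rhs))
```

A behavioral test on hardware would compare kernel output against a reference of this kind for each GFX target, since a native instruction that silently implements the "num" variant would pass a structural CHECK but fail this comparison.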

Risk: If the backend does not handle these ops for a target, compilation fails silently or emits wrong code. NaN-in/NaN-out behavior is a correctness concern for attention masking and fusion kernels.

Suggested tests to add:

File: mlir/test/rocmlir-driver/large-kernel-float-minmax.mlir (new)

// RUN: rocmlir-gen ... | rocmlir-driver --kernel-pipeline=full | FileCheck %s
// CHECK-NOT: arith.cmpf
// CHECK-NOT: arith.select
// CHECK: arith.maximumf

File: mlir/test/Dialect/Arith/expand-ops-amdgpu.mlir (new)

  • A test running arith-expand="include-float-min-max=false" on a function containing all four float min/max ops and verifying they pass through unchanged.
  • A test with a NaN input verifying arith.maximumf(NaN, x) == NaN vs arith.maxnumf(NaN, x) == x when running with include-float-min-max=false.

2. 🔥 Flaky large-kernel-no-scavenge test — Root cause untested

Commit: 5e908079c7da Mark large-kernel-no-scavenge as XFAIL

What happened: rocmlir-driver --kernel-pipeline=full intermittently produces empty output (printing Lowering failed to stderr) for a specific conv_bwd_data configuration with --perf_config 'v3:128,64,8,128,64,16,1,1,2,1,1'. The test was XFAIL'd rather than fixed.

Risk: The flakiness signal is being suppressed, not resolved. If the underlying lowering failure is non-deterministic (e.g. a resource race, register pressure corner case, or unhandled fallback in the scavenger-disabled path), it could affect other large convolution or attention kernels in production.

Suggested tests to add:

File: mlir/test/rocmlir-driver/large-kernel-no-scavenge-error.mlir (new)

  • A test that explicitly invokes the same gen command and verifies it does not print Lowering failed to stderr (using FileCheck --implicit-check-not).
  • A deterministic stress test that runs the same command 3 times and checks all three succeed (a shell RUN loop), to surface flakiness early in CI rather than masking it.
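
As a rough illustration of the repeat-and-check idea in the two bullets above, here is a minimal Python sketch of a harness that runs one command several times and records every run that exits non-zero, prints the failure marker to stderr, or produces empty output. The helper name `run_repeatedly` is hypothetical, and the demo command is a stand-in for the real rocmlir-gen / rocmlir-driver invocation, which is not reproduced here.

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=3, marker="Lowering failed"):
    """Run cmd `runs` times and return a list of (run_index, reason) failures."""
    failures = []
    for i in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failures.append((i, f"exit code {proc.returncode}"))
        elif marker in proc.stderr:
            failures.append((i, f"stderr contains {marker!r}"))
        elif not proc.stdout.strip():
            failures.append((i, "empty output"))
    return failures


if __name__ == "__main__":
    # Stand-in command; replace with the actual pipeline under test.
    demo = [sys.executable, "-c", "print('module {}')"]
    print(run_repeatedly(demo))
```

In a lit test the same effect would come from a shell loop in the RUN line; the point of the sketch is that all three conditions are checked on every iteration, so a single bad run is reported instead of being averaged away.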

Additionally, the Lowering failed error path in rocmlir-driver itself should be tested:

  • Verify that when lowering fails, the driver exits with a non-zero code and prints a diagnostic that includes the operation that failed (not just Lowering failed with no context).

3. migraphx.shaped parser crash fix — Parse-level errors untested

Commit: 839eb350e187 AIROCMLIR-546 Fixed parser crash from invalid !migraphx.shaped

What changed: MIXRShapedType::parse() in MIGraphX.cpp now calls parser.emitError() in three places instead of crashing via get() when stride/shape counts mismatch.

Current test coverage: mlir/test/Dialect/MIGraphX/invalid.mlir tests only the verifier (verify()), not the parser (parse()). The three new emitError() call sites are completely untested:

  1. Failure to parse <, dimension list, or element type.
  2. Failure to parse the stride dimension list in a non-scalar shaped type.
  3. Failure to parse the closing >.
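
To make the three failure points concrete, the following Python sketch models a parser that reports an error at each stage instead of crashing. It is a toy, not a port of MIXRShapedType::parse(): the `<dims x type, strides>` grammar is assumed from the examples in this issue, `parse_shaped` and `ParseError` are hypothetical names, and the rank-mismatch and closing-`>` messages are invented for illustration (only the first two messages are quoted from the suggested tests).

```python
import re


class ParseError(Exception):
    """Raised instead of crashing when the type text is malformed."""


def parse_shaped(text):
    """Toy parser for '!migraphx.shaped<DIMSxTYPE, STRIDES>' (assumed grammar)."""
    prefix = "!migraphx.shaped"
    if not text.startswith(prefix):
        raise ParseError("expected !migraphx.shaped")
    body = text[len(prefix):]

    # Failure point 1: '<', the dimension list, or the element type.
    head = re.match(r"<((?:\d+x)*)([a-z]+\d+)", body)
    if not head:
        raise ParseError("expected shaped dimension list with type")
    dims = [int(d) for d in head.group(1).split("x") if d]
    elem = head.group(2)
    rest = body[head.end():]

    strides = []
    if dims:  # scalars carry no stride list
        # Failure point 2: the stride list of a non-scalar type.
        tail = re.match(r",\s*(\d+(?:x\d+)*)", rest)
        if not tail:
            raise ParseError("expected `,` and a `x`-separated list")
        strides = [int(s) for s in tail.group(1).split("x")]
        rest = rest[tail.end():]
        if len(strides) != len(dims):
            raise ParseError("stride count does not match shape rank")

    # Failure point 3: the closing '>'.
    if rest != ">":
        raise ParseError("expected `>`")
    return dims, elem, strides
```

Each `raise` corresponds to one negative test case; a regression that removes any of them would turn a clean diagnostic back into an unhandled failure.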

Suggested tests to add in mlir/test/Dialect/MIGraphX/invalid.mlir:

// -----
// expected-error @+1 {{expected shaped dimension list with type}}
func.func @bad_parse_missing_gt(%arg: !migraphx.shaped<1xf32) { func.return }

// -----
// expected-error @+1 {{expected `,` and a `x`-separated list}}
func.func @bad_parse_missing_stride(%arg: !migraphx.shaped<1xf32>) { func.return }

// -----
// expected-error @+1 {{expected shaped dimension list with type}}
func.func @bad_parse_garbage(%arg: !migraphx.shaped<garbage>) { func.return }

Why it matters: Without parsing tests, a refactor of parse() could silently remove the error handling and restore the crashing behavior.


4. Broadcasting Linalg lowering — Error paths and edge cases untested

Commit: a8ae8acacbd0 [AIROCMLIR-552] Added Broadcasting Linalg Lowering Path

What changed: BroadcastConverter and MultiBroadcastConverter were rewritten in MIGraphXToLinalg.cpp to use linalg.broadcast instead of TOSA.

Current test coverage: Four tests in mixr-to-linalg-ops.mlir: axis=0, 4D multibroadcast, scalar multibroadcast, scalar broadcast.

Gaps:

| Missing case | Why it matters |
| --- | --- |
| broadcastDimensions.empty() in MultiBroadcastConverter (reshape-only, no broadcast needed) | This branch is taken when the input and output have the same non-unit dims; never tested |
| arith::ConstantOp + DenseElementsAttr::isSplat() fast path in MultiBroadcastConverter | Splat-constant optimization silently broken if isSplat() ever returns false for a constant |
| BroadcastConverter with axis > 0 and multi-dimensional input | Code conditionally strips trailing 1 dims; only axis=0 and scalar tested |
| Error path: "cannot convert output type to ranked tensor type" | No negative test |
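
One way to pick inputs that hit the empty-broadcastDimensions branch is to reason about which axes actually need broadcasting. The Python sketch below assumes same-rank shapes and treats an axis as broadcast when the input extent is 1 and the output extent is not; this is an assumption about how the converter derives its list, and `broadcast_dims` is a hypothetical helper, not a function from MIGraphXToLinalg.cpp.

```python
def broadcast_dims(in_shape, out_shape):
    """Return the axes where a unit input dim is stretched to the output dim."""
    if len(in_shape) != len(out_shape):
        raise ValueError("sketch only handles same-rank shapes")
    dims = []
    for axis, (src, dst) in enumerate(zip(in_shape, out_shape)):
        if src == dst:
            continue  # nothing to broadcast on this axis
        if src != 1:
            raise ValueError(f"axis {axis}: cannot broadcast {src} to {dst}")
        dims.append(axis)
    return dims


if __name__ == "__main__":
    print(broadcast_dims([1, 3, 1], [2, 3, 4]))  # axes 0 and 2 are stretched
    print(broadcast_dims([2, 3], [2, 3]))        # empty list: no broadcast needed
```

Under this model, any multibroadcast whose input and output shapes already agree yields an empty list, which is the shape pair a test for the untested branch would use.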

Suggested file to update: mlir/test/Conversion/MIGraphXToLinalg/mixr-to-linalg-ops.mlir


5. migraphx.greater / migraphx.equal — Missing type and error coverage

Commit: 712f49ed5447 [AIROCMIR-446] Lower migraphx.greater/equal into linalg.generic

What changed: New BooleanElementwiseConverter<Greater> and BooleanElementwiseConverter<Equal> in MIGraphXToLinalg.cpp.

Current test coverage: 5 tests in migraphx-to-linalg-boolean.mlir covering i32, si32 (greater only), f32 for both ops.

Gaps:

  • f16 and bf16 types: these are the dominant compute types in rocMLIR attention and GEMM pipelines; no test verifies arith.cmpf ogt + arith.uitofp works correctly for f16 output.
  • migraphx.equal with si32 input.
  • No test for mismatched input types (should the converter reject or convert?). The code assumes operands share the same element type; if they don't, the linalg.generic body would emit a type error deep in lowering rather than a clear diagnostic.
  • No rank variation tests (rank-1 and rank-4 tensors).
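
The type gaps listed above can be read off mechanically from an op-by-type matrix. The Python sketch below encodes the five covered combinations stated under "Current test coverage" and lists what remains; the names `OPS`, `TYPES`, `COVERED`, and `missing_cases` are hypothetical, and rank variation is left out to keep the matrix small.

```python
from itertools import product

OPS = ("greater", "equal")
TYPES = ("i32", "si32", "f32", "f16", "bf16")

# Covered today, per the issue text: i32 and f32 for both ops, si32 for greater only.
COVERED = {
    ("greater", "i32"),
    ("greater", "si32"),
    ("greater", "f32"),
    ("equal", "i32"),
    ("equal", "f32"),
}


def missing_cases(ops=OPS, types=TYPES, covered=COVERED):
    """Return every (op, element type) pair that has no test yet."""
    return [case for case in product(ops, types) if case not in covered]


if __name__ == "__main__":
    for op, ty in missing_cases():
        print(f"migraphx.{op} on {ty}: no test")
```

Keeping the matrix explicit makes it harder for a new element type or a third comparison op to land without a matching test.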

Suggested file to update: mlir/test/Conversion/MIGraphXToLinalg/migraphx-to-linalg-boolean.mlir


6. Reshape helper — No-op and collapse-only paths untested

Commit: 529789d99c07 [AIROCMLIR-564] Lower migraphx.reshape using helper function

What changed: The reshapeValue() helper in MIGraphXToLinalg.cpp has three code paths:

  1. Same-shape early return (no-op).
  2. Collapse-only (single CollapseShapeOp).
  3. Collapse + expand (general case, tested).

Current test coverage: Only a rank-increasing reshape (2D→3D) and a rank-decreasing reshape (3D→2D) are tested.

Suggested tests in mixr-to-linalg-ops.mlir:

  • migraphx.reshape with identical input/output shape — should return the input value unchanged (no new ops).
  • migraphx.reshape that only requires a tensor.collapse_shape (e.g., 4x4xf32 → 16xf32).
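
When choosing shapes for these two tests, it helps to predict which path a given reshape should take. The Python sketch below classifies a static reshape by checking whether the output can be formed purely by merging contiguous input dimensions. This is an assumed criterion for "collapse-only", and `collapse_groups` and `reshape_path` are hypothetical helpers; the actual condition inside reshapeValue() may differ.

```python
from math import prod


def collapse_groups(in_shape, out_shape):
    """Return groups of contiguous input axes forming each output dim, or None."""
    groups, i = [], 0
    for target in out_shape:
        group, acc = [], 1
        while i < len(in_shape) and (not group or acc != target):
            acc *= in_shape[i]
            group.append(i)
            i += 1
            if acc > target:
                return None  # overshot: this output dim splits an input dim
        if acc != target:
            return None
        groups.append(group)
    while i < len(in_shape) and in_shape[i] == 1:
        groups[-1].append(i)  # fold trailing unit dims into the last group
        i += 1
    return groups if i == len(in_shape) else None


def reshape_path(in_shape, out_shape):
    """Classify a static, non-empty reshape into one of the three paths."""
    if not in_shape or not out_shape:
        raise ValueError("sketch only handles non-empty static shapes")
    if prod(in_shape) != prod(out_shape):
        raise ValueError("element counts differ")
    if list(in_shape) == list(out_shape):
        return "no-op"
    if collapse_groups(in_shape, out_shape) is not None:
        return "collapse-only"
    return "collapse+expand"


if __name__ == "__main__":
    print(reshape_path([4, 4], [16]))
    print(reshape_path([2, 3], [2, 3]))
    print(reshape_path([2, 6], [2, 3, 2]))
```

For the example in the bullet above, 4x4 to 16 merges axes 0 and 1 into one group, so it classifies as collapse-only under this criterion.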

Summary Table

| Area | File to Add/Update | Priority |
| --- | --- | --- |
| arith.maximumf preservation in full pipeline | mlir/test/rocmlir-driver/large-kernel-float-minmax.mlir (new) | High |
| arith-expand with include-float-min-max=false behavioral test | external/llvm-project/mlir/test/Dialect/Arith/expand-ops.mlir | High |
| large-kernel-no-scavenge deterministic stress + error path | mlir/test/rocmlir-driver/large-kernel-no-scavenge-error.mlir (new) | High |
| Parser crash fix: parse-level error paths | mlir/test/Dialect/MIGraphX/invalid.mlir | Medium |
| Broadcasting edge cases (empty broadcastDims, splat const, axis>0) | mlir/test/Conversion/MIGraphXToLinalg/mixr-to-linalg-ops.mlir | Medium |
| migraphx.greater/equal: f16, bf16, equal si32, negative | mlir/test/Conversion/MIGraphXToLinalg/migraphx-to-linalg-boolean.mlir | Medium |
| reshapeValue same-shape no-op + collapse-only | mlir/test/Conversion/MIGraphXToLinalg/mixr-to-linalg-ops.mlir | Low |

Generated by automated regression-test coverage analysis on 2026-03-13, triggered by PR #2295.
