
[Issue]: MIGraphX Dynamic Shape Issue ONNXRuntime #4618

@DiarmuidKelly

Problem Description

Environment

OS:
  NAME="Ubuntu"
  VERSION="25.10 (Questing Quokka)"
  Kernel: 6.17.0-14-generic
CPU:
  13th Gen Intel(R) Core(TM) i5-13600KF
GPU:
  Name: gfx1101
  Marketing Name: AMD Radeon RX 7800 XT
ROCm: 7.2
Python: 3.12.9

Env Deps (from repo.radeon.com/rocm/manylinux/rocm-rel-7.2/):

  • triton-3.5.1+rocm7.2.0.gita272dfa8-cp312-cp312-linux_x86_64.whl
  • torch-2.9.1+rocm7.2.0.lw.git7e1940d4-cp312-cp312-linux_x86_64.whl
  • torchvision-0.24.0+rocm7.2.0.gitb919bd0c-cp312-cp312-linux_x86_64.whl
  • onnxruntime_migraphx-1.23.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl

Issue Summary

MIGraphX crashes once input tensor dimensions change between inferences. The first inference succeeds, but subsequent inferences with different sequence lengths fail with a shape mismatch error.

Steps to Reproduce

  1. Load any ONNX model with dynamic input shapes via MIGraphXExecutionProvider
  2. Run inference with input shape [1, 512, 16] - SUCCEEDS
  3. Run inference with input shape [1, 512, 10] - FAILS

The model compiles successfully on first run, but fails when subsequent inferences have different tensor dimensions (e.g. different sequence lengths).

Observed behaviour with a TTS model (dynamic sequence length):

  Inference   Input      Result   Time
  1           16 chars   OK       94s (initial compile)
  2           16 chars   OK       0.06s (cached, same shape)
  3           10 chars   OK       150s (recompiles for new shape)
  4           55 chars   FAILED   shape mismatch error
  5           5 chars    FAILED   shape mismatch error

Error Message

migraphx_parse_onnx_buffer: Error: .../AMDMIGraphX/src/include/migraphx/op/dot.hpp:117:
compute_shape: DOT: static inner dimensions do not match: {1, 512, 70} x {1, 13, 24}

[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running
MGXKernel_graph_main_graph_3837677502492941375_6 node

Root Cause Analysis

MIGraphX compiles GPU kernels for specific tensor shapes during the first inference. When a subsequent inference has different input dimensions (for TTS, a different text length means a different sequence length), the pre-compiled kernels fail because they expect the original shapes.

This is a fundamental limitation in how MIGraphX handles dynamic shapes. Unlike CPUExecutionProvider or CUDAExecutionProvider which handle dynamic shapes at runtime, MIGraphX requires static shapes known at compile time.
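The check that fires in dot.hpp is the standard matrix-multiplication rule: the inner dimensions of the two operands must agree. A quick numpy sketch, using the shapes from the error message above, reproduces the same constraint:

```python
import numpy as np

# Shapes from the MIGraphX error: {1, 512, 70} x {1, 13, 24}.
# For a batched matmul, the last dim of A must equal the
# second-to-last dim of B; here 70 != 13, so this fails.
a = np.zeros((1, 512, 70))
b = np.zeros((1, 13, 24))
try:
    np.matmul(a, b)
except ValueError as e:
    print("mismatch:", e)

# With matching inner dimensions the product is well-defined:
c = np.matmul(a, np.zeros((1, 70, 24)))
print(c.shape)  # (1, 512, 24)
```

The difference is that numpy checks the shapes of the actual arguments at call time, whereas MIGraphX baked the shapes of the first inference into the compiled kernel.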

Impact

This makes MIGraphX unsuitable for any model with dynamic input shapes, which includes most NLP/TTS models where sequence length varies with input.

Workarounds Attempted

  1. Model caching (migraphx_model_cache_dir) - Does not help, shapes still mismatch
  2. Warming with large input first - Does not help, smaller inputs still fail
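A further workaround worth trying (not from the original report; a hedged sketch) is to pad every input up to one fixed maximum length, so the compiled kernels only ever see a single static shape. `MAX_LEN` and `PAD_VALUE` below are illustrative assumptions, and whether the model tolerates padding depends on how it handles masks or end-of-sequence markers:

```python
import numpy as np

MAX_LEN = 512    # hypothetical fixed sequence length
PAD_VALUE = 0    # hypothetical padding value

def pad_to_fixed(x: np.ndarray, max_len: int = MAX_LEN) -> np.ndarray:
    """Pad axis 1 of a [batch, seq, feat] tensor to max_len so every
    inference presents the same static shape to the compiled kernels."""
    batch, seq, feat = x.shape
    if seq > max_len:
        raise ValueError(f"sequence length {seq} exceeds max_len {max_len}")
    out = np.full((batch, max_len, feat), PAD_VALUE, dtype=x.dtype)
    out[:, :seq, :] = x
    return out

# Every call now produces shape (1, 512, 16) regardless of input length:
short = np.ones((1, 10, 16), dtype=np.float32)
print(pad_to_fixed(short).shape)  # (1, 512, 16)
```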

Requested Resolution

Implement dynamic shape support in MIGraphX, or document this limitation clearly in the ONNX Runtime MIGraphX provider documentation.

At minimum, MIGraphX should fall back gracefully to recompilation when shapes change, rather than crashing with an assertion error.
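Until that lands, the "fall back to recompilation" behaviour can be approximated on the application side: track the last-seen input shape and rebuild the session when it changes. The sketch below uses a generic session factory and a stub session so it stays runnable without onnxruntime; in practice the factory would construct an InferenceSession with MIGraphXExecutionProvider (an assumption, not something the report confirms works around the crash):

```python
class ShapeAwareRunner:
    """Rebuilds the backing session whenever the input shape changes,
    approximating a graceful recompile instead of a hard crash."""

    def __init__(self, session_factory):
        self._factory = session_factory   # () -> object with .run(x)
        self._session = None
        self._shape = None

    def run(self, x):
        shape = tuple(getattr(x, "shape", (len(x),)))
        if self._session is None or shape != self._shape:
            self._session = self._factory()   # "recompile" for new shape
            self._shape = shape
        return self._session.run(x)

# Tiny stub standing in for an inference session:
class StubSession:
    def run(self, x):
        return x

rebuilds = []
runner = ShapeAwareRunner(lambda: rebuilds.append(1) or StubSession())
runner.run([1] * 16)   # first call: builds a session
runner.run([1] * 16)   # same shape: reuses it
runner.run([1] * 10)   # new shape: rebuilds
print(len(rebuilds))   # 2
```

Rebuilding pays the full compile cost on every shape change (see the 94s/150s timings above), so this is a stopgap, not a substitute for native dynamic-shape support.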
