Problem description
We noticed that model compilation times rose significantly when we upgraded from MIGraphX eb603d0 to 0681c63.
We've bisected MIGraphX and narrowed down the issue to this specific MIGraphX commit: ac23987
Specifically, with eb603d0 we were able to compile 10 of our models in ~45 min. That is still not ideal compared to TensorRT, where the same compilation takes around 10 min, but it is usable.
When we upgrade MIGraphX to ac23987, compilation is around 10 times slower: we were only able to compile 3 of our 10 models in 60 min, which is unusable.
Because commit ac23987 is a simple bump in the version of the rocMLIR library that MIGraphX is compiled against, we wrote a script that recompiles MIGraphX ac23987 with each rocMLIR revision selected by the bisect.
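A minimal sketch of how one such bisect step can be scripted so that `git bisect run` can drive it. This is an illustration under assumptions, not the script used for the log below: `build_cmd` and `compile_cmd` are hypothetical placeholders for the commands that rebuild MIGraphX and compile the model, and the 60 s threshold is an arbitrary value chosen to separate the ~27 s and ~212 s cases.

```python
import subprocess
import sys
import time

# Exit codes that `git bisect run` interprets: 0 = good, 1 = bad, 125 = skip.
GOOD, BAD, SKIP = 0, 1, 125


def timed(cmd):
    """Run cmd (a list of arguments) and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def verdict(compile_seconds, threshold_seconds):
    """Classify one bisect step by comparing compile time to a threshold."""
    return GOOD if compile_seconds <= threshold_seconds else BAD


def bisect_step(build_cmd, compile_cmd, threshold_seconds=60.0):
    """Rebuild, time the model compilation, and return a bisect exit code.

    build_cmd and compile_cmd are hypothetical placeholders supplied by the caller.
    """
    try:
        subprocess.run(build_cmd, check=True)
    except (subprocess.CalledProcessError, OSError):
        return SKIP  # this revision cannot be built, so it cannot be judged
    try:
        seconds = timed(compile_cmd)
    except (subprocess.CalledProcessError, OSError):
        return SKIP  # assumption: a failed model compile is skipped, not marked bad
    print(f"Compilation duration: {seconds:.2f} s", file=sys.stderr)
    return verdict(seconds, threshold_seconds)
```

A wrapper script would end with `sys.exit(bisect_step(build_cmd, compile_cmd))` and be passed to `git bisect run` inside the rocMLIR checkout. Returning 125 for unbuildable revisions keeps them from being misclassified as regressions.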
The output of our bisect log is below (the commit hashes are from the rocMLIR repository):
Bisecting: 8 revisions left to test after this (roughly 3 steps)
Compiling MIGraphX with ac0dcc901b3d9aacda8030ec75ced17b0c35cbd8 ...
Compiling model ...
Compilation duration: 27.16 s
Benchmark duration: 4.62 s -> throughput = 124.1
GOOD!
Bisecting: 4 revisions left to test after this (roughly 2 steps)
Compiling MIGraphX with 11d5c9db8dec5f198e80bab10c59549f401e6c8f ...
Compiling model ...
Compilation duration: 27.26 s
Benchmark duration: 4.68 s -> throughput = 122.6
GOOD!
Bisecting: 2 revisions left to test after this (roughly 1 step)
Compiling MIGraphX with 15ab5c900205d32e6b0591642cc3494ec14c34ef ...
Compiling model ...
Skipped 4 configs for gpu::mlir_op
Skipped 4 configs for gpu::mlir_op
Skipped 4 configs for gpu::mlir_op
Skipped 4 configs for gpu::mlir_op
Skipped 4 configs for gpu::mlir_op
Skipped 4 configs for gpu::mlir_op
Skipped 4 configs for gpu::mlir_op
Skipped 2 configs for gpu::mlir_op
Compilation duration: 212.06 s
Benchmark duration: 4.48 s -> throughput = 128.1
BAD!!!
Bisecting: 0 revisions left to test after this (roughly 0 steps)
Compiling MIGraphX with 51df5f49afac8ec2ed5cdc623affc6685c651c6f ...
Compiling model ...
Compilation duration: 27.39 s
Benchmark duration: 4.66 s -> throughput = 123.0
GOOD!
15ab5c900205d32e6b0591642cc3494ec14c34ef is the first bad commit
commit 15ab5c900205d32e6b0591642cc3494ec14c34ef
Author: Mirza Halilčević <109971222+mirza-halilcevic@users.noreply.github.com>
Date: Wed Mar 4 17:07:37 2026 +0100
[AIROCMLIR-44] Update quick-tune lists for gemm and conv (#2212)
* Update gfx950 quick-tune lists for gemm and conv.
* Update gfx942 quick-tune lists for gemm and conv.
* Update gfx90a quick-tune lists for gemm and conv.
* Update gfx908 quick-tune lists for gemm and conv.
* Update gfx1201 quick-tune lists for gemm and conv.
* Update gfx1101 quick-tune lists for gemm and conv, and delete old
gfx1100 lists.
* Update gfx1150 quick-tune lists for gemm.
* Update gfx1150 quick-tune lists for conv.
.../Dialect/Rock/Tuning/QuickTuningPerfconfigs.inc | 2525 +++++++++++++-------
mlir/test/CAPI/mixr_full.c | 2 +-
mlir/test/Dialect/Rock/affix_tuning_params.mlir | 76 +-
.../noTransA-noTransB/broadcasted-k-e2e.mlir | 2 +-
.../noTransA-transB/broadcasted-k-e2e.mlir | 2 +-
.../gemm-layouts/transA-noTransB/gemm-k-e2e.mlir | 2 +-
.../gemm-layouts/transA-noTransB/sliced-k-e2e.mlir | 2 +-
.../transA-noTransB/unitdim-m-e2e.mlir | 2 +-
.../gemm-layouts/transA-transB/gemm-k-e2e.mlir | 2 +-
.../gemm-layouts/transA-transB/sliced-k-e2e.mlir | 2 +-
10 files changed, 1770 insertions(+), 847 deletions(-)
The bisect identifies this rocMLIR commit as the cause: ROCm/rocMLIR@15ab5c9
As you can see, the compilation time increased roughly 8-fold (from 27.16 s to 212.06 s), while the throughput of the compiled model barely changed (from 124.1 to 128.1).
As a side note, this is also when the logging issues tracked in #4665 began.
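The ratios behind these figures can be checked directly from the bisect log. The short sketch below uses only the numbers from the first good run (ac0dcc9) and the bad run (15ab5c9); the variable names are illustrative.

```python
# Values copied from the bisect log above.
good_compile_s, bad_compile_s = 27.16, 212.06
good_throughput, bad_throughput = 124.1, 128.1

# Ratio of bad to good compile time.
slowdown = bad_compile_s / good_compile_s

# Relative throughput change of the compiled model, in percent.
throughput_change_pct = (bad_throughput - good_throughput) / good_throughput * 100

print(f"compile-time slowdown: {slowdown:.2f}x")            # 7.81x
print(f"throughput change: {throughput_change_pct:+.1f}%")  # +3.2%
```

For this single model the compile time grows by a factor of about 7.8 for a throughput gain of about 3%.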
Environment
OS: Debian GNU/Linux 12 (bookworm)
CPU: AMD Ryzen 9 9950X 16-Core Processor
GPU: AMD Radeon AI PRO R9700
ROCm version: 7.2.0
MIGraphX version: 2.16.0.dev+20250912-17-291-gac2398773