Problem description
When `nonmaxsuppression::use_dyn_output == false` the node will return `max_num_boxes` boxes (equal to `num_batches * total_boxes_in_batch_element`).
If the `nonmaxsuppression` algorithm selects fewer than `max_num_boxes` boxes, the rest of the output is padded with zeros.
Output size is determined here (`src/include/migraphx/op/nonmaxsuppression.hpp`, line 187 at `0681c63`):

```cpp
std::vector<std::size_t> out_lens = {max_num_boxes, 3};
```
Padding with `0` happens here (`src/include/migraphx/op/nonmaxsuppression.hpp`, line 324 at `0681c63`):

```cpp
std::fill(output.begin(), output.end(), 0);
```
This means that the first box of the first image in the batch can be included in the output up to `max_num_boxes` times, while it should be included at most once.
This behavior breaks compatibility with the ONNX NonMaxSuppression node (https://onnx.ai/onnx/operators/onnx__NonMaxSuppression.html), whose output contains only the selected boxes and isn't padded. This causes common post-NMS steps, like Top-K filtering based on the scores of the kept boxes, to be completely wrong.
When `nonmaxsuppression::use_dyn_output == true` the output is correctly shrunk to include only the selected boxes (`src/include/migraphx/op/nonmaxsuppression.hpp`, line 403 at `0681c63`):

```cpp
return result.reshape({output_shape.type(), {num_selected, 3}});
```
This doesn't happen when `nonmaxsuppression::use_dyn_output == false`.
There are a couple more problems with `nonmaxsuppression::use_dyn_output`:

- `use_dyn_output` is never set when using the Python bindings for MIGraphX. `nonmaxsuppression::use_dyn_output` is set from `onnx_parser::use_dyn_output` (`src/onnx/parse_nonmaxsuppression.cpp`, line 42 at `0681c63`):

  ```cpp
  op.from_value({{"use_dyn_output", parser.use_dyn_output}});
  ```

  which is in turn set from `onnx_options::use_dyn_output` (`src/onnx/onnx.cpp`, line 73 at `0681c63`):

  ```cpp
  parser.use_dyn_output = options.use_dyn_output;
  ```

  This option is never set when using the Python bindings (`src/py/migraphx_py.cpp`, line 675 at `0681c63`):

  ```cpp
  migraphx::onnx_options options;
  ```
- Even if `use_dyn_output` were set to `true`, `nonmaxsuppression::compute_shape` would fail unless the node's input shapes are also dynamic, which isn't the case when the entire model is compiled with static shapes. If we just change `onnx_options::use_dyn_output` (while keeping static input shapes) we get the following error: `parse: PARSE_EXPAND: dynamic input tensor with fixed dims input not supported`. We often have to compile the whole graph with static shapes, as otherwise we'd get dynamic-shape-related compilation errors.
I'd suggest completely removing the `nonmaxsuppression::use_dyn_output == false` code paths, as this option always produces wrong outputs.
This would also require changing the node so that `nonmaxsuppression::use_dyn_output == true` works with static input shapes.
Steps to reproduce
This behavior is already encoded in the reference tests, in the `nms_test` test case (`test/ref/nonmaxsuppression.cpp`, line 272 at `0681c63`):
As you can see, the expected outputs have been padded with `0, 0, 0` triplets, which doesn't conform to the ONNX operator specification.
Environment
OS: Debian GNU/Linux 12 (bookworm)
GPU: AMD Radeon AI PRO R9700
ROCm version: 7.2.0
MIGraphX version: 2.16.0.dev+20250912-17-299-g0681c63ab