Merged
14 changes: 7 additions & 7 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -303,8 +303,8 @@ Full documentation for MIGraphX is available at
* Support for the Log2 internal operator
* Support for the GCC 14 compiler
* The BitwiseAnd, Scan, SoftmaxCrossEntropyLoss, GridSample, and NegativeLogLikelihoodLoss ONNX operators
* The MatMulNBits, QuantizeLinear/DequantizeLinear, GroupQueryAttention, SkipSimplifiedLayerNormalization, and SimpliedLayerNormalization Microsoft Contrib operators
* Dymamic batch parameter support to OneHot operator
* The MatMulNBits, QuantizeLinear/DequantizeLinear, GroupQueryAttention, SkipSimplifiedLayerNormalization, and SimplifiedLayerNormalization Microsoft Contrib operators
* Dynamic batch parameter support to OneHot operator
* Split-K as an optional performance improvement
* Scripts to validate ONNX models from the ONNX Model Zoo
* GPU Pooling Kernel
@@ -314,7 +314,7 @@ Full documentation for MIGraphX is available at
* Pointwise fusions with MLIR across reshape operations
* MIGRAPHX_MLIR_DUMP environment variable to dump MLIR modules to MXRs
* The 3 option to MIGRAPHX_TRACE_BENCHMARKING to print the MLIR program for improved debug output
* MIGRAPHX_ENABLE_HIPBLASLT_GEMM environment variable to call hipBlasLt libaries
* MIGRAPHX_ENABLE_HIPBLASLT_GEMM environment variable to call hipBlasLt libraries
* MIGRAPHX_VERIFY_DUMP_DIFF to improve the debugging of accuracy issues
* reduce_any and reduce_all options to the Reduce operation via Torch MIGraphX
* Examples for RNNT, and ControlNet
@@ -332,7 +332,7 @@ Full documentation for MIGraphX is available at
### Removed

* Disabled requirements for MIOpen and rocBlas when running on Windows.
* Removed inaccuracte warning messages when using exhaustive-tune.
* Removed inaccurate warning messages when using exhaustive-tune.
* Remove the hard coded path in MIGRAPHX_CXX_COMPILER allowing the compiler to be installed in different locations.


@@ -342,15 +342,15 @@ Full documentation for MIGraphX is available at
* Infrastructure code to enable better Kernel fusions with all supported data types
* Subsequent model compile time by creating a cache for already performant kernels
* Use of Attention fusion with models
* Performance of the Softmax JIT kernel and of the Pooling opterator
* Performance of the Softmax JIT kernel and of the Pooling operator
* Tuning operations through a new 50ms delay before running the next kernel
* Performance of several convolution based models through an optimized NHWC layout
* Performance for the FP8 datatype
* GPU utilization
* Verification tools
* Debug prints
* Documentation, including gpu-driver utility documentation
* Summary section of the migrahx-driver perf command
* Summary section of the migraphx-driver perf command
* Reduced model compilation time
* Reordered some compiler passes to allow for more fusions
* Preloaded tiles into LDS to improve performance of pointwise transposes
@@ -607,7 +607,7 @@ Full documentation for MIGraphX is available at
* Fixed compile warnings for shadowing variable names
* Added missing specialization for the `nullptr` hash function

### Changees
### Changes

* Bumped version of half library to 5.6.0
* Bumped CI to support ROCm 5.6
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -24,7 +24,7 @@
cmake_minimum_required(VERSION 3.15 FATAL_ERROR)

if("${CMAKE_SOURCE_DIR}" STREQUAL "${CMAKE_BINARY_DIR}")
message(FATAL_ERROR "The binary and source directroy cannot be the same")
message(FATAL_ERROR "The binary and source directory cannot be the same")
endif()

# Setup valid strings for build type
2 changes: 1 addition & 1 deletion Dockerfile
@@ -74,7 +74,7 @@ RUN pip3 install https://repo.radeon.com/rocm/manylinux/rocm-rel-7.1.1/torch-2.8
https://repo.radeon.com/rocm/manylinux/rocm-rel-7.1.1/torchvision-0.24.0%2Brocm7.1.1.gitb919bd0c-cp310-cp310-linux_x86_64.whl\
https://repo.radeon.com/rocm/manylinux/rocm-rel-7.1.1/triton-3.4.0%2Brocm7.1.1.git0cace8d2-cp310-cp310-linux_x86_64.whl

# add this for roctracer dependancies
# add this for roctracer dependencies
RUN pip3 install CppHeaderParser

# Workaround broken rocm packages
2 changes: 1 addition & 1 deletion docs/dev/contributing-to-migraphx.rst
@@ -36,7 +36,7 @@ We start with a snippet of the simple ``add_two_literals()`` function::
// compile the program on the reference device
p.compile(migraphx::ref::target{});

// evaulate the program and retreive the result
// evaluate the program and retrieve the result
auto result = p.eval({}).back();
std::cout << "add_two_literals: 1 + 2 = " << result << "\n";

2 changes: 1 addition & 1 deletion docs/dev/dev_intro.rst
@@ -43,7 +43,7 @@ We start with a snippet of the simple ``add_two_literals()`` function::
// compile the program on the reference device
p.compile(migraphx::ref::target{});

// evaulate the program and retreive the result
// evaluate the program and retrieve the result
auto result = p.eval({}).back();
std::cout << "add_two_literals: 1 + 2 = " << result << "\n";

4 changes: 2 additions & 2 deletions docs/dev/onnx_operators.rst
@@ -457,13 +457,13 @@ Operator Support Matrix
+--------------------------+-----------+-----------------+------------------------------+
| MaxPool | ✅ | FP32, FP16, | ``storage_order`` |
| | | FP8, INT8 | not supported, |
| | | | ``dialtion`` is |
| | | | ``dilation`` is |
| | | | partially |
| | | | supported on |
| | | | GPU (MIOpen |
| | | | limitation), |
| | | | ``indices`` 2nd |
| | | | ouput not |
| | | | output not |
| | | | supported |
+--------------------------+-----------+-----------------+------------------------------+
| MaxRoiPool | ❌ | | |
2 changes: 1 addition & 1 deletion src/driver/main.cpp
@@ -361,7 +361,7 @@ struct loader
{
auto dyn_dim = parse_dyn_dims_json(x);
if(dyn_dim.size() != 1)
MIGRAPHX_THROW("dim_param must only specifiy one dimension");
MIGRAPHX_THROW("dim_param must only specify one dimension");
map_dim_params[name] = dyn_dim.front();
}
}
4 changes: 2 additions & 2 deletions src/include/migraphx/common.hpp
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2026 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -101,7 +101,7 @@ MIGRAPHX_EXPORT
std::vector<std::size_t> compute_common_lens(const std::vector<shape>& shapes);

/**
* @ brief Compute the common (broadcasted) dynamic dimensions of a list of dynamic shapes
* @brief Compute the common (broadcasted) dynamic dimensions of a list of dynamic shapes
*/
MIGRAPHX_EXPORT
std::vector<shape::dynamic_dimension> compute_common_dyn_dims(const std::vector<shape>& shapes);
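The common-shape helpers above implement multidirectional (numpy-style) broadcasting. A minimal sketch of the static-length case, assuming standard broadcasting rules; the function name and error message here are illustrative, not MIGraphX API:

```python
from itertools import zip_longest

def broadcast_lens(a, b):
    # Align the shapes from the trailing dimension; a dimension of 1
    # stretches to match the other shape, anything else must agree exactly.
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"incompatible dims {x} and {y}")
        out.append(max(x, y))
    return list(reversed(out))
```

`compute_common_dyn_dims` extends the same idea to dynamic shapes, where each dimension is a range rather than a single size.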
8 changes: 4 additions & 4 deletions src/include/migraphx/float8_impl.hpp
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2026 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -121,7 +121,7 @@ constexpr uint8_t cast_to_f8(T f_x, bool stoch = false, uint32_t rng = 0)
return NegativeZeroNan ? 0 : 0x80; // For FNUZ types neg zero is just positive zero
}

/* First need to check if it is normal or denorm as there is a difference of implict 1
/* First need to check if it is normal or denorm as there is a difference of implicit 1
Then need to adjust the exponent to align with the F8 exponent, in the meanwhile, shift
The mantissa. Then for stochastic rounding, add rng to mantissa and truncate. And for
RNE, no need to add rng. Then probably need to check whether there is carry and adjust
@@ -157,7 +157,7 @@ constexpr uint8_t cast_to_f8(T f_x, bool stoch = false, uint32_t rng = 0)
{
/* This is the case where fp32/fp16 is normal but it is in f8 denormal range.
For example fp8 FNUZ mode, denormal exponent is -7, but if the fp32/fp16
actual exponent is -7, it is actually larger due to the implict 1,
actual exponent is -7, it is actually larger due to the implicit 1,
Therefore it needs to be adjust to -6 and mantissa shift right by 1.
So for fp32/fp16, exponent -8 is the cut point to convert to fp8 FNUZ */
exponent_diff = f8_denormal_act_exponent - act_exponent;
@@ -186,7 +186,7 @@ constexpr uint8_t cast_to_f8(T f_x, bool stoch = false, uint32_t rng = 0)
else if(exponent_diff == -1)
mantissa <<= -exponent_diff;
bool implicit_one = mantissa & (1 << mfmt);
// if there is no implict 1, it means the f8 is denormal and need to adjust to denorm exponent
// if there is no implicit 1, it means the f8 is denormal and need to adjust to denorm exponent
f8_exponent =
(act_exponent + exponent_diff) /*actual f8 exponent*/ + f8_bias - (implicit_one ? 0 : 1);

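The comments in `cast_to_f8` deal with the normal/denormal split and the implicit leading 1. As a hedged reference for the OCP E4M3 flavor of FP8 (bias 7, max 448, no infinities — note the FNUZ variant discussed above differs), the format can be pinned down without any bit-twiddling by decoding all 256 codes and picking the nearest one; this brute-force search is a stand-in for, not a copy of, the real implementation:

```python
def decode_e4m3(code):
    # Decode one OCP E4M3 byte: 1 sign bit, 4 exponent bits, 3 mantissa bits.
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> 3) & 0xF
    mant = code & 0x7
    if exp == 0xF and mant == 0x7:
        return None  # NaN encoding; E4M3 has no infinities
    if exp == 0:
        return sign * mant * 2.0 ** -9  # denormal: no implicit 1
    return sign * (1 + mant / 8.0) * 2.0 ** (exp - 7)  # normal: implicit 1

def nearest_e4m3(x):
    # Round-to-nearest by exhaustive search, preferring an even code on ties.
    best_code, best_val = None, None
    for c in range(256):
        v = decode_e4m3(c)
        if v is None:
            continue
        if best_val is None or (abs(v - x), c & 1) < (abs(best_val - x), best_code & 1):
            best_code, best_val = c, v
    return best_code
```

Values past the representable range saturate to 448 under this nearest-value rule, which is one common overflow policy for E4M3.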
2 changes: 1 addition & 1 deletion src/include/migraphx/module.hpp
@@ -214,7 +214,7 @@ struct MIGRAPHX_EXPORT module
std::vector<std::size_t> scalar_const_out_lens = {};
};

/// Compute a new ouput shape by replacing each parameter with input
/// Compute a new output shape by replacing each parameter with input
/// shapes passed in.
std::vector<shape> compute_shapes(const std::vector<shape>& inputs,
compute_shapes_options options) const;
6 changes: 3 additions & 3 deletions src/include/migraphx/op/onehot.hpp
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2026 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -40,12 +40,12 @@ namespace op {
* Called with `axis` attribute that defaults to the last output axis
* Constant depth: `onehot(indices, values), depth attribute must be set;
* Variable depth: `onehot(indices, depth, values)`;
* `indicies` as a N rank tensor of indices where value is `on_value`
* `indices` as a N rank tensor of indices where value is `on_value`
* `depth` scalar with the number of classes for the one-hot dimension
* `values` `[off_value, on_value]`
* `axis` which axis to add the one-hot dimension to
* For axis = 0 and rank(indices) = 2:
* output is A[indicies[j, k], j, k] = on_value; A[i, j, k] = off_value otherwise
* output is A[indices[j, k], j, k] = on_value; A[i, j, k] = off_value otherwise
* Can be simplified to other operators when `indices` has a static shape and
* `depth` is constant at compile-time.
*/
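The doc comment's formula for axis = 0 and rank-2 indices — A[indices[j, k], j, k] = on_value, off_value otherwise — can be sketched directly; names are illustrative, and the real operator handles arbitrary rank, axis, and variable depth:

```python
def onehot_axis0(indices, depth, off_value, on_value):
    # indices: rank-2 nested list; output shape is [depth, J, K].
    J, K = len(indices), len(indices[0])
    out = [[[off_value] * K for _ in range(J)] for _ in range(depth)]
    for j in range(J):
        for k in range(K):
            idx = indices[j][k]
            if 0 <= idx < depth:  # out-of-range indices leave off_value
                out[idx][j][k] = on_value
    return out
```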
4 changes: 2 additions & 2 deletions src/include/migraphx/op/scatter_op.hpp
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2026 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -47,7 +47,7 @@ template <typename Derived>
struct scatter_op : op_name<Derived>
{
int64_t axis = 0;
// skip scattering indicies that are out of bounds
// skip scattering indices that are out of bounds
bool skip_out_of_bounds = false;

template <class Self, class F>
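The `skip_out_of_bounds` flag commented above can be shown with a 1-D sketch; the derived scatter ops also apply a reduction per element, for which plain assignment stands in here:

```python
def scatter_1d(data, indices, updates, skip_out_of_bounds=False):
    out = list(data)
    for i, idx in enumerate(indices):
        if idx < 0:
            idx += len(out)  # negative indices count from the end
        if not 0 <= idx < len(out):
            if skip_out_of_bounds:
                continue  # silently drop out-of-bounds scatters
            raise IndexError(f"index {indices[i]} out of bounds")
        out[idx] = updates[i]
    return out
```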
4 changes: 2 additions & 2 deletions src/memory_coloring.cpp
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2026 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -180,7 +180,7 @@ struct allocation_segment
std::size_t n = 1 + (ins->get_shape().bytes() - 1) / alignment;
assert(n > 0);
std::size_t start = 0;
// Insert at end if it cant fit at the begining
// Insert at end if it can't fit at the beginning
if(segments.empty() or segments.begin()->first <= n)
{
auto it = find_gap(segments, n);
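The hunk above rounds a buffer's byte size up to whole alignment-sized chunks, then looks for a gap that can hold it. A sketch under those assumptions — `find_first_fit` is a hypothetical stand-in for `find_gap`, whose actual selection policy is not visible in this hunk:

```python
def aligned_chunks(nbytes, alignment):
    # Equivalent to ceil(nbytes / alignment) for nbytes >= 1,
    # matching the `1 + (bytes - 1) / alignment` expression above.
    return 1 + (nbytes - 1) // alignment

def find_first_fit(gaps, n):
    # gaps: list of (start, width) pairs; return the start of the
    # first gap wide enough for n chunks, or None if none fits.
    for start, width in gaps:
        if width >= n:
            return start
    return None
```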
4 changes: 2 additions & 2 deletions src/onnx/parse_attention.cpp
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2026 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -709,7 +709,7 @@ struct parse_attention : op_parser<parse_attention>
// each attention head

if(attention.padding_mode() == mask_pad::raw)
{ // Raw Mask - 0 means mask, 1 means pass through. Apply mask_filter_val to mask indicies
{ // Raw Mask - 0 means mask, 1 means pass through. Apply mask_filter_val to mask indices
// and zero otherwise
// Need to generate from 2 dims or 3 dim cases
return generate_raw_mask_per_batch(info, attention);
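The raw-mask convention in the comment above — 1 passes through, 0 receives `mask_filter_val` — amounts to adding a filter term to the attention scores. This sketch covers only that rule, not the 2-dim/3-dim per-batch mask generation the parser performs:

```python
def apply_raw_mask(scores, mask, mask_filter_val=-10000.0):
    # Add 0 where mask == 1 (pass through) and mask_filter_val where
    # mask == 0, so masked positions vanish after softmax.
    return [[s + (0.0 if m == 1 else mask_filter_val)
             for s, m in zip(score_row, mask_row)]
            for score_row, mask_row in zip(scores, mask)]
```

The default filter value is an assumption for illustration; the actual attribute value comes from the ONNX node.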
22 changes: 11 additions & 11 deletions src/onnx/parse_softmaxcrossentropyloss.cpp
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2026 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -269,7 +269,7 @@ struct parse_softmaxcrossentropyloss : op_parser<parse_softmaxcrossentropyloss>
instruction_ref handle_index_selection(const onnx_parser::node_info& info,
const instruction_ref labels) const
{
// Pick out the coordinates from the inputs to gerneate the proper indicies to gather
// Pick out the coordinates from the inputs to generate the proper indices to gather
// what will be operated on later.

// Use label indices to select weights
@@ -287,7 +287,7 @@ struct parse_softmaxcrossentropyloss : op_parser<parse_softmaxcrossentropyloss>
// Trying to replicate torch arrange() here.
std::vector<int64_t> vect_of_lit(len_val);
std::iota(vect_of_lit.begin(), vect_of_lit.end(), 0);
auto batch_dim_indicies =
auto batch_dim_indices =
info.add_literal(migraphx::shape(label_shape.type(), {len_val}), vect_of_lit);

// This is supposed to do unsq_dims = [:a] + [a + 1:]
@@ -298,13 +298,13 @@ struct parse_softmaxcrossentropyloss : op_parser<parse_softmaxcrossentropyloss>
unsq_dims.erase(it);

auto batch_dim_index_unsq = info.add_instruction(
migraphx::make_op("unsqueeze", {{"axes", unsq_dims}}), batch_dim_indicies);
migraphx::make_op("unsqueeze", {{"axes", unsq_dims}}), batch_dim_indices);

auto batch_dim_indicies_bc = info.add_instruction(
auto batch_dim_indices_bc = info.add_instruction(
migraphx::make_op("multibroadcast",
{{"out_lens", labels_unsq->get_shape().lens()}}),
batch_dim_index_unsq);
coordinate_index_literals.push_back(batch_dim_indicies_bc);
coordinate_index_literals.push_back(batch_dim_indices_bc);
}

coordinate_index_literals.push_back(labels_unsq);
@@ -433,7 +433,7 @@ struct parse_softmaxcrossentropyloss : op_parser<parse_softmaxcrossentropyloss>
}

// Index selection before loss calculation completed
auto gathernd_indicies = handle_index_selection(info, labels);
auto gathernd_indices = handle_index_selection(info, labels);

std::vector<int64_t> perm(class_size, 0);
if(is_k_dim)
@@ -444,7 +444,7 @@ struct parse_softmaxcrossentropyloss : op_parser<parse_softmaxcrossentropyloss>
scores);
}

scores = info.add_instruction(migraphx::make_op("gathernd"), scores, gathernd_indicies);
scores = info.add_instruction(migraphx::make_op("gathernd"), scores, gathernd_indices);

std::vector<int64_t> axis_list(ndims - 1, 0);
std::iota((axis_list.begin() + 1), axis_list.end(), 2);
@@ -455,11 +455,11 @@ struct parse_softmaxcrossentropyloss : op_parser<parse_softmaxcrossentropyloss>
if(is_k_dim)
weights = info.add_instruction(migraphx::make_op("transpose", {{"permutation", perm}}),
weights);
weights = info.add_instruction(migraphx::make_op("gathernd"), weights, gathernd_indicies);
weights = info.add_instruction(migraphx::make_op("gathernd"), weights, gathernd_indices);

// Do pointwise operators on the final set of indicies and scores we care about rather than
// Do pointwise operators on the final set of indices and scores we care about rather than
// before so that we're not doing a bunch of pointwise on items that aren't part of the loss
// calulation.
// calculation.
auto log_sm_scores = scores;
if(is_softmaxcrossentropy)
{
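The index selection in this file builds iota-style batch indices (the `std::iota` call replicating torch `arange()`) and then gathers one score per (batch, label) pair with `gathernd`. For the 2-D case the net effect reduces to the following sketch, with illustrative names:

```python
def select_scores(scores, labels):
    # scores: [batch][class] nested list; labels: one class id per batch row.
    batch_dim_indices = list(range(len(labels)))  # iota over the batch dim
    return [scores[i][labels[i]] for i in batch_dim_indices]
```

Doing pointwise work only on these selected entries, as the comment notes, avoids computing on scores that never enter the loss.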
6 changes: 3 additions & 3 deletions src/rewrite_rnn.cpp
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2024 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2026 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -727,7 +727,7 @@ std::vector<operation> rewrite_rnn::gru_actv_funcs(instruction_ref ins) const
auto gru_op = any_cast<op::gru>(ins->get_operator());
// before rewrite the gru operator, need to ensure
// we have 4 actv funcs, even though a user does not
// specifiy any actv func. If less than 4, use the
// specify any actv func. If less than 4, use the
// algorithm in parse_gru to make 4 actv functions
if(gru_op.direction == op::rnn_direction::bidirectional)
{
@@ -1206,7 +1206,7 @@ std::vector<operation> rewrite_rnn::lstm_actv_funcs(instruction_ref ins) const
auto lstm_op = any_cast<op::lstm>(ins->get_operator());
// before rewrite the lstm operator, need to ensure
// we have 6 actv funcs, even though a user does not
// specifiy any actv func. If less than 46, use the
// specify any actv func. If less than 6, use the
// algorithm in parse_lstm to make 6 actv functions
const auto& actv_funcs = lstm_op.actv_funcs;
std::size_t num_actv_funcs = actv_funcs.size();
4 changes: 2 additions & 2 deletions src/simplify_qdq.cpp
@@ -1,7 +1,7 @@
/*
* The MIT License (MIT)
*
* Copyright (c) 2015-2025 Advanced Micro Devices, Inc. All rights reserved.
* Copyright (c) 2015-2026 Advanced Micro Devices, Inc. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
@@ -175,7 +175,7 @@ struct match_find_quantizable_ops
qop, migraphx::make_op("quant_convolution", conv_val), qop_args);
auto out_lens = dq->get_shape().lens();

// Ensure input and weight quantization paramaters are of a proper form
// Ensure input and weight quantization parameters are of a proper form
// Input is of shape [n, c, x1, ..., xn]. Only scalar quantization allowed
// Weight is of shape [k, c, y1, ... , yn]. Valid quantization axis is k

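The quantization-parameter form described above — scalar parameters for the input, and one scale per output channel k of a [k, c, y1, ..., yn] weight — corresponds to per-channel dequantization. A minimal sketch with hypothetical names, for a rank-2 weight:

```python
def dequantize_weights(w_int, scales, zero_points):
    # w_int: [k][c] integer weights; scales/zero_points: one entry per
    # output channel k (the only valid quantization axis for weights here).
    return [[(v - zero_points[k]) * scales[k] for v in row]
            for k, row in enumerate(w_int)]
```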