Notes and scripts for AMD profiling of dycore#1047

Draft
iomaganaris wants to merge 64 commits into main from amd_profiling

Conversation

@iomaganaris
Collaborator

@iomaganaris iomaganaris commented Feb 5, 2026

This Pull Request includes scripts to benchmark and profile the dycore granule as well as one of its most time-consuming GT4Py Programs, vertically_implicit_solver_at_predictor_step.

We'll keep this PR open for interaction and keep it up to date with improvements.

The PR includes the following important files:

  • AMD_INTRODUCTION.md: Includes (hopefully) all the information necessary to run the benchmark scripts for the dycore granule and the vertically_implicit_solver_at_predictor_step, as well as an introduction to icon4py, GT4Py and DaCe. It also contains some suggestions on how to view and understand the generated code.
  • amd_scripts/install_icon4py_venv.sh: Script to install icon4py along with all the dependencies necessary to run the profilers
  • amd_scripts/benchmark_dycore.sh: Sbatch script for Beverin to run and time the GT4Py Programs of the dycore
  • amd_scripts/benchmark_solver.sh: Sbatch script for Beverin to benchmark and profile the vertically_implicit_solver_at_predictor_step. The profiles of the kernels generated by this GT4Py program are the most interesting to study, since improving them should also improve the performance of most of the other dycore GT4Py Programs.

Currently based on #1018, which points to GT4Py/main (to be released as GT4Py v1.1.4 next week).

@iomaganaris iomaganaris requested a review from havogt February 5, 2026 16:19
Comment thread install_icon4py_uenv.sh Outdated
Comment thread INTRODUCTION.md Outdated
Comment thread INTRODUCTION.md Outdated
Comment thread benchmark_dycore.sh Outdated
Comment thread install_icon4py_uenv.sh Outdated
Comment thread install_icon4py_uenv.sh Outdated
fi

# Install icon4py, gt4py, DaCe and other basic dependencies using uv
uv sync --extra all --python $(which python3.12)
Contributor

I would not install all the extras; instead, we could properly add cupy-rocm7 as an extra to avoid line 29. I can work on that.

Comment thread install_icon4py_uenv.sh Outdated
Comment thread install_icon4py_uenv.sh Outdated
…osure_vars to fix the caching of the dycore programs
Comment thread amd_scripts/benchmark_dycore.sh Outdated
--benchmark-warmup=on \
--benchmark-warmup-iterations=30 \
--backend=dace_gpu \
--grid=icon_benchmark_regional \
Contributor

Suggested change
--grid=icon_benchmark_regional \
--grid=icon_benchmark_global \

Since global is our main target for now, maybe we can switch to that.
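For running the same benchmark outside of sbatch, a minimal Python sketch of the invocation with the suggested global grid could look like this. It reuses only the flags visible in the diff above and assumes (not confirmed) that the script targets test_benchmark_solve_nonhydro.py, the module that appears in the error log further down; `benchmark_args` is a hypothetical helper, and the real benchmark_dycore.sh may pass more options than shown. The optional level count follows the `--grid <grid_name>:<grid_levels>` form documented in the grid_manager fixture.

```python
"""Hypothetical sketch: run the dycore benchmark with the flags from benchmark_dycore.sh.

Assumption: the script targets test_benchmark_solve_nonhydro.py (the module seen in
the error log); the real script may pass additional options not reproduced here.
"""
from __future__ import annotations

import subprocess
import sys

# Test module taken from the pytest log in this thread (assumed benchmark target).
TEST_MODULE = (
    "model/atmosphere/dycore/tests/dycore/integration_tests/"
    "test_benchmark_solve_nonhydro.py"
)


def benchmark_args(
    grid: str = "icon_benchmark_global",
    backend: str = "dace_gpu",
    num_levels: int | None = None,
) -> list[str]:
    """Build the pytest argument list; num_levels maps to the `<grid>:<levels>` form."""
    grid_option = grid if num_levels is None else f"{grid}:{num_levels}"
    return [
        TEST_MODULE,
        "--benchmark-warmup=on",
        "--benchmark-warmup-iterations=30",
        f"--backend={backend}",
        f"--grid={grid_option}",
    ]


if __name__ == "__main__":
    # Run from the icon4py repository root with the project venv's Python.
    result = subprocess.run([sys.executable, "-m", "pytest", *benchmark_args()])
    sys.exit(result.returncode)
```

Keeping the grid as a parameter makes switching between icon_benchmark_regional and icon_benchmark_global a one-argument change.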

Base automatically changed from update_dace_version to main February 16, 2026 11:56
@sfantao

sfantao commented Apr 13, 2026

@iomaganaris, I am getting:

Resolved 258 packages in 6ms
  × Failed to download `amd-cupy==13.5.1`
  ╰─▶ Hash mismatch for `amd-cupy==13.5.1`

      Expected:
        sha256:de3138281e2711e06efaf49a31310d0d4824998e18d43e13e288a0e52ca75ec0

      Computed:
        sha256:c3f586d3786fb02a606148a6cc5662411bb143fa6f2fc7533ce1d98b77a1ed1c
  help: `amd-cupy` (v13.5.1) was included because `icon4py[rocm7-0]` (v0.0.6) depends on `icon4py-common[rocm7-0]` (v0.0.6) which depends on `amd-cupy`

If I do:

    sed -i 's/de3138281e2711e06efaf49a31310d0d4824998e18d43e13e288a0e52ca75ec0/c3f586d3786fb02a606148a6cc5662411bb143fa6f2fc7533ce1d98b77a1ed1c/g' uv.lock
    bash amd_scripts/install_icon4py_venv.sh

... the install succeeds, but I get CuPy errors while testing:

============================= test session starts ==============================
platform linux -- Python 3.12.9, pytest-8.3.4, pluggy-1.5.0 -- /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/bin/python3
cachedir: .pytest_cache
benchmark: 5.1.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=100 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=True warmup_iterations=30)
rootdir: /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py
configfile: pyproject.toml
plugins: factoryboy-2.7.0, dash-4.1.0, Faker-33.1.0, xdist-3.6.1, devtools-0.12.2, cov-6.0.0, mpi-0.6, benchmark-5.1.0, unused-fixtures-0.2.0
collecting ... collected 1 item

model/atmosphere/dycore/tests/dycore/integration_tests/test_benchmark_solve_nonhydro.py::test_benchmark_solve_nonhydro[True-False] ERROR

==================================== ERRORS ====================================
_________ ERROR at setup of test_benchmark_solve_nonhydro[True-False] __________

self = <cupy.cuda.compiler._NVRTCProgram object at 0x14ab043be0f0>
options = ('--std=c++14', '-I/capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/...rdu63qmj6/lib/clang/20/include/cuda_wrappers', '-I/usr/lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14', ...)
log_stream = None

  def compile(self, options=(), log_stream=None):
      try:
          if self.name_expressions:
              for ker in self.name_expressions:
                  nvrtc.addNameExpression(self.ptr, ker)
>           nvrtc.compileProgram(self.ptr, options)

.venv/lib/python3.12/site-packages/cupy/cuda/compiler.py:757: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
cupy_backends/cuda/libs/nvrtc.pyx:125: in cupy_backends.cuda.libs.nvrtc.compileProgram
  ???
cupy_backends/cuda/libs/nvrtc.pyx:138: in cupy_backends.cuda.libs.nvrtc.compileProgram
  ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   cupy_backends.cuda.libs.nvrtc.NVRTCError: HIPRTC_ERROR_COMPILATION (6)

cupy_backends/cuda/libs/nvrtc.pyx:53: NVRTCError

During handling of the above exception, another exception occurred:

request = <SubRequest 'grid_manager' for <Function test_benchmark_solve_nonhydro[True-False]>>
backend_like = {'backend_factory': <function make_custom_dace_backend at 0x14ab0a1767a0>, 'device': <DeviceType.ROCM: 10>}

  @pytest.fixture(scope="session")
  def grid_manager(
      request: pytest.FixtureRequest, backend_like: model_backends.BackendLike
  ) -> gm.GridManager | None:
      """
      Fixture for providing a grid_manager instance.
  
      The provided grid instance is based on the configuration specified in the
      pytest command line option `--grid <grid_name>:<grid_levels>`, where `<grid_name>`
      might refer to a known grid configuration or to an existing ICON NetCDF grid file,
      and `<grid_levels>` specifies the number of vertical levels to use (optional).
      """
      name, num_levels = _evaluate_grid_option(request)
  
      allocator = model_backends.get_allocator(backend_like)
  
      if name in VALID_GRID_PRESETS:
>           grid_manager = _get_grid_manager_from_preset(
              name, num_levels=num_levels, allocator=allocator
          )

model/testing/src/icon4py/model/testing/fixtures/stencil_tests.py:91: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
model/testing/src/icon4py/model/testing/fixtures/stencil_tests.py:57: in _get_grid_manager_from_preset
  return grid_utils.get_grid_manager_from_identifier(
model/testing/src/icon4py/model/testing/grid_utils.py:49: in get_grid_manager_from_identifier
  return get_grid_manager(
model/testing/src/icon4py/model/testing/grid_utils.py:74: in get_grid_manager
  manager(allocator=allocator, keep_skip_values=keep_skip_values)
model/common/src/icon4py/model/common/grid/grid_manager.py:133: in __call__
  self._geometry = self._read_geometry_fields(allocator)
model/common/src/icon4py/model/common/grid/grid_manager.py:278: in _read_geometry_fields
  gridfile.GeometryName.EDGE_CELL_DISTANCE.value: gtx.as_field(
.venv/lib/python3.12/site-packages/gt4py/next/constructors.py:281: in as_field
  field[...] = field.array_ns.asarray(data)
.venv/lib/python3.12/site-packages/gt4py/next/embedded/nd_array_field.py:365: in __setitem__
  self._ndarray[target_slice] = value  # type: ignore[index] # np and cp allow index assignment, jax overrides
cupy/_core/core.pyx:1688: in cupy._core.core._ndarray_base.__setitem__
  ???
cupy/_core/_routines_indexing.pyx:51: in cupy._core._routines_indexing._ndarray_setitem
  ???
cupy/_core/_routines_indexing.pyx:1027: in cupy._core._routines_indexing._scatter_op
  ???
cupy/_core/_kernel.pyx:1374: in cupy._core._kernel.ufunc.__call__
  ???
cupy/_core/_kernel.pyx:1401: in cupy._core._kernel.ufunc._get_ufunc_kernel
  ???
cupy/_core/_kernel.pyx:1082: in cupy._core._kernel._get_ufunc_kernel
  ???
cupy/_core/_kernel.pyx:94: in cupy._core._kernel._get_simple_elementwise_kernel
  ???
cupy/_core/_kernel.pyx:82: in cupy._core._kernel._get_simple_elementwise_kernel_from_code
  ???
cupy/_core/core.pyx:2396: in cupy._core.core.compile_with_cache
  ???
.venv/lib/python3.12/site-packages/cupy/cuda/compiler.py:558: in _compile_module_with_cache
  return _compile_with_cache_hip(
.venv/lib/python3.12/site-packages/cupy/cuda/compiler.py:979: in _compile_with_cache_hip
  binary, mapping = compile_using_nvrtc(
.venv/lib/python3.12/site-packages/cupy/cuda/compiler.py:397: in compile_using_nvrtc
  return _compile(source, options, cu_path,
.venv/lib/python3.12/site-packages/cupy/cuda/compiler.py:381: in _compile
  compiled_obj, mapping = prog.compile(options, log_stream)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <cupy.cuda.compiler._NVRTCProgram object at 0x14ab043be0f0>
options = ('--std=c++14', '-I/capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/...rdu63qmj6/lib/clang/20/include/cuda_wrappers', '-I/usr/lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14', ...)
log_stream = None

  def compile(self, options=(), log_stream=None):
      try:
          if self.name_expressions:
              for ker in self.name_expressions:
                  nvrtc.addNameExpression(self.ptr, ker)
          nvrtc.compileProgram(self.ptr, options)
          mapping = None
          if self.name_expressions:
              mapping = {}
              for ker in self.name_expressions:
                  mapping[ker] = nvrtc.getLoweredName(self.ptr, ker)
          if log_stream is not None:
              log_stream.write(nvrtc.getProgramLog(self.ptr))
          # This is to ensure backwards compatibility with nvrtc
          if self.method == 'cubin':
              return nvrtc.getCUBIN(self.ptr), mapping
          elif self.method == 'ptx':
              return nvrtc.getPTX(self.ptr), mapping
          # TODO(leofang): support JIT LTO using nvrtc.getNVVM()?
          # need -dlto and -arch=compute_XX
          else:
              raise RuntimeError('Unknown NVRTC compile method')
      except nvrtc.NVRTCError:
          log = nvrtc.getProgramLog(self.ptr)
>           raise CompileException(log, self.src, self.name, options,
                                 'nvrtc' if not runtime.is_hip else 'hiprtc')
E           cupy.cuda.compiler.CompileException: In file included from /tmp/comgr-433362/input/tmp/tmpzr6gi4m6/f6c932f8af819fb6fd038aa82113ecb87630145a.hsaco.cu:2:
E           In file included from /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:6:
E           /usr/lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/initializer_list:41:15: error: expected '{'
E              41 | namespace std _GLIBCXX_VISIBILITY(default)
E                 |               ^
E           /usr/lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/initializer_list:41:15: error: a type specifier is required for all declarations
E           /usr/lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/initializer_list:41:35: error: expected expression
E              41 | namespace std _GLIBCXX_VISIBILITY(default)
E                 |                                   ^
E           /usr/lib64/gcc/x86_64-suse-linux/14/../../../../include/c++/14/initializer_list:41:43: error: expected ';' after top level declarator
E              41 | namespace std _GLIBCXX_VISIBILITY(default)
E                 |                                           ^
E           In file included from /tmp/comgr-433362/input/tmp/tmpzr6gi4m6/f6c932f8af819fb6fd038aa82113ecb87630145a.hsaco.cu:2:
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:438:32: error: no template named 'initializer_list' in namespace 'std'
E             438 |                     const std::initializer_list<Int> shape)
E                 |                           ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:446:32: error: no template named 'initializer_list' in namespace 'std'
E             446 |                     const std::initializer_list<Int1> shape,
E                 |                           ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:447:32: error: no template named 'initializer_list' in namespace 'std'
E             447 |                     const std::initializer_list<Int2> strides)
E                 |                           ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:485:39: error: no template named 'initializer_list' in namespace 'std'
E             485 |   __device__ T& operator[](const std::initializer_list<Int> idx_) {
E                 |                                  ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:493:45: error: no template named 'initializer_list' in namespace 'std'
E             493 |   __device__ const T& operator[](const std::initializer_list<Int> idx_) const {
E                 |                                        ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:635:32: error: no template named 'initializer_list' in namespace 'std'
E             635 |                     const std::initializer_list<int> shape,
E                 |                           ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:636:32: error: no template named 'initializer_list' in namespace 'std'
E             636 |                     const std::initializer_list<int> strides)
E                 |                           ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:708:34: error: no template named 'initializer_list' in namespace 'std'
E             708 |   __device__ CIndexer(const std::initializer_list<Int> shape)
E                 |                             ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:715:34: error: no template named 'initializer_list' in namespace 'std'
E             715 |   __device__ CIndexer(const std::initializer_list<Int1> shape,
E                 |                             ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:716:34: error: no template named 'initializer_list' in namespace 'std'
E             716 |                       const std::initializer_list<Int2> index)
E                 |                             ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:794:34: error: no template named 'initializer_list' in namespace 'std'
E             794 |   __device__ CIndexer(const std::initializer_list<int> shape)
E                 |                             ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:800:34: error: no template named 'initializer_list' in namespace 'std'
E             800 |   __device__ CIndexer(const std::initializer_list<int> shape,
E                 |                             ~~~~~^
E           /capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/cupy/_core/include/cupy/carray.cuh:801:34: error: no template named 'initializer_list' in namespace 'std'
E             801 |                       const std::initializer_list<int> index)
E                 |                             ~~~~~^
E           17 errors generated when compiling for gfx942.

.venv/lib/python3.12/site-packages/cupy/cuda/compiler.py:776: CompileException
=============================== warnings summary ===============================
.venv/lib/python3.12/site-packages/_pytest/config/__init__.py:1278
/capstor/store/cscs/director2/g174/sfantao/icon4py/icon4py/.venv/lib/python3.12/site-packages/_pytest/config/__init__.py:1278: PytestConfigWarning: assertions not in test modules or plugins will be ignored because assert statements are not executed by the underlying Python interpreter (are you using python -O?)

  self._warn_about_missing_assertion(mode)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
ERROR model/atmosphere/dycore/tests/dycore/integration_tests/test_benchmark_solve_nonhydro.py::test_benchmark_solve_nonhydro[True-False]
========================= 1 warning, 1 error in 9.28s ==========================

I guess we need to use a specific version of CuPy whose dependency chain is broken?

@philip-paul-mueller
Collaborator

philip-paul-mueller commented Apr 14, 2026

@sfantao
I saw the download issue as well; after a while I realized that the problem is the index itself.
I found that, depending on where I visited https://pypi.amd.com/rocm-7.0.2/simple from (CSCS or home), I got different pages.
I suspect that there are multiple servers (load balancing) and that one of them serves a bad index, for whatever reason.
However, after modifying the lock file (as you did), I was able to install and run it normally.

However, when I tried it again, I now get the same error as you.
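One way to check the load-balancing hypothesis is to record the hashes the index advertises for amd-cupy from each location and compare them with the sha256 pinned in uv.lock. The sketch below is a hypothetical helper, not part of this PR: it assumes pypi.amd.com follows the PEP 503 simple-repository layout (project pages at `<index>/<project>/`, file links carrying `#sha256=` fragments), which has not been confirmed here, and all function names are illustrative.

```python
"""Sketch: compare the sha256 hashes a PEP 503 simple index advertises for a project.

Assumes pypi.amd.com serves PEP 503 project pages at <index>/<project>/ with
#sha256=<hex> fragments on the file links (not confirmed here).
Function names are hypothetical.
"""
from __future__ import annotations

import hashlib
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urldefrag, urlsplit

INDEX_URL = "https://pypi.amd.com/rocm-7.0.2/simple"  # index discussed in this thread


class _LinkHashes(HTMLParser):
    """Collect {filename: sha256} from the <a href="...#sha256=..."> links of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hashes: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url, fragment = urldefrag(href)
        if fragment.startswith("sha256="):
            filename = urlsplit(url).path.rsplit("/", 1)[-1]
            self.hashes[filename] = fragment[len("sha256="):]


def index_hashes(html: str) -> dict[str, str]:
    """Parse a simple-index project page into {filename: sha256}."""
    parser = _LinkHashes()
    parser.feed(html)
    parser.close()
    return parser.hashes


def fetch_project_page(project: str, index: str = INDEX_URL) -> str:
    """Download <index>/<project>/ (PEP 503 layout assumed)."""
    with urllib.request.urlopen(f"{index}/{project}/", timeout=30) as resp:
        return resp.read().decode("utf-8")


def file_sha256(path: Path) -> str:
    """Hash a local file (e.g. a downloaded wheel) in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print what the server currently advertises; run from different networks and diff.
    for name, sha in sorted(index_hashes(fetch_project_page("amd-cupy")).items()):
        print(sha, name)
```

Running the script from CSCS and from another network and diffing the output would show whether the servers disagree, and `file_sha256` gives the value to compare against a wheel downloaded by hand.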

@dganellari

dganellari commented Apr 15, 2026

@sfantao

The fix in this case seems to be switching from amd-cupy to cupy-rocm-7-0==14.0.1.

CuPy 14.0.1 crashes on ROCm 7.0+ with "__shfl_xor_sync: mask must be 64-bit".
I will soon patch the amd_scripts/install_icon4py_venv.sh script to fix it automatically after uv sync.

But if you already have a working venv, the fix is to switch to CuPy 14 and patch it:

pip uninstall amd-cupy -y
pip install cupy-rocm-7-0==14.0.1
sed -i 's/#if (HIP_VERSION < 60200000) || defined(HIP_DISABLE_WARP_SYNC_BUILTINS)/#if 1/' \
  .venv/lib/python3.12/site-packages/cupy/_core/include/cupy/hip_workaround.cuh

This strips the warp mask for all ROCm versions, which is safe: AMD wavefronts execute in lock-step, so the mask is unused. Upstream fix: cupy/cupy#9748
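The same edit can also be applied from Python so that it does not depend on the exact `.venv/lib/python3.12` path. This is a sketch, assuming cupy-rocm-7-0==14.0.1 is installed in the active environment and that hip_workaround.cuh still contains the guard line targeted by the sed command verbatim; it locates the header with `importlib.util.find_spec`, and the helper names are hypothetical.

```python
"""Sketch: apply the hip_workaround.cuh guard patch without a hard-coded venv path.

Assumes cupy-rocm-7-0==14.0.1 is installed in the active environment and its header
still contains OLD_GUARD verbatim. Helper names are hypothetical.
"""
import importlib.util
from pathlib import Path

# Exact guard line targeted by the sed command in this thread.
OLD_GUARD = "#if (HIP_VERSION < 60200000) || defined(HIP_DISABLE_WARP_SYNC_BUILTINS)"
NEW_GUARD = "#if 1"


def hip_workaround_header() -> Path:
    """Locate cupy/_core/include/cupy/hip_workaround.cuh without importing cupy."""
    spec = importlib.util.find_spec("cupy")
    if spec is None or spec.origin is None:
        raise RuntimeError("cupy is not installed in this environment")
    return Path(spec.origin).parent / "_core" / "include" / "cupy" / "hip_workaround.cuh"


def patch_warp_guard(header: Path) -> bool:
    """Replace OLD_GUARD with NEW_GUARD; return True if the file was modified.

    Safe to run repeatedly: once patched, OLD_GUARD is gone and nothing changes.
    """
    text = header.read_text()
    if OLD_GUARD not in text:
        return False  # already patched, or a CuPy version with a different guard
    header.write_text(text.replace(OLD_GUARD, NEW_GUARD))
    return True


if __name__ == "__main__":
    path = hip_workaround_header()
    print(("patched " if patch_warp_guard(path) else "unchanged ") + str(path))
```

Returning False instead of raising when the guard is absent keeps the helper safe to call after every `uv sync`.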

@iomaganaris
Collaborator Author

Hi @sfantao,

I have updated this PR with the changes from @dganellari to the icon4py installation scripts, so that installing cupy-rocm-7-0==14.0.1 and applying its patch works out of the box.
I have also updated this branch to the latest icon4py main branch and done the same for the gt4py branch it uses.

@github-actions

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatests for static field computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

Labels

None yet

Projects

None yet

7 participants