CI: Update ci-cuda tests to maintained workflow #3282

Closed

dschwoerer wants to merge 6 commits into next from ci-cuda

Conversation

@dschwoerer
Contributor

No description provided.

@ZedThree
Member

Error is in CMake:

  CUDA language enabled prior to setting CMAKE_CUDA_HOST_COMPILER.  Please
  set CMAKE_CUDA_HOST_COMPILER prior to ENABLE_LANGUAGE(CUDA) or PROJECT(..
  LANGUAGES CUDA)

key: zenodo-data-${{ hashFiles('tests/integrated/test-fci-mpi/CMakeLists.txt') }}

- name: Build minimal CUDA 12.2 @ GCC9.4.0 @ Ubuntu 20.04
- name: Build minimal CUDA 12.6 @ GCC11 @ Ubuntu 22.04
Member

Is there a reason the base image is using ubuntu 22.04 rather than the 24.04 LTS version?

Contributor Author

Let's see whether we can update to Ubuntu 24.04 and CUDA 13.1.1:
https://github.com/boutproject/bout-container-base/actions/runs/22481022925

Member

If that works, please could you also bump fmt to 12.x?

@ZedThree
Member

I set CMAKE_CUDA_HOST_COMPILER in the CMake call; it now fails with:

In file included from /usr/local/cuda/include/thrust/system/cuda/detail/execution_policy.h:40,
                 from /usr/local/cuda/include/thrust/iterator/detail/device_system_tag.h:31,
                 from /usr/local/cuda/include/thrust/iterator/iterator_traits.h:75,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/util.h:44,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/core/alignment.h:31,
                 from /usr/local/cuda/include/thrust/system/cuda/detail/core/triple_chevron_launch.h:38,
                 from /spack-env/.spack-env/view/include/cub/device/dispatch/dispatch_scan.cuh:49,
                 from /spack-env/.spack-env/view/include/cub/device/device_scan.cuh:38,
                 from /spack-env/.spack-env/view/include/RAJA/policy/cuda/scan.hpp:28,
                 from /spack-env/.spack-env/view/include/RAJA/policy/cuda.hpp:38,
                 from /spack-env/.spack-env/view/include/RAJA/RAJA.hpp:71,
                 from /__w/BOUT-dev/BOUT-dev/include/bout/field2d.hxx:42,
                 from /__w/BOUT-dev/BOUT-dev/include/bout/boundary_op.hxx:9,
                 from /__w/BOUT-dev/BOUT-dev/src/field/field2d.cxx:32:
/usr/local/cuda/include/thrust/system/cuda/config.h:122:2: error: #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
  122 | #error The version of CUB in your include path is not compatible with this release of Thrust. CUB is now included in the CUDA Toolkit, so you no longer need to use your own checkout of CUB. Define THRUST_IGNORE_CUB_VERSION_CHECK to ignore this.
      |  ^~~~~

It looks like an issue with the spack environment. @yassineAlouini any ideas? What's different about how we're building here, vs in bout-container-base?
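The trace shows Thrust coming from /usr/local/cuda/include while CUB comes from /spack-env/.spack-env/view/include. Below is a minimal C++ model of how that can happen, assuming the Spack view is searched before the CUDA toolkit directory (the real order depends on how the -I/-isystem flags end up on the nvcc command line). The preprocessor takes a header from the first include directory that provides it, so a CUB copy in the view shadows the one shipped with CUDA, while Thrust (absent from the view in this model) still resolves to the toolkit. IncludeDir, resolve_include and the header sets in assumed_search_path are illustrative, not taken from the actual build.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// One include directory and the relative header names it provides (illustrative).
struct IncludeDir {
  std::string path;
  std::set<std::string> headers;
};

// Resolve #include <header> the way the preprocessor does: the first directory
// in search order that provides the header wins; later copies are shadowed.
std::optional<std::string> resolve_include(const std::vector<IncludeDir>& search_path,
                                           const std::string& header) {
  assert(!header.empty());
  for (const auto& dir : search_path) {
    if (dir.headers.count(header) != 0) {
      return dir.path + "/" + header;
    }
  }
  return std::nullopt;
}

// Assumed search order for this build: Spack view first, CUDA toolkit second.
// cub/version.cuh and thrust/version.h are the version headers of CUB and Thrust;
// which other headers each directory holds is made up for the example.
std::vector<IncludeDir> assumed_search_path() {
  return {
      {"/spack-env/.spack-env/view/include", {"RAJA/RAJA.hpp", "cub/version.cuh"}},
      {"/usr/local/cuda/include", {"cub/version.cuh", "thrust/version.h"}},
  };
}
```

With this search order, cub/version.cuh resolves to the Spack view and thrust/version.h to /usr/local/cuda/include. That is the mixed pairing the Thrust version check in config.h rejects.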

Comment thread .github/workflows/tests.yml Outdated
@yassineAlouini

I will give this a look this weekend if it is still not resolved. 👌

@dschwoerer
Contributor Author

We now get a different error:


2026-02-27T10:27:43.0122165Z [ 61%] Building CUDA object CMakeFiles/bout++.dir/src/invert/laplace/impls/pcr_thomas/pcr_thomas.cxx.o
2026-02-27T10:27:43.0175546Z nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
2026-02-27T10:27:47.0313978Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h: In function 'std::system_error fmt::v10::vsystem_error(int, fmt::v10::string_view, fmt::v10::format_args)':
2026-02-27T10:27:47.0316843Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:150:8: error: expected primary-expression before 'class'
2026-02-27T10:27:47.0319058Z   150 |   return std::system_error(ec, vformat(fmt, args));
2026-02-27T10:27:47.0319601Z       |        ^~~~~
2026-02-27T10:27:47.0320798Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:150:7: error: expected ';' before 'class'
2026-02-27T10:27:47.0322165Z   150 |   return std::system_error(ec, vformat(fmt, args));
2026-02-27T10:27:47.0322692Z       |       ^~~~~~
2026-02-27T10:27:47.0323029Z       |       ;
2026-02-27T10:27:47.0324263Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:150:8: error: expected primary-expression before 'class'
2026-02-27T10:27:47.0325700Z   150 |   return std::system_error(ec, vformat(fmt, args));
2026-02-27T10:27:47.0326207Z       |        ^~~~~
2026-02-27T10:27:47.0421325Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h: In function 'void fmt::v10::format_system_error(fmt::v10::detail::buffer<char>&, int, const char*)':
2026-02-27T10:27:47.0423855Z /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:1414:32: error: expected primary-expression before 'class'
2026-02-27T10:27:47.0425484Z  1414 |     write(std::back_inserter(out), std::system_error(ec, message).what());
2026-02-27T10:27:47.0426176Z       |                                ^~~~~
2026-02-27T10:27:47.1229664Z [ 61%] Building CUDA object CMakeFiles/bout++.dir/src/invert/laplace/impls/petsc/petsc_laplace.cxx.o
2026-02-27T10:27:47.1291053Z nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
2026-02-27T10:27:49.1813197Z [ 62%] Building CUDA object CMakeFiles/bout++.dir/src/invert/laplace/impls/petsc3damg/petsc3damg.cxx.o
2026-02-27T10:27:49.1876845Z nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
2026-02-27T10:27:50.1917462Z /__w/BOUT-dev/BOUT-dev/include/bout/paralleltransform.hxx(107): warning #611-D: overloaded virtual function "ParallelTransform::toFieldAligned" is only partially overridden in class "ParallelTransformIdentity"
2026-02-27T10:27:50.1919340Z   class ParallelTransformIdentity : public ParallelTransform {
2026-02-27T10:27:50.1919971Z         ^
2026-02-27T10:27:50.1920153Z 
2026-02-27T10:27:50.1920884Z Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
2026-02-27T10:27:50.1921642Z 
2026-02-27T10:27:50.1933708Z /__w/BOUT-dev/BOUT-dev/include/bout/paralleltransform.hxx(107): warning #611-D: overloaded virtual function "ParallelTransform::fromFieldAligned" is only partially overridden in class "ParallelTransformIdentity"
2026-02-27T10:27:50.1935562Z   class ParallelTransformIdentity : public ParallelTransform {
2026-02-27T10:27:50.1936238Z         ^
2026-02-27T10:27:50.1936797Z 
2026-02-27T10:27:50.2021861Z /__w/BOUT-dev/BOUT-dev/include/bout/paralleltransform.hxx(182): warning #611-D: overloaded virtual function "ParallelTransform::toFieldAligned" is only partially overridden in class "ShiftedMetric"
2026-02-27T10:27:50.2024868Z   class ShiftedMetric : public ParallelTransform {
2026-02-27T10:27:50.2025401Z         ^
2026-02-27T10:27:50.2025575Z 
2026-02-27T10:27:50.2042796Z /__w/BOUT-dev/BOUT-dev/include/bout/paralleltransform.hxx(182): warning #611-D: overloaded virtual function "ParallelTransform::fromFieldAligned" is only partially overridden in class "ShiftedMetric"
2026-02-27T10:27:50.2045796Z   class ShiftedMetric : public ParallelTransform {
2026-02-27T10:27:50.2046328Z         ^
2026-02-27T10:27:50.2046506Z 
2026-02-27T10:27:51.8038656Z [ 62%] Building CUDA object CMakeFiles/bout++.dir/src/invert/laplace/impls/serial_band/serial_band.cxx.o
2026-02-27T10:27:51.8098247Z nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
2026-02-27T10:27:51.8502656Z make[2]: *** [CMakeFiles/bout++.dir/build.make:715: CMakeFiles/bout++.dir/src/invert/laplace/impls/pcr/pcr.cxx.o] Error 1

It seems to use the Spack-provided fmt (10.2.1). Did we recently change something that requires a newer fmt?

@ZedThree
Member

 /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-10.2.1-622kbbyf567riee3rfkdrvtwe3hqiekh/include/fmt/format-inl.h:150:8: error: expected primary-expression before 'class'
   150 |   return std::system_error(ec, vformat(fmt, args));
       |        ^~~~~

The error doesn't match the highlighted code (there is no 'class' on line 150), so something funky is going on.

Did we recently change something that requires a newer fmt?

I don't think so, but the C++20 changes might need a newer version. At any rate, it would be good to match the bundled version.

@yassineAlouini

After investigating the build failures, here's a summary of the root cause and two fix options:

Root Cause

There is a fmt version mismatch between what the Spack environment in bout-container-base provides and what BOUT-dev bundles:

Source                          fmt version
Spack in ci-cuda Dockerfile     10.2.1
BOUT-dev bundled submodule      12.1.0 (FMT_VERSION 120100)

The cmake call in the BOUT-dev CI passes -DBOUT_USE_SYSTEM_FMT=on, which forces BOUT-dev to use the Spack-provided fmt 10.2.1 instead of its bundled 12.1.0. That older version has a known issue with nvcc: the parameter named fmt in format-inl.h:150 conflicts with nvcc's C++20 parsing, producing the misleading expected primary-expression before 'class' error.
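For reference, fmt encodes FMT_VERSION as major * 10000 + minor * 100 + patch, which is how 120100 maps to 12.1.0. The following minimal C++ sketch decodes that value and compares a found fmt against the bundled release. decode_fmt_version, encode_fmt_version and kBundledFmtVersion are illustrative names, and FMT_VERSION is stubbed with the Spack 10.2.1 value only so the snippet compiles without fmt installed.

```cpp
// fmt encodes FMT_VERSION as major * 10000 + minor * 100 + patch.
struct FmtVersion {
  int major;
  int minor;
  int patch;
};

constexpr FmtVersion decode_fmt_version(int encoded) {
  return FmtVersion{encoded / 10000, (encoded / 100) % 100, encoded % 100};
}

constexpr int encode_fmt_version(int major, int minor, int patch) {
  return major * 10000 + minor * 100 + patch;
}

// Stub for a standalone build: in real code FMT_VERSION comes from the fmt
// header on the include path. Assumption: the Spack-provided fmt 10.2.1.
#ifndef FMT_VERSION
#define FMT_VERSION 100201
#endif

// Version of the bundled fmt submodule (12.1.0).
constexpr int kBundledFmtVersion = encode_fmt_version(12, 1, 0);

// True only if the fmt found on the include path is at least the bundled release.
constexpr bool kSystemFmtIsNewEnough = FMT_VERSION >= kBundledFmtVersion;
```

In a real build, FMT_VERSION would come from whichever fmt header is actually on the include path (fmt/core.h up to 10.x, fmt/base.h from 11.0). A guard such as #if FMT_VERSION < 120100 followed by #error could then report the mismatch directly instead of surfacing as parser errors inside format-inl.h.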

Additionally, Spack v0.23.0 (used in the Dockerfile) only goes up to fmt 11.0.2 — it doesn't have fmt@12.x at all.


Fix Options

Option A (recommended — cleanest): Remove fmt from the Spack environment in bout-container-base, and remove -DBOUT_USE_SYSTEM_FMT=on from the BOUT-dev CI cmake call — letting BOUT-dev use its own bundled fmt 12.1.0.

Changes needed:

  • bout-container-base (ci-cuda/Dockerfile): remove the spack add fmt@10.2.1 line, and remove fmt from the spack load line in bout-env.bash
  • BOUT-dev (tests.yml): remove -DBOUT_USE_SYSTEM_FMT=on from the cmake invocation

Option B: Upgrade Spack fmt to 11.0.2 (the latest available in Spack v0.23.0) and keep -DBOUT_USE_SYSTEM_FMT=on. This still wouldn't match BOUT-dev's bundled 12.1.0, and it is unverified whether it fixes the nvcc parsing issue.

Option A is the cleaner path since it eliminates the dependency conflict and leverages the already-working bundled version.

@ZedThree
Member

Option A sounds reasonable. Is there a reason to not consider upgrading spack to 1.x?

@yassineAlouini

Option A sounds reasonable. Is there a reason to not consider upgrading spack to 1.x?

I will explore option A further and let you know.

@dschwoerer
Contributor Author

I have implemented option A:
https://github.com/boutproject/bout-container-base/blob/main/ci-cuda/Dockerfile
In bout-container-base it seems to pass, but re-running the CI here still fails, referencing /spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/fmt-11.0.2-awlqht63s2zj4bpb75vinczlpkblluuu/include/fmt/format-inl.h, even though the run uses the hash of the new ci-cuda image. Any ideas?
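The store path in that error already identifies which fmt the compiler picked up. Assuming Spack's default install-tree layout, where each prefix is named <name>-<version>-<hash> under a <compiler>-<version> directory, a small C++ helper can pull the package, version and hash out of a header path copied from a CI log. This makes it easier to compare against the packages actually present in the image. parse_spack_prefix and SpackPrefix are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Package name, version and hash of one Spack install prefix.
struct SpackPrefix {
  std::string name;
  std::string version;
  std::string hash;
};

// Assumes Spack's default layout ".../<compiler>-<compiler-version>/<name>-<version>-<hash>/...".
// Splits from the right because package names may contain '-' (e.g. "py-numpy");
// versions and hashes are assumed not to.
std::optional<SpackPrefix> parse_spack_prefix(std::string_view path,
                                              std::string_view compiler_dir) {
  assert(!compiler_dir.empty());
  const std::string marker = "/" + std::string(compiler_dir) + "/";
  const auto start = path.find(marker);
  if (start == std::string_view::npos) {
    return std::nullopt;  // not inside this compiler's part of the Spack store
  }
  std::string_view rest = path.substr(start + marker.size());
  rest = rest.substr(0, rest.find('/'));  // keep only "<name>-<version>-<hash>"

  const auto hash_sep = rest.rfind('-');
  if (hash_sep == std::string_view::npos || hash_sep == 0) {
    return std::nullopt;
  }
  const auto version_sep = rest.rfind('-', hash_sep - 1);
  if (version_sep == std::string_view::npos) {
    return std::nullopt;
  }
  return SpackPrefix{std::string(rest.substr(0, version_sep)),
                     std::string(rest.substr(version_sep + 1, hash_sep - version_sep - 1)),
                     std::string(rest.substr(hash_sep + 1))};
}
```

For the path above, parse_spack_prefix(path, "gcc-11.4.0") yields name fmt, version 11.0.2 and hash awlqht63s2zj4bpb75vinczlpkblluuu. In other words, an fmt 11.0.2 install is still present in the Spack store the build is using.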

@yassineAlouini

@dschwoerer I will have a look later and let you know if I find an explanation. Good job. 🫡

Comment thread .github/workflows/tests.yml Outdated
Member

Don't we just need to use the bundled fmt here?

Suggested change

Comment thread .github/workflows/tests.yml Outdated
@ZedThree
Member

ZedThree commented Mar 9, 2026

Well that didn't work because it somehow already found fmt:

-- Using fmt submodule
-- Submodule update
-- {fmt} version: 12.1.0
-- Build type: 
CMake Error at externalpackages/fmt/CMakeLists.txt:309 (add_library):
  add_library cannot create ALIAS target "fmt::fmt" because another target
  with the same name already exists.


CMake Error at externalpackages/fmt/CMakeLists.txt:365 (add_library):
  add_library cannot create ALIAS target "fmt::fmt-header-only" because
  another target with the same name already exists.
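CMake target names must be unique, so if something earlier in the configure step (possibly a dependency's package config pulling in fmt) already defined fmt::fmt, the submodule's add_library(fmt::fmt ALIAS fmt) cannot create it again. The sketch below is a toy C++ model of that rule, not CMake's implementation. TargetRegistry and add_bundled_fmt_if_missing are hypothetical, and the guard mirrors the CMake idiom if(NOT TARGET fmt::fmt).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy model of CMake's target namespace: each target name (including
// IMPORTED and ALIAS targets) may be defined only once.
class TargetRegistry {
 public:
  // Roughly what find_package(fmt) does: the package config defines an IMPORTED target.
  void add_imported(const std::string& name) { define(name, "IMPORTED"); }

  // Roughly add_library(<name> ALIAS <target>) in fmt's own CMakeLists.txt.
  void add_alias(const std::string& name, const std::string& target) {
    assert(targets_.count(target) != 0 && "alias must point at an existing target");
    define(name, "ALIAS of " + target);
  }

  void add_library(const std::string& name) { define(name, "library"); }

  bool has_target(const std::string& name) const { return targets_.count(name) != 0; }

 private:
  void define(const std::string& name, const std::string& kind) {
    if (!targets_.emplace(name, kind).second) {
      throw std::runtime_error("cannot create target \"" + name +
                               "\" because another target with the same name already exists.");
    }
  }
  std::map<std::string, std::string> targets_;  // name -> kind of target
};

// Hypothetical guard, like if(NOT TARGET fmt::fmt) around add_subdirectory:
// only add the bundled fmt when no fmt::fmt target exists yet.
bool add_bundled_fmt_if_missing(TargetRegistry& registry) {
  if (registry.has_target("fmt::fmt")) {
    return false;  // keep the fmt::fmt that was defined earlier
  }
  registry.add_library("fmt");
  registry.add_alias("fmt::fmt", "fmt");
  return true;
}
```

Such a guard only avoids the collision: the fmt::fmt defined earlier, and therefore whichever fmt version it points at, would still be the one BOUT++ links against.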

@dschwoerer
Contributor Author

I think Umpire(?) also depends on fmt, so Umpire likely also needs to be built with a more recent fmt version. So we should build the CI image with fmt (it will be included anyway) and then just use that one...

I still do not understand why the CI passed in the bout-container-base repo, where BOUT++ was also built. Anyway, we will not get C++20 support this way.

I am giving up for now.

dschwoerer closed this Mar 11, 2026

3 participants