
Add CMakePresets for target micro arch #1348

Open
AntoinePrv wants to merge 9 commits into xtensor-stack:master from AntoinePrv:cmake-presets

Conversation

@AntoinePrv
Contributor

@AntoinePrv AntoinePrv commented May 13, 2026

I've taken the direction of explicit flags such as -mavx -mno-avx2.
This is IMHO less error-prone and more accurate than using an architecture name such as haswell.
The main difference is that explicit flags do not enable any other features or change the -mtune target.
For a test setting, accuracy matters more IMHO.
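As a rough illustration of the explicit-flag style, here is what an AVX-only configure preset might look like. This is a sketch, not a preset taken from the PR: the preset name, the `binaryDir`, and the exact flag list are assumptions modeled on the `avx2` preset quoted later in this thread. An architecture name such as sandybridge would instead enable whatever else that CPU supports and also set `-mtune`.

```json
{
  "version": 3,
  "configurePresets": [
    {
      "name": "avx",
      "description": "Hypothetical sketch: AVX enabled, AVX2 and above explicitly disabled",
      "binaryDir": "${sourceDir}/_build",
      "cacheVariables": {
        "CMAKE_CXX_FLAGS": "$env{CXXFLAGS} -march=x86-64-v2 -mno-sse4a -mavx -mno-avx2 -mno-avx512f"
      }
    }
  ]
}
```

It would be used with `cmake --preset avx` followed by `cmake --build _build --parallel`; preset schema version 3 requires CMake 3.21 or newer.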

Comment thread .github/workflows/linux.yml Outdated
@serge-sans-paille
Contributor

I really like your approach and will eagerly merge it once it validates \o/

@AntoinePrv
Contributor Author

I've only kept the micro-architecture target in CMakePresets.json, because combining it with debug/release, XTL on/off, etc. results in a combinatorial explosion of presets, for which there is currently no support.
Another shortcoming is that presets cannot select flags based on the compiler, e.g. to use MSVC flags. We can condition them on the OS, but that is not quite the same.
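To make the OS-versus-compiler limitation concrete: a preset `condition` can test the `${hostSystemName}` macro, but presets are resolved before CMake detects a compiler, and there is no preset macro for the compiler itself. A minimal sketch, reusing the `avx2` flags from this PR with an assumed `binaryDir`:

```json
{
  "version": 3,
  "configurePresets": [
    {
      "name": "avx2",
      "description": "Sketch: GCC/Clang-style flags, only offered on non-Windows hosts",
      "binaryDir": "${sourceDir}/_build",
      "condition": {
        "type": "notEquals",
        "lhs": "${hostSystemName}",
        "rhs": "Windows"
      },
      "cacheVariables": {
        "CMAKE_CXX_FLAGS": "$env{CXXFLAGS} -march=x86-64-v2 -mno-sse4a -mavx2 -mno-avx512f"
      }
    }
  ]
}
```

The OS is only a proxy: this condition also hides the preset from a MinGW GCC build on Windows, and a Windows-only counterpart using MSVC flags such as `/arch:AVX2` would likewise be offered to that MinGW toolchain, which does not understand them.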

I have ongoing work to do the same thing as these presets at the CMake level, as a function that could also be made available to users to help with dynamic-dispatch tooling (our current solution in Arrow is very verbose).
In that case, we would also need to define a safe -march baseline, because the translation units compiled for a given instruction set may also contain non-SIMD code (this is sometimes the case in Arrow). For the most advanced instruction sets, keeping a plain x86-64 baseline leaves performance on the table. But what would be a reasonable baseline when dynamically dispatching to, for example, AVX2?

  • haswell (first with AVX2) also enables FMA3 and BMI2
  • -march=haswell -mno-fma -mno-bmi2, if that is a thing? (GCC and Clang spell the FMA3 flag -mfma)
  • Or go further back? sandybridge (first with AVX)? nehalem (first with SSE4.2)?

@AntoinePrv AntoinePrv force-pushed the cmake-presets branch 2 times, most recently from a7e66a6 to d87c148 May 19, 2026 14:13
@AntoinePrv
Contributor Author

@serge-sans-paille this is in a ready state, but I am not fully happy with it.

With AVX-512 and AVX512-256, the combinatorial explosion of possibilities starts to show again.
It is also not possible to inherit the flags of another preset and append to them.
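A short sketch of why preset inheritance does not help here: a preset can `inherits` from another, but a cache variable it sets replaces the inherited value entirely, and there is no macro that refers to the parent's value. The `avx512f-from-avx2` preset below is hypothetical.

```json
{
  "version": 3,
  "configurePresets": [
    {
      "name": "avx2",
      "binaryDir": "${sourceDir}/_build",
      "cacheVariables": {
        "CMAKE_CXX_FLAGS": "$env{CXXFLAGS} -march=x86-64-v2 -mno-sse4a -mavx2 -mno-avx512f"
      }
    },
    {
      "name": "avx512f-from-avx2",
      "description": "Hypothetical: this CMAKE_CXX_FLAGS replaces the avx2 value instead of appending to it",
      "inherits": "avx2",
      "cacheVariables": {
        "CMAKE_CXX_FLAGS": "$env{CXXFLAGS} -mavx512f"
      }
    }
  ]
}
```

Configuring with `cmake --preset avx512f-from-avx2` yields only `$env{CXXFLAGS} -mavx512f` (plus the inherited `binaryDir`), so every preset has to spell out its full flag list, as the `avx512f` preset in this PR does.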

This reinforces my belief that I should continue the work to do this in CMake (which could also be installed for our users to improve our dynamic dispatch tooling), and harmonize it with the TARGET_ARCH variable used by the tests.

This PR is not completely worthless, though. For example, we can now truly test with AVX512F alone, which was not possible before because no Intel architecture is limited to the F feature only.

What do you think? Should we give this some mileage before I get the time to work on a CMake solution?

CXXFLAGS="$CXX_FLAGS -DXSIMD_DEFAULT_ARCH=avx_128"
fi
if [[ '${{ matrix.sys.flags }}' == 'avx2' ]]; then
CMAKE_EXTRA_ARGS="$CMAKE_EXTRA_ARGS -DTARGET_ARCH=haswell"
Contributor

What a nice cleanup. The current merge conflict is probably related to the recent addition of avx512vl support, sorry about that.

fi
if [[ '${{ matrix.sys.flags }}' == 'i386' ]]; then
CXX_FLAGS="$CXX_FLAGS -m32"
export CXXFLAGS="$CXXFLAGS -m32"
Contributor

!!!


- name: Build
run: cmake --build _build
run: cmake --build _build --parallel
Contributor

We probably have the same issue elsewhere; I'll handle that.

Comment thread CMakePresets.json
{
"name": "avx512f",
"cacheVariables": {
"CMAKE_CXX_FLAGS": "$env{CXXFLAGS} -march=x86-64-v2 -mno-sse4a -mavx -mavx2 -mavx512f -mno-avx512cd -mno-avx512dq -mno-avx512bw -mno-avx512er -mno-avx512pf -mno-avx512ifma -mno-avx512vbmi -mno-avx512vbmi2 -mno-avx512vnni"
Contributor

I like this strict checking. And it looks like it did not uncover too many errors, yay!

Comment thread CMakePresets.json
{
"name": "avx2",
"cacheVariables": {
"CMAKE_CXX_FLAGS": "$env{CXXFLAGS} -march=x86-64-v2 -mno-sse4a -mavx2 -mno-avx512f"
Contributor

We sometimes fall back from AVX2 instructions to SSE instructions. How can this work??

Contributor

I do understand the need to prune higher instruction sets, but not the need to prune lower ones. Please explain.

Contributor

@serge-sans-paille serge-sans-paille left a comment

All looks good, except the question on pruning lower architectures which raises a big unknown to me.

Labels

None yet

Projects

None yet

2 participants