
ballet: speedup AVX512 reedsol encode#9001

Draft
drubin-fd wants to merge 1 commit into main from drubin/speedup-reedsol



drubin-fd commented Mar 23, 2026

Zen5:
Before (GCC):

average time per encode call 1220 ns

After (GCC):

average time per encode call 870 ns

This is roughly a 29% reduction in time per encode call (about a 1.4x speedup), measured on a Zen 5 machine. The main benefit of the new approach is that it actually uses AVX512, whereas the old one only used AVX2. The new version is also written in Zig, with the generated assembly checked in, which makes the implementation much easier to audit and understand.
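For reference, a minimal sketch of the arithmetic behind the figures in this description. The helper names `time_reduction_pct` and `throughput_ratio` are made up for illustration and are not part of the PR. With the numbers quoted here, Zen 5 (1220 ns to 870 ns) works out to about a 28.7% reduction in time per call, or about 1.40x, and Icelake (2049 ns to 1745 ns) to about 14.8%, or about 1.17x.

```c
#include <assert.h>

/* Hypothetical helper: percentage reduction in time per call,
   (before - after) / before * 100. */
static double time_reduction_pct(double before_ns, double after_ns) {
    assert(before_ns > 0.0);
    return (before_ns - after_ns) / before_ns * 100.0;
}

/* Hypothetical helper: throughput ratio, i.e. how many times more
   calls fit in the same wall time after the change. */
static double throughput_ratio(double before_ns, double after_ns) {
    assert(after_ns > 0.0);
    return before_ns / after_ns;
}
```

The two numbers answer different questions: the percentage says how much less time a single call takes, while the ratio says how much more work fits in the same time.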

Icelake:
Before (Clang):

average time per encode call 2049 ns

After (Clang):

average time per encode call 1745 ns

The Icelake benchmarks are more interesting, and more accurate, because LLVM does not yet have scheduling information for Zen5, so it schedules somewhat poorly there (we could gain more with carefully crafted asm that uses zmm registers).
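The harness that produced the "average time per encode call" numbers is not shown in this description. As a rough sketch only, a figure like that could be obtained by timing many back-to-back calls and dividing by the call count. Everything below is an assumption for illustration: `encode_fn`, `avg_ns_per_call`, and `dummy_encode` are hypothetical names, and `dummy_encode` is a stand-in workload, not the reedsol encoder.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the routine under test; not the project's API. */
typedef void (*encode_fn)(void *ctx);

/* Stand-in workload: bumps a counter so the call has an observable effect. */
static void dummy_encode(void *ctx) {
    volatile unsigned long *counter = ctx;
    (*counter)++;
}

/* Mean wall-clock time per call, in nanoseconds, over `iters` timed calls. */
static double avg_ns_per_call(encode_fn fn, void *ctx, size_t iters) {
    struct timespec t0, t1;
    assert(fn != NULL && iters > 0);

    fn(ctx); /* one untimed warm-up call */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++) fn(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed_ns / (double)iters;
}
```

Averaging over a large `iters` amortizes the cost of the two clock reads, which matters when a single call takes on the order of a microsecond.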

drubin-fd force-pushed the drubin/speedup-reedsol branch 4 times, most recently from f41c416 to 78be8cb (March 24, 2026 00:31)
drubin-fd force-pushed the drubin/speedup-reedsol branch from 78be8cb to 7a0d5bc (March 24, 2026 15:42)
