ballet: speedup AVX512 reedsol encode by drubin-fd · Pull Request #9001 · firedancer-io/firedancer

drubin-fd · 2026-03-23T14:15:29Z

Zen5:
Before (GCC):

average time per encode call 1220 ns

After (GCC):

average time per encode call 870 ns

This is roughly a 29% reduction in time per encode call (about 1.4x throughput), measured on a Zen 5 machine. The main benefit of the new approach is that it actually uses AVX512, whereas the old one only used AVX2. The new version is also written in Zig, with the generated assembly checked in, which makes the implementation much easier to audit and understand.
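The PR description does not show the kernel itself. As a hedged illustration only, the sketch below models in scalar C the split-nibble table-lookup technique that many SIMD Reed-Solomon encoders use for the GF(2^8) multiply-accumulate at the heart of parity generation. It is not the PR's Zig implementation: the field polynomial 0x11D and the helper names `gf_mul`, `gf_nibble_tables` and `gf_mul_acc` are assumptions chosen for illustration. Because multiplying by a constant `c` is linear over XOR, `c*x` equals `lo[x & 0xF] ^ hi[x >> 4]` for two 16-entry tables, and a `vpshufb`-style kernel performs those lookups for 32 bytes per instruction with AVX2 ymm registers versus 64 bytes with AVX512 zmm registers, which is one plausible source of the gain from a real AVX512 path.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply a*b in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
   The choice of polynomial is an assumption for this sketch. */
static uint8_t gf_mul( uint8_t a, uint8_t b ) {
  uint8_t p = 0;
  while( b ) {
    if( b & 1 ) p ^= a;           /* add (XOR) a * x^i for each set bit i of b */
    uint8_t carry = a & 0x80;
    a = (uint8_t)( a << 1 );      /* a *= x, dropping the x^8 term ...          */
    if( carry ) a ^= 0x1D;        /* ... and folding it back in via 0x11D       */
    b >>= 1;
  }
  return p;
}

/* Hypothetical helper: the two 16-entry tables for multiplication by c.
   lo[i] = c*i and hi[i] = c*(i<<4), so c*x = lo[x & 0xF] ^ hi[x >> 4]. */
static void gf_nibble_tables( uint8_t c, uint8_t lo[16], uint8_t hi[16] ) {
  for( int i=0; i<16; i++ ) {
    lo[i] = gf_mul( c, (uint8_t)i );
    hi[i] = gf_mul( c, (uint8_t)( i<<4 ) );
  }
}

/* Hypothetical helper: parity[j] ^= c * data[j] for every byte j.
   A SIMD kernel does the two lookups per 16-byte lane with a byte shuffle,
   covering 32 bytes per instruction with ymm and 64 bytes with zmm. */
static void gf_mul_acc( uint8_t * parity, uint8_t const * data, size_t sz, uint8_t c ) {
  uint8_t lo[16], hi[16];
  gf_nibble_tables( c, lo, hi );
  for( size_t j=0; j<sz; j++ ) {
    parity[j] ^= (uint8_t)( lo[ data[j] & 0xF ] ^ hi[ data[j] >> 4 ] );
  }
}
```

Each parity shard is the XOR of such products over all data shards, so the encoder's cost is dominated by how many bytes each lookup-and-XOR step can cover.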

Icelake:
Before (Clang):

average time per encode call 2049 ns

After (Clang):

average time per encode call 1745 ns

On Icelake the improvement is about 15% (2049 ns to 1745 ns per call). The Icelake numbers are the more interesting and more representative ones: LLVM does not yet have scheduling information for Zen 5, so it schedules the code somewhat poorly there (we could win with more carefully crafted asm that uses zmm registers).
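A "speedup" percentage can be read either as the fraction of per-call time saved or as the gain in calls per second, and the two differ noticeably at these sizes. The small sketch below computes both from the average times quoted above; the function names `time_reduction` and `throughput_ratio` are hypothetical helpers, not part of the codebase.

```c
/* Fraction of per-call time saved: 1 - after/before. */
static double time_reduction( double before_ns, double after_ns ) {
  return 1.0 - after_ns / before_ns;
}

/* How many times more encode calls fit in the same wall-clock time. */
static double throughput_ratio( double before_ns, double after_ns ) {
  return before_ns / after_ns;
}
```

For the Zen 5 numbers, `time_reduction( 1220.0, 870.0 )` is about 0.287 and `throughput_ratio( 1220.0, 870.0 )` about 1.40; for Icelake, `time_reduction( 2049.0, 1745.0 )` is about 0.148 and `throughput_ratio( 2049.0, 1745.0 )` about 1.17.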

drubin-fd force-pushed the drubin/speedup-reedsol branch 4 times, most recently from f41c416 to 78be8cb (March 24, 2026 00:31)

ballet: speedup AVX512 reedsol encode

7a0d5bc

drubin-fd force-pushed the drubin/speedup-reedsol branch from 78be8cb to 7a0d5bc (March 24, 2026 15:42)

drubin-fd wants to merge 1 commit into main from drubin/speedup-reedsol


2 participants
