PoC: compute twiddles on the fly #102

Draft
Shnatsel wants to merge 4 commits into main from on-the-fly-twiddles

Shnatsel commented Mar 31, 2026

Computes twiddle factors on the fly in the variable-size kernel instead of caching them.

This reduces memory usage since the twiddles no longer need to be stored. When combined with #95, the entire FFT process needs no auxiliary memory at all, including interleaving/deinterleaving and bit reversal.
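
For readers unfamiliar with the trade-off, here is a minimal sketch of the idea, not the code from this PR. It assumes split real/imaginary arrays, an even length, and a single radix-2 decimation-in-frequency butterfly stage. The function names `cached_twiddles`, `butterfly_stage_cached`, and `butterfly_stage_on_the_fly` are hypothetical and chosen only for illustration. The cached variant allocates a table of `n / 2` twiddles, while the on-the-fly variant evaluates each twiddle inside the loop and allocates nothing.

```rust
use std::f64::consts::PI;

/// Cached approach (illustrative): precompute n/2 twiddles w^k = exp(-2*pi*i*k/n)
/// as (re, im) pairs. This costs O(n) auxiliary memory.
fn cached_twiddles(n: usize) -> Vec<(f64, f64)> {
    (0..n / 2)
        .map(|k| {
            let (sin, cos) = (-2.0 * PI * k as f64 / n as f64).sin_cos();
            (cos, sin)
        })
        .collect()
}

/// One radix-2 decimation-in-frequency stage that reads twiddles from a table.
fn butterfly_stage_cached(re: &mut [f64], im: &mut [f64], twiddles: &[(f64, f64)]) {
    assert_eq!(re.len(), im.len());
    let half = re.len() / 2;
    assert_eq!(twiddles.len(), half);
    let (re_lo, re_hi) = re.split_at_mut(half);
    let (im_lo, im_hi) = im.split_at_mut(half);
    for k in 0..half {
        let (w_re, w_im) = twiddles[k];
        let (d_re, d_im) = (re_lo[k] - re_hi[k], im_lo[k] - im_hi[k]);
        re_lo[k] += re_hi[k];
        im_lo[k] += im_hi[k];
        // Multiply the difference by the twiddle (complex multiplication).
        re_hi[k] = d_re * w_re - d_im * w_im;
        im_hi[k] = d_re * w_im + d_im * w_re;
    }
}

/// The same stage, but each twiddle is computed where it is used.
/// No table is allocated; the cost moves from memory traffic to arithmetic.
fn butterfly_stage_on_the_fly(re: &mut [f64], im: &mut [f64]) {
    assert_eq!(re.len(), im.len());
    let n = re.len();
    let half = n / 2;
    let step = -2.0 * PI / n as f64;
    let (re_lo, re_hi) = re.split_at_mut(half);
    let (im_lo, im_hi) = im.split_at_mut(half);
    for k in 0..half {
        // sin_cos returns (sin, cos), i.e. (imaginary, real) parts of w^k.
        let (w_im, w_re) = (step * k as f64).sin_cos();
        let (d_re, d_im) = (re_lo[k] - re_hi[k], im_lo[k] - im_hi[k]);
        re_lo[k] += re_hi[k];
        im_lo[k] += im_hi[k];
        re_hi[k] = d_re * w_re - d_im * w_im;
        im_hi[k] = d_re * w_im + d_im * w_re;
    }
}
```

Calling `sin_cos` per element is the simplest way to show the idea; a real kernel would more likely use a cheaper scheme (for example a vectorized evaluation or a recurrence), which this sketch does not attempt to model.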

Zen4 results: on large sizes this is performance-neutral on a single thread and a 12% improvement when multi-threaded. However, small sizes that are not memory-bound take a large performance penalty, taking up to 60% more time.

M4 seems to benefit from this on large sizes.

Detailed benchmarks:

Single threaded:

Zen4: https://gist.github.com/Shnatsel/55e32f567c6f1a99ee96f2713bc89b5a
M4: https://gist.github.com/Shnatsel/2695ba6fbc3263a2679f045db5546127

Multi-threaded:

Zen4: https://gist.github.com/Shnatsel/3db0fe24d59c7e4a9a923a1b47653ca0
M4: https://gist.github.com/Shnatsel/e03e1cd148e5208b15448d01223ed5a8
