…ground truth sin_cos occasionally
Computes twiddles on the fly in the variable-size kernel instead of caching them.
This reduces memory usage, since the twiddles no longer need to be stored. Combined with #95, the entire FFT process needs no auxiliary memory at all, including interleaving/deinterleaving and bit reversal.
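For readers unfamiliar with the idea, here is a minimal sketch of generating twiddle factors on the fly rather than reading them from a cached table. It is an illustration, not the code in this PR: the function name `for_each_twiddle`, the `resync_every` parameter, the use of `f64`, and the forward-transform sign convention are all assumptions. Each twiddle is derived from the previous one with a complex rotation, and every `resync_every` steps the value is recomputed directly with `sin_cos` so that rounding error does not accumulate.

```rust
use std::f64::consts::PI;

/// Hypothetical helper (not part of this PR): calls `f(k, re, im)` for each
/// twiddle w_k = exp(-2*pi*i*k/n), k in 0..count, without storing a table.
fn for_each_twiddle(
    n: usize,
    count: usize,
    resync_every: usize,
    mut f: impl FnMut(usize, f64, f64),
) {
    assert!(n > 0 && resync_every > 0);

    // Rotation that advances w_k to w_{k+1}.
    let step_angle = -2.0 * PI / n as f64;
    let (step_im, step_re) = step_angle.sin_cos();

    let (mut re, mut im) = (1.0_f64, 0.0_f64);
    for k in 0..count {
        // Periodically replace the running value with a directly computed
        // one, which bounds the drift introduced by repeated multiplication.
        if k % resync_every == 0 {
            let (s, c) = (step_angle * k as f64).sin_cos();
            re = c;
            im = s;
        }

        f(k, re, im);

        // Complex multiply: (re + i*im) * (step_re + i*step_im).
        let next_re = re * step_re - im * step_im;
        let next_im = re * step_im + im * step_re;
        re = next_re;
        im = next_im;
    }
}
```

The only state is a handful of scalars, which is where the memory saving comes from; the cost is a few extra multiplications per butterfly plus an occasional `sin_cos` call, consistent with the slowdown reported for small, compute-bound sizes.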
Zen4 results: on large sizes this is neutral on a single thread and a 12% improvement when multi-threaded. Small sizes that are not memory-bound, however, take a large performance penalty of up to 60% more time.
M4 seems to benefit from this on large sizes.
Detailed benchmarks:
Single threaded:
Zen4: https://gist.github.com/Shnatsel/55e32f567c6f1a99ee96f2713bc89b5a
M4: https://gist.github.com/Shnatsel/2695ba6fbc3263a2679f045db5546127
Multi-threaded:
Zen4: https://gist.github.com/Shnatsel/3db0fe24d59c7e4a9a923a1b47653ca0
M4: https://gist.github.com/Shnatsel/e03e1cd148e5208b15448d01223ed5a8