PoC: Radix 2^2 + parallel kernel by Shnatsel · Pull Request #104 · QuState/PhastFT

Shnatsel · 2026-04-04T15:08:53Z

Stacked on top of #103

Adds parallelism to the FFT kernel scaffold, which lets us parallelize the final few stages that aren't parallelized by recursion.

Recursion is still preferable because it provides natural cache locality and makes the algorithm cache-oblivious.
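To illustrate the idea (this is not the code from this PR), here is a minimal sketch of parallelizing the final radix-2 DIT butterfly pass. After recursion has transformed the two halves, the last pass touches the whole array as one block, so recursion offers no parallelism there; splitting the butterflies into chunks does. The function name `parallel_last_stage`, the `chunk` parameter, the on-the-fly twiddle computation, and the use of `std::thread::scope` are all assumptions made for the example; a real kernel would use precomputed twiddles, SIMD, and a thread pool.

```rust
use std::f64::consts::PI;
use std::thread;

/// Hypothetical sketch: final radix-2 DIT pass over split real/imaginary arrays.
/// Assumes the low half already holds the FFT of the even-indexed inputs and
/// the high half holds the FFT of the odd-indexed inputs.
/// `chunk` is the number of butterflies handled by each thread.
fn parallel_last_stage(re: &mut [f64], im: &mut [f64], chunk: usize) {
    let n = re.len();
    assert!(n % 2 == 0 && im.len() == n && chunk > 0);
    let half = n / 2;
    let (re_lo, re_hi) = re.split_at_mut(half);
    let (im_lo, im_hi) = im.split_at_mut(half);

    thread::scope(|s| {
        // Matching chunks of the low and high halves are disjoint from all
        // other chunks, so each chunk can be processed independently.
        let parts = re_lo
            .chunks_mut(chunk)
            .zip(re_hi.chunks_mut(chunk))
            .zip(im_lo.chunks_mut(chunk).zip(im_hi.chunks_mut(chunk)))
            .enumerate();
        for (ci, ((rl, rh), (il, ih))) in parts {
            // One thread per chunk keeps the sketch short; a thread pool
            // would be the practical choice.
            s.spawn(move || {
                for j in 0..rl.len() {
                    let k = ci * chunk + j; // global butterfly index
                    let angle = -2.0 * PI * (k as f64) / (n as f64);
                    let (wr, wi) = (angle.cos(), angle.sin());
                    // t = w * hi
                    let tr = rh[j] * wr - ih[j] * wi;
                    let ti = rh[j] * wi + ih[j] * wr;
                    // hi = lo - t, lo = lo + t
                    rh[j] = rl[j] - tr;
                    ih[j] = il[j] - ti;
                    rl[j] += tr;
                    il[j] += ti;
                }
            });
        }
    });
}
```

The chunking only changes how the work is scheduled, not the result, so the same pass can run serially by passing `chunk = n / 2`.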

Performance vs main: 27% less time on Zen4, 38% less time on M4,
measured with cargo run --profile=profiling --all-features --example benchmark -- 64 28 5

codecov-commenter · 2026-04-04T15:10:21Z

Codecov Report

❌ Patch coverage is 75.24038% with 103 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.15%. Comparing base (3fc78e0) to head (bdf8335).

Files with missing lines	Patch %	Lines
src/kernels/dit.rs	66.44%	100 Missing ⚠️
src/algorithms/dit.rs	97.45%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
- Coverage   99.79%   94.15%   -5.65%     
==========================================
  Files           8        8              
  Lines        1438     1812     +374     
==========================================
+ Hits         1435     1706     +271     
- Misses          3      106     +103

Shnatsel added 16 commits April 2, 2026 23:31

Add narrow variants of dit_chunk_n_simd to reduce register pressure; prep for refactoring into radix-2^2

0966524

Initial implementation of radix-2^2-like fused kernel, to reduce the required amount of passes over memory

5c36c41

Correctly wire up fused kernels to the recursive algorithm

5e45202

Document block size better

1a4a931

revert changes to src/algorithms/dit.rs

b3b8f3d

Wire up radix-2^2 differently

31083c8

Apply the same structure to f32

6bc30ab

Add a parallel f64 fft kernel

1af726b

Janky Claude-generated PoC for parallelizing the last stage to see what performance is like

29a8a1e

Revert "Janky Claude-generated PoC for parallelizing the last stage to see what performance is like" This reverts commit 29a8a1e.

4550f57

Add public API wrapper for parallel f64 kernel

54ce677

Wire up the parallel kernel to the recursive algorithm in a less janky way

86ba3b7

parallelize the last two radix2^2 passes instead of the last one

dfd6546

Add f32 parallel kernel scaffold

cc96d52

Fix feature-gating

f2517f9

Frontload the single-stage pass to make odd FFT stage counts also go through the parallelized radix-2^2 kernel

bdf8335
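The last commit's idea of frontloading the single-stage pass can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: the `Pass` enum, the `plan_passes` function, and the stage numbering are hypothetical. The point it shows is that when the number of radix-2 stages is odd, doing the lone single stage first leaves every remaining pass, including the final ones that get parallelized, as a fused two-stage pass.

```rust
/// Hypothetical description of one pass over memory.
#[derive(Debug, PartialEq)]
enum Pass {
    /// A plain radix-2 pass covering one stage.
    Single(u32),
    /// A fused radix-2^2-style pass covering two consecutive stages.
    Fused(u32, u32),
}

/// Hypothetical planner: split `log_n` radix-2 stages into passes,
/// putting the leftover single stage (if any) at the front.
fn plan_passes(log_n: u32) -> Vec<Pass> {
    let mut passes = Vec::new();
    let mut stage = 0;
    if log_n % 2 == 1 {
        // Odd stage count: handle the unpaired stage up front.
        passes.push(Pass::Single(stage));
        stage += 1;
    }
    while stage < log_n {
        // Every remaining pass fuses two stages.
        passes.push(Pass::Fused(stage, stage + 1));
        stage += 2;
    }
    passes
}
```

With this ordering the last pass is always a fused one, whatever the parity of the stage count.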

Shnatsel wants to merge 16 commits into main from radix-2-2-parallel
