Separate benchmark into different files#56

Open
blegat wants to merge 3 commits into main from bl/sep_bench
@blegat (Owner) commented May 6, 2026

It seems we get a roughly 50x speedup over PyTorch on CPU and a 3x speedup on GPU, which is a bit suspicious ^^

CPU

Lux

BenchmarkTools.Trial: 98 samples with 1 evaluation per sample.
 Range (min … max):  37.716 ms … 789.766 ms  ┊ GC (min … max):  0.00% … 95.02%
 Time  (median):     38.784 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   51.381 ms ±  76.966 ms  ┊ GC (mean ± σ):  20.52% ± 14.34%

  █   ▁ ▂▁                                                      
  █▄▁▁█▄██▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  37.7 ms       Histogram: log(frequency) by time       183 ms <

 Memory estimate: 16.97 MiB, allocs estimate: 61.

Hand-CUDA without prealloc

BenchmarkTools.Trial: 448 samples with 1 evaluation per sample.
 Range (min … max):   4.078 ms … 751.828 ms  ┊ GC (min … max):  0.00% … 98.83%
 Time  (median):      8.326 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   11.175 ms ±  35.843 ms  ┊ GC (mean ± σ):  25.07% ± 16.20%

  ▂         ▃█              ▂▂▁                                 
  █▅▁▄▁▁▄▁▁▁██▆▄▁▄▁▁▁▁▁▁▁▁▁▄███▇▁▁▄▁▁▁▁▁▁▁▁▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▆
  4.08 ms       Histogram: log(frequency) by time      26.6 ms <

 Memory estimate: 14.11 MiB, allocs estimate: 23.

Hand-CUDA with prealloc

BenchmarkTools.Trial: 451 samples with 1 evaluation per sample.
 Range (min … max):   7.990 ms … 748.508 ms  ┊ GC (min … max):  0.00% … 98.83%
 Time  (median):      8.298 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   11.099 ms ±  35.492 ms  ┊ GC (mean ± σ):  22.49% ± 13.54%

  ▄▅█▅                       ▂                                  
  ████▄▄▄▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▆▅████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▄▅▆▅▆▇▇▆▆▆ ▆
  7.99 ms       Histogram: log(frequency) by time      15.1 ms <

 Memory estimate: 8.55 MiB, allocs estimate: 15.

PyTorch eager

BenchmarkTools.Trial: 3048 samples with 1 evaluation per sample.
 Range (min … max):  1.127 ms …   5.482 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.576 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.638 ms ± 313.121 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▁▂▄▅▄▇█▆▇▆▄▃▁                                       
  ▁▁▂▃▄▅▆▆▇███████████████▆▆▅▄▃▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▁▂▂▁▁▁ ▃
  1.13 ms         Histogram: frequency by time        2.68 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

PyTorch compiled

BenchmarkTools.Trial: 2764 samples with 1 evaluation per sample.
 Range (min … max):  1.327 ms …   5.663 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.784 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.807 ms ± 276.709 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         ▁▄▆▆▅▅▆▂▃▂▄▄▆▅█▆▅▃▄                                   
  ▂▂▂▃▅▆██████████████████████▇▆▅▄▄▃▄▃▃▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁ ▄
  1.33 ms         Histogram: frequency by time        2.77 ms <

 Memory estimate: 16 bytes, allocs estimate: 1.

ArrayDiff

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  19.350 μs … 72.212 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.882 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.576 μs ±  9.600 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▆▅▅▆▆▄▃▂▁                              ▆▄▃       ▁▅       ▂
  █████████████▆▅▄▆▅▆▁▃▁▁▄▅▁▃▃▃▁▄▁▁▁▁▁▃▁▁▄▁█████████████▇▆▆▆█ █
  19.4 μs      Histogram: log(frequency) by time      51.6 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

GPU

Lux

BenchmarkTools.Trial: 8813 samples with 1 evaluation per sample.
 Range (min … max):  354.278 μs …  16.126 ms  ┊ GC (min … max):  0.00% … 78.95%
 Time  (median):     440.657 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   566.273 μs ± 806.426 μs  ┊ GC (mean ± σ):  14.34% ±  9.60%

  █▂                                                             
  ██▆▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂ ▂
  354 μs           Histogram: frequency by time         5.41 ms <

 Memory estimate: 42.08 KiB, allocs estimate: 1159.

Hand-CUDA without prealloc

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  148.831 μs …  13.230 ms  ┊ GC (min … max):  0.00% … 68.56%
 Time  (median):     170.714 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   247.967 μs ± 534.493 μs  ┊ GC (mean ± σ):  14.59% ±  6.93%

  █▇▇▇▅▄▃▂▂▃▄▅▅▄▄▃▄▄▃▃▂▂▂                                       ▂
  ███████████████████████████▇▇▇▆▆▇▆▇▆▇▇█▇██▇▇████▇▇▇▇▅▆▅▄▄▅▆▄▄ █
  149 μs        Histogram: log(frequency) by time        497 μs <

 Memory estimate: 13.34 KiB, allocs estimate: 428.

Hand-CUDA with prealloc

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  151.882 μs …   9.050 ms  ┊ GC (min … max):  0.00% … 74.51%
 Time  (median):     177.549 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   239.107 μs ± 450.663 μs  ┊ GC (mean ± σ):  11.38% ±  5.96%

  ▄▇█▇▆▄▄▄▅▆▅▄▃▃▃▃▃▃▃▃▂▂▂▁▁▁                                    ▂
  ███████████████████████████▇▇▆▆▆▇█▆▇█▇███████▇▇███▇▇▆▅▆▅▅▂▄▄▅ █
  152 μs        Histogram: log(frequency) by time        484 μs <

 Memory estimate: 12.44 KiB, allocs estimate: 394.

PyTorch eager

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  145.126 μs …   1.370 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     254.887 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   325.177 μs ± 187.746 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄     ▄█▄▂                                                     
  █▅▂▃▆█████▇▅▃▃▂▂▂▁▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▂▁ ▂
  145 μs           Histogram: frequency by time          955 μs <

 Memory estimate: 160 bytes, allocs estimate: 8.

PyTorch compiled

BenchmarkTools.Trial: 8121 samples with 1 evaluation per sample.
 Range (min … max):  210.654 μs …   2.117 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     474.962 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   614.739 μs ± 366.681 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █     ▂                                                        
  █▃▁▂▃▆█▇▅▄▂▂▂▂▁▂▂▂▂▃▃▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁ ▂
  211 μs           Histogram: frequency by time         1.44 ms <

 Memory estimate: 160 bytes, allocs estimate: 8.

ArrayDiff

BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
 Range (min … max):  52.242 μs … 616.147 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     58.725 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   59.817 μs ±   6.709 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▄██▅▄▂▄▄▃                                        
  ▁▁▁▁▁▁▁▂▄▅▆▇███████████▆▅▄▄▃▃▃▃▄▄▄▄▄▃▃▃▃▃▃▃▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  52.2 μs         Histogram: frequency by time         72.5 μs <

 Memory estimate: 4.55 KiB, allocs estimate: 162.
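
As a sanity check on the headline numbers, here is a minimal sketch that recomputes the PyTorch eager vs ArrayDiff ratios from the minimum and median times reported above. The `times` tuple and the `speedup` helper are hypothetical names introduced only for this illustration; the values are copied by hand from the output above and converted to microseconds.

```julia
# Times in microseconds, copied by hand from the BenchmarkTools output above.
times = (
    cpu = (
        pytorch_eager = (min = 1127.0, median = 1576.0),
        arraydiff = (min = 19.350, median = 20.882),
    ),
    gpu = (
        pytorch_eager = (min = 145.126, median = 254.887),
        arraydiff = (min = 52.242, median = 58.725),
    ),
)

# Hypothetical helper: ratio of the PyTorch eager time to the ArrayDiff time
# for one device and one statistic (:min or :median).
speedup(device, stat) =
    getproperty(device.pytorch_eager, stat) / getproperty(device.arraydiff, stat)

# Print the ratio for every device and statistic.
for name in (:cpu, :gpu), stat in (:min, :median)
    ratio = speedup(getproperty(times, name), stat)
    println(name, " ", stat, ": ", round(ratio; digits = 1), "x")
end
```

On these numbers the CPU ratio comes out at roughly 58x on minimum time and 75x on median time, and the GPU ratio at roughly 2.8x and 4.3x, so the quoted 50x and 3x appear to be closest to the minimum-time comparison.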

@codecov (Bot) commented May 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.20%. Comparing base (5b4d9ab) to head (9f28d88).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #56   +/-   ##
=======================================
  Coverage   90.20%   90.20%           
=======================================
  Files          23       23           
  Lines        2848     2848           
=======================================
  Hits         2569     2569           
  Misses        279      279           

