python target override for expand phase & combined sdfg pipeline code… by ramonwirsch · Pull Request #595 · daisytuner/docc

ramonwirsch · 2026-03-17T17:30:43Z

… of mlir and python frontends

Moved DOCC_CI handling into shared code, so it now also applies to the mlir frontend.


daisytuner · 2026-03-17T18:06:14Z

Daisytuner Report - mlir_torch (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# linear_torch           6.16 s      -0.36%      N/A         1463.10 J   +1.26%      
# linear_none            15.89 s     -0.27%      N/A         3957.20 J   +2.33%      
# linear_sequential      15.84 s     +0.33%      N/A         3837.01 J   +3.16%      
# linear_openmp          15.79 s     +0.51%      N/A         3827.20 J   +3.39%      
# linear_cuda            13.58 s     +0.80%      N/A         2569.93 J   +3.83%      
# matmul_torch           6.09 s      -1.68%      N/A         1446.92 J   +0.33%      
# matmul_none            10.72 s     +0.46%      N/A         2829.17 J   +2.03%      
# matmul_sequential      10.58 s     +0.31%      N/A         2847.36 J   +3.17%      
# matmul_openmp          10.52 s     +0.17%      N/A         2822.97 J   +2.94%      
# matmul_cuda            10.23 s     +0.37%      N/A         1944.93 J   +4.20%

daisytuner · 2026-03-17T19:15:46Z

Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.31 s      -0.28%      N/A         130.52 J    -0.37%      
# adi_omp                15.93 s     -0.57%      N/A         1503.83 J   -0.66%      
# adi_cuda               4.77 s      -0.33%      N/A         463.25 J    -0.13%      
# adi_seq_tuning         15.98 s     -0.24%      N/A         1507.69 J   -0.37%      
# atax_numpy             2.17 s      +0.61%      N/A         225.78 J    +1.09%      
# atax_omp               2.46 s      -0.48%      N/A         258.88 J    -0.45%      
# atax_cuda              4.13 s      +0.23%      N/A         425.48 J    +0.32%      
# atax_seq_tuning        3.71 s      -0.67%      N/A         375.20 J    -0.66%      
# gemm_numpy             1.23 s      -0.65%      N/A         198.51 J    -0.97%      
# gemm_omp               1.11 s      -0.34%      N/A         162.31 J    -0.44%      
# gemm_cuda              10.58 s     -0.47%      N/A         1005.41 J   -0.45%      
# gemm_seq_tuning        1.12 s      -0.19%      N/A         161.87 J    -0.14%      
# gesummv_numpy          1.73 s      -1.68%      N/A         247.47 J    -1.62%      
# gesummv_omp            5.29 s      -0.92%      N/A         686.29 J    -1.04%      
# gesummv_cuda           8.27 s      -1.09%      N/A         992.76 J    -0.84%      
# gesummv_seq_tuning     6.53 s      -0.90%      N/A         801.03 J    -0.85%      
# gemver_numpy           1.08 s      -0.39%      N/A         165.87 J    -0.56%      
# gemver_omp             712.31 ms   -0.15%      N/A         81.36 J     -0.32%      
# gemver_cuda            3.88 s      -0.03%      N/A         388.41 J    -0.06%      
# gemver_seq_tuning      4.46 s      +0.37%      N/A         431.59 J    +0.32%      
# k2mm_numpy             1.20 s      -0.49%      N/A         197.30 J    -0.52%      
# k2mm_omp               3.61 s      -0.82%      N/A         467.49 J    -0.54%      
# k2mm_cuda              13.54 s     -0.50%      N/A         1280.93 J   -0.57%      
# k2mm_seq_tuning        3.60 s      -0.19%      N/A         463.96 J    -0.42%      
# k3mm_numpy             1.03 s      -0.42%      N/A         183.86 J    -0.57%      
# k3mm_omp               5.73 s      -0.14%      N/A         794.64 J    -0.29%      
# k3mm_cuda              19.81 s     -0.34%      N/A         1864.61 J   -0.54%      
# k3mm_seq_tuning        5.72 s      -0.17%      N/A         791.24 J    -0.44%      
# mvt_numpy              2.42 s      -0.32%      N/A         247.56 J    -0.54%      
# mvt_omp                2.74 s      -0.01%      N/A         284.58 J    -0.04%      
# mvt_cuda               3.36 s      +0.06%      N/A         342.32 J    -0.16%      
# mvt_seq_tuning         2.74 s      -0.05%      N/A         284.54 J    -0.12%      
# symm_numpy             785.92 ms   -0.03%      N/A         80.92 J     -0.12%      
# symm_omp               8.41 s      +0.07%      N/A         801.59 J    +0.01%      
# symm_seq_tuning        8.41 s      +0.02%      N/A         800.97 J    -0.06%      
# syr2k_numpy            891.15 ms   -0.40%      N/A         90.56 J     -0.36%      
# syr2k_omp              9.85 s      -0.08%      N/A         936.46 J    -0.05%      
# syr2k_cuda             1.65 s      -0.89%      N/A         170.78 J    -0.84%      
# syr2k_seq_tuning       9.81 s      -0.22%      N/A         932.93 J    -0.20%      
# syrk_numpy             772.36 ms   -1.61%      N/A         79.57 J     -1.27%      
# syrk_omp               5.93 s      -0.05%      N/A         570.55 J    -0.04%      
# syrk_cuda              1.52 s      -1.04%      N/A         158.54 J    -1.07%      
# syrk_seq_tuning        5.91 s      -0.93%      N/A         567.95 J    -0.96%      
# trmm_numpy             878.71 ms   -1.00%      N/A         89.45 J     -1.01%      
# trmm_omp               3.10 s      -0.88%      N/A         306.26 J    -0.92%      
# trmm_seq_tuning        3.39 s      -2.02%      N/A         322.89 J    -1.45%

~ each test case must set its own global options (register_target..., set_backend_options) + fixtures that clean up global state after every test function, to prevent us from accidentally relying on it + force_rebuild option on torch_compile to prevent reloading from the file cache in tests where we want to see the actual compile process
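The cleanup idea in the commit note above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `_GLOBAL_STATE`, `register_demo_target`, `set_demo_backend_options`, and `isolated_global_state` are hypothetical stand-ins, and the real docc helpers may look quite different. The sketch only shows the snapshot-and-restore pattern that keeps one test from leaking global options into the next.

```python
import contextlib
import copy

# Hypothetical stand-in for process-wide compiler settings; not docc's real state.
_GLOBAL_STATE = {"targets": {}, "backend_options": {}}


def register_demo_target(name, expand_hook):
    """Hypothetical helper: record an expand hook for a named target."""
    _GLOBAL_STATE["targets"][name] = expand_hook


def set_demo_backend_options(**options):
    """Hypothetical helper: merge backend options into the global state."""
    _GLOBAL_STATE["backend_options"].update(options)


@contextlib.contextmanager
def isolated_global_state():
    """Snapshot the global state and restore it on exit, even if the body raises."""
    snapshot = copy.deepcopy(_GLOBAL_STATE)
    try:
        yield _GLOBAL_STATE
    finally:
        _GLOBAL_STATE.clear()
        _GLOBAL_STATE.update(snapshot)


def run_isolated_case():
    """A test-like case that sets its own options and leaves nothing behind."""
    with isolated_global_state():
        register_demo_target("demo", lambda node: node)
        set_demo_backend_options(opt_level=3)
        seen_inside = sorted(_GLOBAL_STATE["targets"])
    # Return what was visible inside the case and the state left afterwards.
    return seen_inside, copy.deepcopy(_GLOBAL_STATE)
```

With pytest, the same pattern would typically be wrapped in a fixture declared with `autouse=True` whose body is `with isolated_global_state(): yield`, so the restore step runs after every test function without each test having to ask for it.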

python target override for expand phase & combined sdfg pipeline code…

c05e384

… of mlir and python frontends + moved DOCC_CI handling, to live in shared code and now also apply to mlir frontend

ramonwirsch enabled auto-merge March 17, 2026 17:34

ramonwirsch requested review from Moehre2 and NoraHagmeyer March 18, 2026 16:06

NoraHagmeyer approved these changes Mar 19, 2026


ramonwirsch merged commit 5fb101d into main Mar 19, 2026
20 checks passed

ramonwirsch deleted the expand-override-hook branch March 19, 2026 07:46


ramonwirsch merged 2 commits into main from expand-override-hook
