adds diamond tiling test #584

Merged
lukastruemper merged 1 commit into main from diamond-tiling on Apr 3, 2026

@lukastruemper
Contributor

No description provided.

NoraHagmeyer previously approved these changes Mar 18, 2026
NoraHagmeyer marked this pull request as draft March 18, 2026 14:15
NoraHagmeyer marked this pull request as ready for review March 18, 2026 14:15

daisytuner Bot commented Mar 18, 2026

Daisytuner Report - mlir_torch (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# linear_torch           6.17 s      N/A         N/A         1492.30 J   N/A         
# linear_none            15.80 s     N/A         N/A         3942.12 J   N/A         
# linear_sequential      15.80 s     N/A         N/A         3863.74 J   N/A         
# linear_openmp          15.76 s     N/A         N/A         3856.71 J   N/A         
# linear_cuda            13.42 s     N/A         N/A         2578.72 J   N/A         
# matmul_torch           5.98 s      N/A         N/A         1442.44 J   N/A         
# matmul_none            10.65 s     N/A         N/A         2826.05 J   N/A         
# matmul_sequential      10.44 s     N/A         N/A         2809.04 J   N/A         
# matmul_openmp          10.56 s     N/A         N/A         2840.21 J   N/A         
# matmul_cuda            10.23 s     N/A         N/A         1963.24 J   N/A         

daisytuner Bot commented Mar 18, 2026

Daisytuner Report - python_npbench (zinnia)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# adi_numpy              1.32 s      -0.31%      N/A         131.88 J    -0.34%      
# adi_omp                15.05 s     -0.72%      N/A         1496.32 J   +0.39%      
# adi_cuda               4.73 s      +0.21%      N/A         458.46 J    +0.19%      
# adi_seq_tuning         15.98 s     -0.49%      N/A         1508.49 J   -0.49%      
# atax_numpy             2.16 s      -0.19%      N/A         223.56 J    -0.13%      
# atax_omp               2.97 s      -2.30%      N/A         374.23 J    -3.48%      
# atax_cuda              4.12 s      +0.20%      N/A         424.86 J    +0.23%      
# atax_seq_tuning        3.72 s      -0.14%      N/A         376.84 J    +0.03%      
# gemm_numpy             1.21 s      -0.62%      N/A         193.06 J    -0.60%      
# gemm_omp               1.11 s      +0.05%      N/A         162.52 J    +0.06%      
# gemm_cuda              10.58 s     -0.45%      N/A         1005.79 J   -0.49%      
# gemm_seq_tuning        1.11 s      -0.34%      N/A         161.67 J    -0.49%      
# gesummv_numpy          1.76 s      +0.38%      N/A         251.34 J    +0.31%      
# gesummv_omp            1.98 s      +1.31%      N/A         313.02 J    +0.24%      
# gesummv_cuda           8.31 s      -0.39%      N/A         1000.57 J   -0.18%      
# gesummv_seq_tuning     6.65 s      -0.16%      N/A         812.91 J    -0.12%      
# gemver_numpy           1.10 s      +0.71%      N/A         168.37 J    +0.92%      
# gemver_omp             848.23 ms   -1.56%      N/A         109.38 J    -2.50%      
# gemver_cuda            3.88 s      -0.79%      N/A         388.06 J    -0.79%      
# gemver_seq_tuning      4.51 s      +0.52%      N/A         436.81 J    +0.60%      
# k2mm_numpy             1.20 s      +0.22%      N/A         196.51 J    +0.33%      
# k2mm_omp               3.54 s      +0.10%      N/A         660.46 J    -1.27%      
# k2mm_cuda              13.59 s     +0.02%      N/A         1294.20 J   +0.40%      
# k2mm_seq_tuning        3.62 s      +0.14%      N/A         466.40 J    +0.03%      
# k3mm_numpy             1.03 s      -0.20%      N/A         181.63 J    -0.23%      
# k3mm_omp               5.55 s      -0.21%      N/A         949.26 J    -1.46%      
# k3mm_cuda              19.83 s     -0.41%      N/A         1870.93 J   -0.45%      
# k3mm_seq_tuning        5.73 s      -0.09%      N/A         792.18 J    -0.11%      
# mvt_numpy              2.42 s      -0.43%      N/A         248.54 J    -0.32%      
# mvt_omp                2.75 s      +0.18%      N/A         285.28 J    +0.13%      
# mvt_cuda               3.36 s      +0.39%      N/A         343.04 J    +0.34%      
# mvt_seq_tuning         2.75 s      -0.38%      N/A         285.40 J    -0.27%      
# symm_numpy             785.66 ms   -1.03%      N/A         81.05 J     -0.73%      
# symm_omp               6.01 s      -1.63%      N/A         589.82 J    -1.18%      
# symm_seq_tuning        8.50 s      +0.31%      N/A         810.07 J    +0.30%      
# syr2k_numpy            886.06 ms   -1.46%      N/A         90.00 J     -1.46%      
# syr2k_omp              9.89 s      -0.29%      N/A         940.62 J    -0.21%      
# syr2k_cuda             1.62 s      -1.06%      N/A         168.74 J    -1.06%      
# syr2k_seq_tuning       9.85 s      -0.17%      N/A         936.89 J    -0.15%      
# syrk_numpy             782.94 ms   +1.02%      N/A         80.44 J     +0.80%      
# syrk_omp               6.01 s      +0.30%      N/A         578.23 J    +0.29%      
# syrk_cuda              1.50 s      -1.65%      N/A         157.61 J    -1.44%      
# syrk_seq_tuning        5.98 s      +0.55%      N/A         574.92 J    +0.43%      
# trmm_numpy             876.78 ms   +0.01%      N/A         89.33 J     -0.30%      
# trmm_omp               728.67 ms   +3.51%      N/A         92.80 J     +3.97%      
# trmm_seq_tuning        3.42 s      +1.02%      N/A         324.73 J    -0.01%      

daisytuner Bot commented Apr 3, 2026

Daisytuner Report - mlir_torch_models (chamomile)

@@                                            Benchmarks                                            @@
======================================================================================================
  Benchmark                               Time        ΔTime       Thr         Energy      ΔEnergy
======================================================================================================
# bn_conv_bn_relu_maxpool_torch           18.59 s     -0.97%      N/A         3580.48 J   -2.46%
# bn_conv_bn_relu_maxpool_run_none        3.27 s      -0.06%      N/A         657.08 J    -1.12%
# bn_conv_bn_relu_maxpool_run_sequential  3.28 s      -0.52%      N/A         659.17 J    -2.19%
# bn_conv_bn_relu_maxpool_run_openmp      3.36 s      +1.50%      N/A         693.47 J    +2.09%
# bn_conv_bn_relu_maxpool_run_cuda        3.71 s      -0.25%      N/A         724.40 J    -1.64%

lukastruemper force-pushed the diamond-tiling branch 2 times, most recently from 0970ba1 to 6731da9 April 3, 2026 17:24
lukastruemper merged commit 951590c into main Apr 3, 2026
15 of 18 checks passed
lukastruemper deleted the diamond-tiling branch April 3, 2026 18:17

daisytuner Bot commented Apr 3, 2026

Daisytuner Report - mlir_torch_layers (chamomile)

@@                                   Benchmarks                                   @@
=====================================================================================
  Benchmark              Time        ΔTime       Thr         Energy      ΔEnergy     
=====================================================================================
# batchnorm_torch        19.01 s     -1.56%      N/A         3724.81 J   +2.94%      
# batchnorm_run_none     6.22 s      -0.00%      N/A         1210.91 J   +4.77%      
# batchnorm_run_sequential 6.65 s    +1.76%      N/A         1292.29 J   +6.43%      
# batchnorm_run_openmp   5.74 s      -1.09%      N/A         1353.36 J   +3.36%      
# batchnorm_run_cuda     8.05 s      -0.42%      N/A         1577.89 J   +4.76%      
# conv2d_torch           18.58 s     -0.05%      N/A         3643.22 J   +4.52%      
# conv2d_run_openmp      5.02 s      +0.56%      N/A         1216.13 J   +4.82%      
# conv2d_run_cuda        6.84 s      +0.28%      N/A         1342.97 J   +5.56%      
# linear_torch           6.13 s      +0.67%      N/A         1475.96 J   +3.84%      
# linear_run_none        11.53 s     -0.90%      N/A         3135.96 J   +1.24%      
# linear_run_sequential  9.99 s      +1.16%      N/A         2778.56 J   +4.71%      
# linear_run_openmp      9.73 s      -0.36%      N/A         2868.10 J   +2.66%      
# linear_run_cuda        9.20 s      +0.12%      N/A         1791.89 J   +5.03%      
# matmul_torch           6.12 s      +1.01%      N/A         1477.66 J   +4.05%      
# matmul_run_none        11.56 s     -0.38%      N/A         3147.59 J   +1.87%      
# matmul_run_sequential  9.87 s      -0.73%      N/A         2748.98 J   +2.79%      
# matmul_run_openmp      9.69 s      +0.16%      N/A         2857.66 J   +4.04%      
# matmul_run_cuda        9.21 s      +1.58%      N/A         1792.57 J   +6.27%      
# pooling_torch          26.14 s     +1.14%      N/A         5211.23 J   +5.78%      
# pooling_run_none       25.27 s     -0.25%      N/A         4849.27 J   +4.48%      
# pooling_run_sequential 25.68 s     -0.82%      N/A         4918.95 J   +3.71%      
# pooling_run_openmp     17.12 s     -1.17%      N/A         3657.86 J   +3.28%      
# pooling_run_cuda       31.15 s     -0.25%      N/A         6061.07 J   +4.67%      
# relu_torch             18.93 s     -0.07%      N/A         3716.56 J   +4.65%      
# relu_run_none          5.25 s      +0.22%      N/A         1021.44 J   +4.67%      
# relu_run_sequential    6.35 s      +0.31%      N/A         1232.58 J   +4.72%      
# relu_run_openmp        5.70 s      +0.31%      N/A         1344.55 J   +4.45%      
# relu_run_cuda          8.28 s      +0.10%      N/A         1625.79 J   +5.31%      

2 participants