[BUG] bug in benchmark_mesh.py for certain block_dims #6

@jamesETsmith

Bug Description

Summary

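While benchmarking, I noticed that the benchmark_mesh.py script produces incorrect results for some parameters: with BLOCK_DIM=128 and BLOCK_DIM=256, the tiled AABB query hit counts do not match the single-threaded results, whereas BLOCK_DIM=32 and BLOCK_DIM=64 verify correctly.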

Details

~/warp# python warp/examples/benchmarks/benchmark_mesh.py
Warp 1.12.0.dev0 initialized:
   ROCm 7.1, HIP driver 7.1
   Devices:
     "cpu"      : "x86_64"
     "cuda:0"   : "AMD Instinct MI300X" (192 GiB, gfx942:sramecc+:xnack-, mempool enabled)
   Kernel cache:
     /root/.cache/warp/1.12.0.dev0
Creating mesh with 100x100 grid (20000 triangles)...

Benchmark Configuration:
  Num Queries: 128
  Num Triangles: 20000
  Block Dims (tiled): [32, 64, 128, 256]
  Total threads (single): 128
  Iterations: 100
  Warm-up: 5

Benchmarking single-threaded AABB queries...
Warp warning: HIPRTC does not support precompiled headers, ignoring request
Module __main__ 27c8a51 load on device 'cuda:0' took 1114.37 ms  (compiled)

================================================================================
Testing with BLOCK_DIM = 32 (Total threads: 4096)
================================================================================
  Benchmarking tiled AABB queries...
Warp warning: HIPRTC does not support precompiled headers, ignoring request
Module __main__ 08f53b6 load on device 'cuda:0' took 1079.19 ms  (compiled)
  ✓ AABB results verified (avg hits: 235.8)

================================================================================
Testing with BLOCK_DIM = 64 (Total threads: 8192)
================================================================================
  Benchmarking tiled AABB queries...
Warp warning: HIPRTC does not support precompiled headers, ignoring request
Module __main__ 26c3a91 load on device 'cuda:0' took 1090.62 ms  (compiled)
  ✓ AABB results verified (avg hits: 235.8)

================================================================================
Testing with BLOCK_DIM = 128 (Total threads: 16384)
================================================================================
  Benchmarking tiled AABB queries...
Warp warning: HIPRTC does not support precompiled headers, ignoring request
Module __main__ de3d2cd load on device 'cuda:0' took 1104.22 ms  (compiled)
  WARNING: Results don't match for AABB queries!
    Single hits: [360 120 112  28 252 140  96 252 126  84  24 170  28  56 180 680 510  40
 448 120 180  52  72  30 400 336 280 420  88  88  84 338 100  32 120 360
 196 240 176 722 408 396  72  96 224 648 104 320 238 240  10  40 504 168
  54 200 152  70 264 216 392  36 288 612 520 176  54 112   8 140  14 560
 374 640 108 260 882 512  16 462 360  72 646  32 448 576  24  38  30 532
 288 182 228  32 756  72 630 360  64  10 224 640 266 266 416  84 152 408
 200 510 136  84 198 168  20 154  54 150 180 144 242 306 112 308 266 352
  16  20]
    Tiled hits:  [ 9472   120   112    28  8192   256    96  5120   126    84    24   256
    28    56   256 23424 15488    40 14592   120   256    52    72    30
 12288  9472  8960 13056    88    88    84 12288   100    32   120  9984
  4608  6912   256 25728 13312 12544    72    96   256 21888   104  9984
  8064  7424    10    40 15232   256    54   256   256    70  7424   256
 12032    36  7424 19840 17280  3200    54   112     8   256    14 17792
 11264 20864   108   384   896 14208    16 13184  8448    72 22144    32
 11648 19328    24    38    30 15488  8704   256  4864    32   384    72
 16768 10496    64    10   256 19584  8704  8448 12544    84   256 10624
   256 16000   256    84  3584   256    20   256    54   256   256   256
   256  9728   112  8448  6656 12032    16    20]

================================================================================
Testing with BLOCK_DIM = 256 (Total threads: 32768)
================================================================================
  Benchmarking tiled AABB queries...
  WARNING: Results don't match for AABB queries!
    Single hits: [360 120 112  28 252 140  96 252 126  84  24 170  28  56 180 680 510  40
 448 120 180  52  72  30 400 336 280 420  88  88  84 338 100  32 120 360
 196 240 176 722 408 396  72  96 224 648 104 320 238 240  10  40 504 168
  54 200 152  70 264 216 392  36 288 612 520 176  54 112   8 140  14 560
 374 640 108 260 882 512  16 462 360  72 646  32 448 576  24  38  30 532
 288 182 228  32 756  72 630 360  64  10 224 640 266 266 416  84 152 408
 200 510 136  84 198 168  20 154  54 150 180 144 242 306 112 308 266 352
  16  20]
    Tiled hits:  [  512   120   112    28   252   140    96   252   126    84    24   170
    28    56   180 19968 16384    40   512   120   180    52    72    30
   512   512   512   512    88    88    84   512   100    32   120   512
   196   240   176 23552   512   512    72    96   224 19968   104   512
   238   240    10    40 15360   168    54   200   152    70   512   216
   512    36   512 20992 16896   176    54   112     8   140    14 18944
   512 22016   108   512 30208 16384    16 13312   512    72 20992    32
   512 19968    24    38    30 14848   512   182   228    32 25088    72
 19456   512    64    10   224 20992   512   512   512    84   152   512
   200 16384   136    84   198   168    20   154    54   150   180   144
   242   512   112   512   512   512    16    20]

====================================================================================================
Query Type      Method               Time (ms)            Speedup         Threads
====================================================================================================
AABB            Single               2.88306±0.38         1.00x           128
AABB            Tiled (BD=32)        0.13511±0.018        21.34x          4096
AABB            Tiled (BD=64)        0.122542±0.019       23.53x          8192
AABB            Tiled (BD=128)       1.52423±0.0066       1.89x           16384
AABB            Tiled (BD=256)       1.46628±0.0046       1.97x           32768
====================================================================================================
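
To make the mismatches easier to read, here is a minimal sketch that compares the hit counts per query. It uses only the first 18 entries of each array, copied by hand from the log above. The helper names (`mismatched_queries`, `queries_over`) are hypothetical and not part of benchmark_mesh.py or Warp.

```python
# First 18 entries of each hit-count array, copied from the log above.
single = [360, 120, 112, 28, 252, 140, 96, 252, 126, 84, 24, 170,
          28, 56, 180, 680, 510, 40]
tiled_128 = [9472, 120, 112, 28, 8192, 256, 96, 5120, 126, 84, 24, 256,
             28, 56, 256, 23424, 15488, 40]
tiled_256 = [512, 120, 112, 28, 252, 140, 96, 252, 126, 84, 24, 170,
             28, 56, 180, 19968, 16384, 40]


def mismatched_queries(single_hits, tiled_hits):
    """Indices where the tiled count differs from the single-threaded count."""
    return [i for i, (s, t) in enumerate(zip(single_hits, tiled_hits)) if s != t]


def queries_over(single_hits, threshold):
    """Indices whose single-threaded hit count exceeds the threshold."""
    return [i for i, s in enumerate(single_hits) if s > threshold]


for block_dim, tiled in ((128, tiled_128), (256, tiled_256)):
    print(f"BLOCK_DIM={block_dim}")
    print("  mismatched queries:         ", mismatched_queries(single, tiled))
    print("  queries with hits > BD:     ", queries_over(single, block_dim))
```

In this excerpt, the mismatched queries are the same ones whose single-threaded hit count exceeds BLOCK_DIM. That is an observation about the logged data, not a confirmed diagnosis, and it does not explain why BLOCK_DIM=32 and 64 pass verification.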

System Information

Environment

Python 3.12.13
rocm 7.10.0a20251029

Labels

bug (Something isn't working)
