~/warp# python warp/examples/benchmarks/benchmark_mesh.py
Warp 1.12.0.dev0 initialized:
ROCm 7.1, HIP driver 7.1
Devices:
"cpu" : "x86_64"
"cuda:0" : "AMD Instinct MI300X" (192 GiB, gfx942:sramecc+:xnack-, mempool enabled)
Kernel cache:
/root/.cache/warp/1.12.0.dev0
Creating mesh with 100x100 grid (20000 triangles)...
Benchmark Configuration:
Num Queries: 128
Num Triangles: 20000
Block Dims (tiled): [32, 64, 128, 256]
Total threads (single): 128
Iterations: 100
Warm-up: 5
Benchmarking single-threaded AABB queries...
Warp warning: HIPRTC does not support precompiled headers, ignoring request
Module __main__ 27c8a51 load on device 'cuda:0' took 1114.37 ms (compiled)
================================================================================
Testing with BLOCK_DIM = 32 (Total threads: 4096)
================================================================================
Benchmarking tiled AABB queries...
Warp warning: HIPRTC does not support precompiled headers, ignoring request
Module __main__ 08f53b6 load on device 'cuda:0' took 1079.19 ms (compiled)
✓ AABB results verified (avg hits: 235.8)
================================================================================
Testing with BLOCK_DIM = 64 (Total threads: 8192)
================================================================================
Benchmarking tiled AABB queries...
Warp warning: HIPRTC does not support precompiled headers, ignoring request
Module __main__ 26c3a91 load on device 'cuda:0' took 1090.62 ms (compiled)
✓ AABB results verified (avg hits: 235.8)
================================================================================
Testing with BLOCK_DIM = 128 (Total threads: 16384)
================================================================================
Benchmarking tiled AABB queries...
Warp warning: HIPRTC does not support precompiled headers, ignoring request
Module __main__ de3d2cd load on device 'cuda:0' took 1104.22 ms (compiled)
WARNING: Results don't match for AABB queries!
Single hits: [360 120 112 28 252 140 96 252 126 84 24 170 28 56 180 680 510 40
448 120 180 52 72 30 400 336 280 420 88 88 84 338 100 32 120 360
196 240 176 722 408 396 72 96 224 648 104 320 238 240 10 40 504 168
54 200 152 70 264 216 392 36 288 612 520 176 54 112 8 140 14 560
374 640 108 260 882 512 16 462 360 72 646 32 448 576 24 38 30 532
288 182 228 32 756 72 630 360 64 10 224 640 266 266 416 84 152 408
200 510 136 84 198 168 20 154 54 150 180 144 242 306 112 308 266 352
16 20]
Tiled hits: [ 9472 120 112 28 8192 256 96 5120 126 84 24 256
28 56 256 23424 15488 40 14592 120 256 52 72 30
12288 9472 8960 13056 88 88 84 12288 100 32 120 9984
4608 6912 256 25728 13312 12544 72 96 256 21888 104 9984
8064 7424 10 40 15232 256 54 256 256 70 7424 256
12032 36 7424 19840 17280 3200 54 112 8 256 14 17792
11264 20864 108 384 896 14208 16 13184 8448 72 22144 32
11648 19328 24 38 30 15488 8704 256 4864 32 384 72
16768 10496 64 10 256 19584 8704 8448 12544 84 256 10624
256 16000 256 84 3584 256 20 256 54 256 256 256
256 9728 112 8448 6656 12032 16 20]
================================================================================
Testing with BLOCK_DIM = 256 (Total threads: 32768)
================================================================================
Benchmarking tiled AABB queries...
WARNING: Results don't match for AABB queries!
Single hits: [360 120 112 28 252 140 96 252 126 84 24 170 28 56 180 680 510 40
448 120 180 52 72 30 400 336 280 420 88 88 84 338 100 32 120 360
196 240 176 722 408 396 72 96 224 648 104 320 238 240 10 40 504 168
54 200 152 70 264 216 392 36 288 612 520 176 54 112 8 140 14 560
374 640 108 260 882 512 16 462 360 72 646 32 448 576 24 38 30 532
288 182 228 32 756 72 630 360 64 10 224 640 266 266 416 84 152 408
200 510 136 84 198 168 20 154 54 150 180 144 242 306 112 308 266 352
16 20]
Tiled hits: [ 512 120 112 28 252 140 96 252 126 84 24 170
28 56 180 19968 16384 40 512 120 180 52 72 30
512 512 512 512 88 88 84 512 100 32 120 512
196 240 176 23552 512 512 72 96 224 19968 104 512
238 240 10 40 15360 168 54 200 152 70 512 216
512 36 512 20992 16896 176 54 112 8 140 14 18944
512 22016 108 512 30208 16384 16 13312 512 72 20992 32
512 19968 24 38 30 14848 512 182 228 32 25088 72
19456 512 64 10 224 20992 512 512 512 84 152 512
200 16384 136 84 198 168 20 154 54 150 180 144
242 512 112 512 512 512 16 20]
====================================================================================================
Query Type Method Time (ms) Speedup Threads
====================================================================================================
AABB Single 2.88306±0.38 1.00x 128
AABB Tiled (BD=32) 0.13511±0.018 21.34x 4096
AABB Tiled (BD=64) 0.122542±0.019 23.53x 8192
AABB Tiled (BD=128) 1.52423±0.0066 1.89x 16384
AABB Tiled (BD=256) 1.46628±0.0046 1.97x 32768
====================================================================================================
Python 3.12.13
rocm 7.10.0a20251029
Bug Description
Summary
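While benchmarking, I noticed that the `benchmark_mesh.py` script produces incorrect tiled AABB query results for BLOCK_DIM=128 and BLOCK_DIM=256: the tiled hit counts do not match the single-threaded hit counts, while BLOCK_DIM=32 and BLOCK_DIM=64 verify correctly.
Details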
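To help narrow this down, the mismatching entries can be checked for a common pattern. In the log above, the tiled hit counts that disagree with the single-threaded counts all appear to be exact multiples of the block dimension (for example 9472, 8192 and 256 with BLOCK_DIM=128). The sketch below assumes the hit counts are available as plain Python lists; `summarize_mismatches` is a hypothetical helper written for this report, not part of Warp or of `benchmark_mesh.py`.

```python
def summarize_mismatches(single_hits, tiled_hits, block_dim):
    """Compare per-query hit counts and describe where the tiled results diverge.

    Hypothetical diagnostic helper; not part of Warp or the benchmark script.
    """
    # Collect (query index, single count, tiled count) for every disagreement.
    mismatched = [
        (i, s, t)
        for i, (s, t) in enumerate(zip(single_hits, tiled_hits))
        if s != t
    ]
    return {
        "num_mismatched": len(mismatched),
        "indices": [i for i, _, _ in mismatched],
        # Note: this is trivially True when there are no mismatches.
        "all_multiples_of_block_dim": all(
            t % block_dim == 0 for _, _, t in mismatched
        ),
    }


# First six queries of the BLOCK_DIM=128 run, copied from the log above.
single = [360, 120, 112, 28, 252, 140]
tiled = [9472, 120, 112, 28, 8192, 256]

report = summarize_mismatches(single, tiled, block_dim=128)
print(report)
```

If the pattern holds across the full arrays, it may suggest that the wrong counts come from a per-block accumulation step rather than from the BVH traversal itself, but that is only a guess based on the numbers in this log.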
System Information
Environment