mdboom left a comment
I'm marking this as "approve" even though I have some questions inline, since I think it's totally fine to merge this and iterate if that's the easiest way forward.
(I am not a regular pixi user...) I tried to follow the instructions but I get:
pixi run -e source bench
Error: × failed to solve requirements of environment 'source' for platform 'linux-64'
├─▶ × failed to solve the environment
│
╰─▶ Cannot solve the request because of: cuda-bindings * cannot be installed because there are no viable options:
└─ cuda-bindings 13.1.0 would require
└─ cuda-nvrtc >=13.2.51,<14.0a0, which cannot be installed because there are no viable options:
└─ cuda-nvrtc 13.2.51 would require
└─ cuda-version >=13.2,<13.3.0a0, for which no candidates were found.
pixi run -e wheel bench
Error: × failed to solve requirements of environment 'source' for platform 'linux-64'
├─▶ × failed to solve the environment
│
╰─▶ Cannot solve the request because of: cuda-bindings * cannot be installed because there are no viable options:
└─ cuda-bindings 13.1.0 would require
└─ cuda-nvrtc >=13.2.51,<14.0a0, which cannot be installed because there are no viable options:
└─ cuda-nvrtc 13.2.51 would require
└─ cuda-version >=13.2,<13.3.0a0, for which no candidates were found.
@@ -0,0 +1,17 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Maybe move this file into a benchmarks subdirectory so as not to clutter the top-level of the cuda_bindings/benchmarks directory.
message(FATAL_ERROR "Could not find libcuda. Ensure the NVIDIA driver is installed.")
endif()

add_executable(bench_pointer_attributes_cpp bench_pointer_attributes.cpp)
Did you forget to commit bench_pointer_attributes.cpp?
def time_func(loops: int) -> float:
    t0 = time.perf_counter()
    for _ in range(loops):
        fn()
I appreciate the decorator approach here, but this means we will be measuring the overhead of this Python function call, in addition to the actual cuda_bindings function call we are measuring.
Even though it's less convenient, I think we need to manually inline this timing benchmark into the function itself and not use this wrapper in order to get accurate timings.
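A minimal sketch of the difference, assuming a hypothetical pure-Python stand-in (`fake_driver_call`) in place of the real cuda_bindings call so it runs without a GPU. The names `make_wrapped`, `bench_wrapped`, and `bench_inlined` are illustrative only and are not taken from the PR. Both timing functions follow the `time_func(loops) -> elapsed_seconds` shape that pyperf's `bench_time_func` expects.

```python
import time


def fake_driver_call(attribute, ptr):
    """Hypothetical stand-in for the cuda_bindings call under test."""
    return (0, attribute ^ ptr)


def make_wrapped(fn):
    """Wrapper style: every iteration pays for an extra Python call to fn()."""

    def time_func(loops):
        t0 = time.perf_counter()
        for _ in range(loops):
            fn()  # extra Python frame on top of the call being measured
        return time.perf_counter() - t0

    return time_func


def bench_wrapped():
    fake_driver_call(1, 2)


def bench_inlined(loops):
    """Inlined style: the call under test sits directly in the timed loop."""
    t0 = time.perf_counter()
    for _ in range(loops):
        fake_driver_call(1, 2)
    return time.perf_counter() - t0


if __name__ == "__main__":
    loops = 100_000
    wrapped = make_wrapped(bench_wrapped)(loops)
    inlined = bench_inlined(loops)
    print(f"wrapped: {wrapped / loops * 1e9:.1f} ns per iteration")
    print(f"inlined: {inlined / loops * 1e9:.1f} ns per iteration")
```

The gap between the two per-iteration figures is roughly the cost of one Python function call, which matters when the call being measured is itself expected to take under a microsecond.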
- `bench`: Runs the Python benchmarks
- `bench-cpp`: Runs the C++ benchmarks

def bench_pointer_get_attribute() -> None:
    err, _ = cuda.cuPointerGetAttribute(ATTRIBUTE, PTR)
When this is refactored to do its own timing measurement, the PTR and ATTRIBUTE vars should also be moved here (but outside of the loop) so the Python compiler will use fast local variable lookups rather than global lookups.
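As a rough sketch of what that could look like, assuming placeholder values for `ATTRIBUTE` and `PTR` and a hypothetical stand-in function (`fake_pointer_get_attribute`) instead of the real `cuda.cuPointerGetAttribute`, so that it runs without a GPU:

```python
import time

# Placeholder module-level values; the real benchmark would set these up
# from the driver API.
ATTRIBUTE = 1
PTR = 0x1000


def fake_pointer_get_attribute(attribute, ptr):
    """Hypothetical stand-in for cuda.cuPointerGetAttribute."""
    return (0, attribute)


def bench_pointer_get_attribute(loops):
    # Bind everything the loop needs to local names once, outside the
    # timed region, so the loop body uses local lookups only.
    fn = fake_pointer_get_attribute
    attribute = ATTRIBUTE
    ptr = PTR

    t0 = time.perf_counter()
    for _ in range(loops):
        fn(attribute, ptr)
    return time.perf_counter() - t0


if __name__ == "__main__":
    loops = 100_000
    elapsed = bench_pointer_get_attribute(loops)
    print(f"{elapsed / loops * 1e9:.1f} ns per iteration")
```

In CPython, names assigned inside a function are stored in the frame's local slots, while module-level names go through a dictionary lookup, so hoisting them keeps that lookup cost out of the measured loop.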
Thanks for the comments! I don't think we need to merge now. I'll address the comments, and once we are happy with the template we have here we can commit; then I can add more benchmarks in another PR.
Description
closes #1580
@leofang @mdboom I migrated one benchmark from the pytest suite to use pyperf and added a C++ equivalent.
bench_*.py files with bench_*() functions; bench_time_func; pyperf stats / pyperf hist commands.
The benchmark is cuPointerGetAttribute; both Python and C++ call the same driver API with error checking.
This is one set of results for Python and C++ on my system, so we are OK, under the <1 us mark. The two do not yet use the same warmup and run counts; I still need to finish that, but this should give you an idea.
I still need to work on matching parameters across all the benchmarks, among other things, but I wanted to get feedback first on whether this looks fine to keep going.