Conversation
Pull Request Overview
Introduces enhancements to profiling and annotation APIs, including improved kernel-event association, a new breakdown function for annotation timings, and extended flat profiling with custom grouping and parallelism controls.
- Expanded the `flat_profile` method with `mapper`, `parallelism_level`, `ascending`, and `idle_time` options
- Added `time_breakdown` to compute CPU and launched-kernel time per annotation
- Updated the SQLite reader to disable unused CUPTI events and switch from `_children` to `_kernel_launch`
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `pipit/trace.py` | Enhanced `flat_profile`, added `time_breakdown`, removed `dropna=False`, and a debug print |
| `pipit/readers/nsight_sqlite_reader.py` | Commented out MEMCPY/MEMSET/SYNCH SQL, and replaced `_children` assignments with `_kernel_launch` |
Comments suppressed due to low confidence (4)
pipit/readers/nsight_sqlite_reader.py:293
- Swapping `_children` for `_kernel_launch` breaks consumers (e.g., `time_breakdown` and `filter_by_label`) that expect `_children`. Ensure both fields are populated or update all references.

```python
trace_df.loc[calls_that_launch["index_x"].to_numpy(), "_kernel_launch"] = (
    calls_that_launch["index_y"].to_numpy()
)
```
pipit/trace.py:1040
- [nitpick] Using `DataFrame.apply` with Python loops for each kernel may be slow on large traces. Consider vectorized operations or grouping strategies to improve performance.

```python
kernels.apply(_calc_kernel_time, axis=1,)
```
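To illustrate the nitpick, here is a minimal, self-contained pandas sketch of the vectorized alternative; the frame, column names, and annotation labels are hypothetical, not pipit's actual schema:

```python
import pandas as pd

# Hypothetical kernels frame: one row per launched kernel, with start/end
# timestamps and the annotation it belongs to.
kernels = pd.DataFrame({
    "annotation": ["fwd", "fwd", "bwd"],
    "start": [0.0, 5.0, 12.0],
    "end": [4.0, 9.0, 20.0],
})

# Row-wise apply (what the review flags as slow on large traces):
slow = kernels.apply(lambda row: row["end"] - row["start"], axis=1)

# Vectorized equivalent: a single column subtraction, then a grouped sum
# to get total kernel time per annotation.
kernels["duration"] = kernels["end"] - kernels["start"]
per_annotation = kernels.groupby("annotation")["duration"].sum()
```

Both paths compute the same durations, but the vectorized version runs the arithmetic in one NumPy operation instead of one Python call per row.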
pipit/readers/nsight_sqlite_reader.py:71
- [nitpick] Large commented-out SQL queries add clutter. If these events are unused long-term, remove the dead code or move it to documentation.
# """ ... large commented SQL block ... """
pipit/trace.py:528
- New parameters and logic in `flat_profile` and `time_breakdown` lack dedicated unit tests. Add tests covering a custom `mapper`, parallelism levels, `idle_time`, sorting, and the new breakdown method.

```python
def flat_profile(..., mapper=None, parallelism_level=None, ascending=None, idle_time=False):
```
```python
pd_grouper[label] = group
res = (
    res.set_index("Name")
    .groupby([pd_grouper] + parallelism_level)[["time.exc"]]
```
The `mapper` branch hard-codes `time.exc` instead of using the `metrics` parameter; this will ignore any other metrics list provided. Use the `metrics` variable here.
```diff
-    .groupby([pd_grouper] + parallelism_level)[["time.exc"]]
+    .groupby([pd_grouper] + parallelism_level)[metrics]
```
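The suggested fix can be sketched with a small, self-contained pandas example showing a dict grouper combined with a parallelism column and a caller-supplied metrics list; the column names, mapper dict, and values are hypothetical:

```python
import pandas as pd

# Hypothetical flat-profile frame with two metric columns; `metrics` is the
# list the caller passed in (the review's point: don't hard-code "time.exc").
res = pd.DataFrame({
    "Name": ["MPI_Send", "MPI_Recv", "MPI_Send"],
    "Process": [0, 0, 1],
    "time.exc": [1.0, 2.0, 3.0],
    "time.inc": [1.5, 2.5, 3.5],
})
metrics = ["time.exc", "time.inc"]
parallelism_level = ["Process"]
pd_grouper = {"MPI_Send": "send", "MPI_Recv": "recv"}  # label -> group mapping

# pandas applies the dict to the index and the string to a column, so both
# metric columns are aggregated, not just "time.exc".
out = (
    res.set_index("Name")
    .groupby([pd_grouper] + parallelism_level)[metrics]
    .sum()
)
```

Selecting `[metrics]` instead of the literal `[["time.exc"]]` is the only change needed for the grouped result to carry every requested metric column.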
I'll agree with this; it's worth making sure this function works when metrics other than `time.exc` are passed in.
jhdavis8
left a comment
Features and documentation look good overall. I added a few comments.
One high-level question: are there new unit tests checking the added features? It would be good to add these here or in a later PR.
```python
# Use explode to expand every child in children list to a row
# This can include duplicates (e.g. for nested annotations) that we should drop
kernels = events.loc[host_events["_children"].dropna().explode().to_numpy()]
```
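A minimal, self-contained sketch of this explode-then-dedupe pattern, with hypothetical event names and child indices (and an explicit `drop_duplicates` for the duplicates the comment mentions):

```python
import pandas as pd

# Hypothetical host-side events, each holding a list of child kernel indices.
# Nested annotations can list the same kernel twice, so the exploded index
# contains duplicates that should be dropped before the lookup.
host_events = pd.DataFrame({
    "_children": [[2, 3], [3], None],
})
events = pd.DataFrame(
    {"Name": ["annA", "annB", "kernelX", "kernelY"]}, index=[0, 1, 2, 3]
)

# explode turns each list element into its own row: 2, 3, 3 here.
child_idx = host_events["_children"].dropna().explode()

# Dedupe, then look up the kernel rows by index.
kernels = events.loc[child_idx.drop_duplicates().astype(int).to_numpy()]
```

Without the dedupe step, `events.loc[...]` would return the row for kernel index 3 twice.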
```python
# Postprocessing using mapper
if mapper is not None:
    # pandas expects label->group
    labels = res["Name"]
```
Should this be `groupby_column`? I'm not sure, but wanted to double-check. I think we should decide whether we want the hardcoded `"Name"` column or `groupby_column` on every line where `"Name"` is used.
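The distinction can be sketched with a tiny, self-contained pandas example; `groupby_column` here stands in for whatever column the caller grouped on, and the frame contents are hypothetical:

```python
import pandas as pd

# Hypothetical result frame from the profiling step.
res = pd.DataFrame({
    "Name": ["foo", "bar", "foo"],
    "time.exc": [1.0, 2.0, 3.0],
})

# If the grouping column is configurable, referencing the variable instead
# of the literal "Name" keeps the mapper branch correct for any choice.
groupby_column = "Name"
labels = res[groupby_column]
totals = res.groupby(groupby_column)["time.exc"].sum()
```

With the literal `"Name"` hard-coded, the code silently breaks the moment `groupby_column` is anything else.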
Changes