Add speedscope export timestamp fallback #899
pablogsal wants to merge 5 commits into bloomberg:main
Conversation
Force-pushed from 2118cc9 to 1daa4a2
See an example speedscope profile for the sqlite example by loading https://github.com/user-attachments/files/26490542/memray-speedscope-example.py.201148.speedscope.json in https://www.speedscope.app
Record monotonic timestamps for allocation and deallocation events in all-allocation capture files, surface the data through the reader metadata and Python API, and add a CLI flag to enable the feature. The CLI rejects --allocation-timestamps with aggregated captures since that format does not preserve per-event ordering. Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
Teach the transform reporter to emit sampled speedscope profiles for bytes and allocation counts, preserving root-to-leaf stack order and aggregating identical stacks. When captures include exact allocation timestamps, order the sampled stacks by the earliest contributing event so the exported profile tracks the original execution order. Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
Keep the normal transform flow for existing formats, but route speedscope exports through temporal allocation records when a capture lacks exact per-allocation timestamps. The reporter uses snapshot times to order surviving intervals for leak reports and the intervals contributing to the peak snapshot for high-water-mark reports. Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
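For reference, the "sampled" speedscope layout these commits describe can be sketched roughly like this. The input tuple shape and the aggregation helper below are hypothetical simplifications, not memray's actual reader API; only the output JSON structure follows the speedscope file-format schema.

```python
from collections import defaultdict


def to_speedscope(allocations):
    """Aggregate identical stacks into one speedscope "sampled" profile.

    `allocations` is a list of (stack, size, timestamp_us) tuples, where
    `stack` is a root-to-leaf tuple of frame names (a hypothetical shape,
    not memray's real record type).
    """
    frames = {}  # frame name -> index into shared.frames
    agg = defaultdict(lambda: [0, float("inf")])  # stack -> [bytes, earliest ts]
    for stack, size, ts in allocations:
        entry = agg[stack]
        entry[0] += size
        entry[1] = min(entry[1], ts)

    samples, weights = [], []
    # Order aggregated stacks by the earliest contributing event, as the
    # commit message describes, so the profile tracks execution order.
    for stack, (size, _first_ts) in sorted(agg.items(), key=lambda kv: kv[1][1]):
        samples.append([frames.setdefault(f, len(frames)) for f in stack])
        weights.append(size)

    return {
        "$schema": "https://www.speedscope.app/file-format-schema.json",
        "shared": {"frames": [{"name": f} for f in frames]},
        "profiles": [{
            "type": "sampled",
            "name": "memray allocations (bytes)",
            "unit": "bytes",
            "startValue": 0,
            "endValue": sum(weights),
            "samples": samples,
            "weights": weights,
        }],
    }
```

Serializing the returned dict with `json.dumps` yields a file https://www.speedscope.app can open directly.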
Force-pushed from 1daa4a2 to 31e477b
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #899      +/-  ##
=========================================
+ Coverage   92.46%   92.51%   +0.04%
=========================================
  Files          99       99
  Lines       11726    12110    +384
  Branches      426      461     +35
=========================================
+ Hits        10843    11203    +360
- Misses        883      907     +24
```
godlygeek left a comment:
I'm still looking through this one. I've reviewed the changes to add the allocation timestamp to every allocation record, I haven't yet really reviewed the new report. I've got a few comments on the code from the first commit.
But mostly, I'm a bit confused by why we need the per-allocation timestamp. What is it that this gives us beyond what we can already get from our temporal format? Is it just to get finer granularity rather than sticking us with the default 10ms granularity?
```cpp
if (d_header.has_allocation_timestamps) {
    if (!writeVarint(record.timestamp_us - d_last.allocation_timestamp_us)) {
        return false;
    }
    d_last.allocation_timestamp_us = record.timestamp_us;
}
```
Why use writeVarint directly instead of writeIntegralDelta? Is it just the mismatch between signed and unsigned? If so, I think we should just use a signed timestamp instead. Sure, we're using a monotonic clock and so the delta will never be negative, but that just cuts our range in half.
Looks like the range is still almost 300 millennia even if we cut it in half:

```python
>>> 2**63 / 1000000 / 60 / 60 / 24 / 365
292471.208677536
```

so we probably don't need to care about discarding that last bit, and it makes our lives easier to be able to reuse the existing delta helpers.
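For context on the signed-versus-unsigned point: a common way to fit signed deltas into a varint is zigzag encoding, which interleaves negative and positive values so small magnitudes stay small on the wire. The sketch below is illustrative Python, not memray's actual `writeVarint`/`writeIntegralDelta` helpers; the function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """LEB128-style varint: 7 payload bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_signed_delta(prev: int, current: int) -> bytes:
    """Encode current - prev as a zigzag varint (0,-1,1,-2,... -> 0,1,2,3,...)."""
    delta = current - prev
    zigzag = (delta << 1) ^ (delta >> 63)  # arithmetic shift smears the sign bit
    return encode_varint(zigzag)
```

A monotonic timestamp delta is never negative, so zigzag only costs one bit of range, which is the "almost 300 millennia" trade-off discussed above.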
```cpp
if (!writeSimpleType(token)
    || (pointer_cache_index == -1 && !writeIntegralDelta(&d_last.data_pointer, record.address >> 3))
    || (allocator_id >= 8 && !writeSimpleType(record.allocator))
    || (d_header.native_traces
        && hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
        && !writeIntegralDelta(&d_last.native_frame_id, record.native_frame_id))
    || (hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
        && !writeVarint(record.size)))
{
    return false;
}
```
To keep the diff smaller, I'd prefer:

```diff
-if (!writeSimpleType(token)
-    || (pointer_cache_index == -1 && !writeIntegralDelta(&d_last.data_pointer, record.address >> 3))
-    || (allocator_id >= 8 && !writeSimpleType(record.allocator))
-    || (d_header.native_traces
-        && hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
-        && !writeIntegralDelta(&d_last.native_frame_id, record.native_frame_id))
-    || (hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
-        && !writeVarint(record.size)))
-{
-    return false;
-}
+bool success = writeSimpleType(token)
+               && (pointer_cache_index != -1
+                   || writeIntegralDelta(&d_last.data_pointer, record.address >> 3))
+               && (allocator_id < 8 || writeSimpleType(record.allocator))
+               && (!d_header.native_traces
+                   || hooks::allocatorKind(record.allocator) == hooks::AllocatorKind::SIMPLE_DEALLOCATOR
+                   || writeIntegralDelta(&d_last.native_frame_id, record.native_frame_id))
+               && (hooks::allocatorKind(record.allocator) == hooks::AllocatorKind::SIMPLE_DEALLOCATOR
+                   || writeVarint(record.size));
+if (!success) {
+    return false;
+}
```
```python
assert reader.metadata.has_allocation_timestamps is True

assert len(records) == 2
assert records[0].timestamp_us > 0
```
Why greater than zero? This is a timestamp since the tracking session started, so it could technically be zero since there are no sleeps, right? We might want to put a `time.sleep(0.01)` before both the valloc and the free so that both assertions can be `>` instead of `>=`.
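The timing concern can be illustrated standalone: at microsecond granularity, a timestamp taken immediately after the start can round down to zero, while a short sleep before each event makes the strict comparisons safe. A self-contained sketch using plain `time.monotonic_ns`, not memray's tracker:

```python
import time


def event_timestamp_us(start_ns: int) -> int:
    """Microseconds since tracking started, like the PR's allocation timestamps."""
    return (time.monotonic_ns() - start_ns) // 1_000


start = time.monotonic_ns()
time.sleep(0.01)                      # ensure the alloc lands strictly after t=0
alloc_ts = event_timestamp_us(start)  # stands in for the valloc record
time.sleep(0.01)                      # ensure the free lands strictly after the alloc
free_ts = event_timestamp_us(start)   # stands in for the free record

assert alloc_ts > 0                   # strict ">" is now safe
assert free_ts > alloc_ts
```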
```cpp
const bool d_allocation_timestamps;
linker::SymbolPatcher d_patcher;
std::unique_ptr<BackgroundThread> d_background_thread;
const std::chrono::steady_clock::time_point d_monotonic_start{std::chrono::steady_clock::now()};
```
Is there any way that we can synchronize this to refer to the exact same moment in time as the start_time in the metadata? I think the answer is "no", but if we could it would be very nice. As it is, we can sort of say what wall clock time an event happened at, but not exactly - even setting aside clock adjustments due to NTP or whatever, or DST transitions, we'll always be off by some fixed number of microseconds due to the delta between when we get the wall clock timestamp and when we get the monotonic clock timestamp.
Like I said, I don't think there's anything we can do about that, but if there was it'd be cool...
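One partial mitigation, sketched below in Python rather than the C++ the tracker uses: bracket the wall-clock read between two monotonic reads and keep the tightest bracket across a few attempts. The unknown offset between the two clocks is then bounded by half the bracket width; this narrows the fixed skew described above but, as the comment says, cannot eliminate it.

```python
import time


def paired_epoch_anchor(attempts: int = 10):
    """Estimate the wall-clock time of a monotonic reference point.

    Reads the monotonic clock on both sides of the wall-clock read and
    pairs the wall time with the midpoint of the bracket, keeping the
    tightest bracket seen. Returns (wall_ns, mono_ns) nanosecond anchors.
    """
    best = (float("inf"), None, None)
    for _ in range(attempts):
        m0 = time.monotonic_ns()
        wall = time.time_ns()
        m1 = time.monotonic_ns()
        if m1 - m0 < best[0]:          # keep the tightest bracket
            best = (m1 - m0, wall, (m0 + m1) // 2)
    _, wall_ns, mono_ns = best
    return wall_ns, mono_ns


wall_anchor, mono_anchor = paired_epoch_anchor()
# A later monotonic event timestamp t then maps to wall time as:
#   wall_anchor + (t - mono_anchor)
```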
```cpp
bool track_object_lifetimes,
bool has_allocation_timestamps)
{
    has_allocation_timestamps = file_format == FileFormat::ALL_ALLOCATIONS && has_allocation_timestamps;
```
There's no need for this; it's simpler to just do...

```diff
-has_allocation_timestamps = file_format == FileFormat::ALL_ALLOCATIONS && has_allocation_timestamps;
```
```diff
     trace_python_allocators,
-    track_object_lifetimes);
+    track_object_lifetimes,
+    has_allocation_timestamps);
```
and...

```diff
-has_allocation_timestamps);
+false);
```
But actually, since we're hardcoding it to false for AggregatingRecordWriter, why have it take this parameter at all?
Build the speedscope export around sampled stacks and make the ordering
source explicit instead of relying on the reader's iteration order.
When a capture has exact allocation timestamps, record them in the
non-aggregated file format, expose them through the reader metadata, and
sort each aggregated speedscope sample by the earliest allocation time
that contributed to it. This keeps the sampled profile aligned with the
real execution order without changing the existing high-water-mark and
leak aggregation logic.
When exact timestamps are not available, switch the speedscope path to
temporal allocation records and recover a stable approximation from the
memory snapshots we already store. Leak exports keep only intervals that
survive to the end of the run and order them by the first snapshot where
they appear. High-water-mark exports evaluate intervals at the peak
snapshot and order them by the first snapshot that contributes to that
peak.
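The fallback ordering described above can be sketched as follows. The interval tuple shape and the function are hypothetical simplifications of memray's temporal allocation records, shown only to make the leak versus high-water-mark selection concrete:

```python
def order_intervals_for_export(intervals, peak_index, mode):
    """Approximate event order from snapshot-granularity temporal records.

    Each interval is (first_seen, last_seen, nbytes) in snapshot indices,
    with last_seen = None for allocations that survive to the end of the
    run. `mode` is "leaks" or "hwm".
    """
    if mode == "leaks":
        # Leak exports keep only intervals that survive to the end.
        kept = [iv for iv in intervals if iv[1] is None]
    else:
        # High-water-mark exports keep intervals live at the peak snapshot.
        kept = [
            iv for iv in intervals
            if iv[0] <= peak_index and (iv[1] is None or iv[1] >= peak_index)
        ]
    # Order by the first snapshot where each interval appears, the stable
    # approximation of execution order the description mentions.
    return sorted(kept, key=lambda iv: iv[0])
```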
Keep this behavior speedscope-specific and reject --aggregate with
--allocation-timestamps, because aggregated captures discard the event
ordering that the export depends on. The patch also bumps the file
format/header metadata and adds command, reader/writer, and reporter
tests for both the exact-timestamp and fallback paths.