
Add speedscope export timestamp fallback #899

Open
pablogsal wants to merge 5 commits into bloomberg:main from pablogsal:speedscope-timestamps-fallback

Conversation

@pablogsal
Collaborator

Build the speedscope export around sampled stacks and make the ordering
source explicit instead of relying on the reader's iteration order.

When a capture has exact allocation timestamps, record them in the
non-aggregated file format, expose them through the reader metadata, and
sort each aggregated speedscope sample by the earliest allocation time
that contributed to it. This keeps the sampled profile aligned with the
real execution order without changing the existing high-water-mark and
leak aggregation logic.
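
The exact-timestamp ordering described above can be sketched in Python. The tuple shape here is illustrative only, not memray's actual data model:

```python
def aggregate_samples(allocations):
    # Group identical stacks, summing their sizes, and remember the
    # earliest allocation timestamp that contributed to each group.
    # `allocations` is assumed to be (stack, size, timestamp_us) tuples.
    groups = {}  # stack -> (total_size, earliest_timestamp_us)
    for stack, size, ts in allocations:
        total, earliest = groups.get(stack, (0, ts))
        groups[stack] = (total + size, min(earliest, ts))
    # Emit aggregated samples ordered by their earliest contributing
    # event, so the exported profile tracks real execution order.
    return sorted(groups.items(), key=lambda kv: kv[1][1])
```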

When exact timestamps are not available, switch the speedscope path to
temporal allocation records and recover a stable approximation from the
memory snapshots we already store. Leak exports keep only intervals that
survive to the end of the run and order them by the first snapshot where
they appear. High-water-mark exports evaluate intervals at the peak
snapshot and order them by the first snapshot that contributes to that
peak.
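
The two fallback ordering rules can be sketched as follows, assuming intervals are (first_snapshot, end_snapshot, size) tuples with end_snapshot set to None for allocations never freed; the names are illustrative, not memray's real structures:

```python
def order_leak_intervals(intervals):
    # Leak export: keep only intervals that survive to the end of the
    # run (never deallocated), ordered by the first snapshot where
    # they appear.
    leaked = [iv for iv in intervals if iv[1] is None]
    return sorted(leaked, key=lambda iv: iv[0])

def order_peak_intervals(intervals, peak_snapshot):
    # High-water-mark export: keep intervals live at the peak
    # snapshot, ordered by the first snapshot contributing to it.
    live = [
        iv
        for iv in intervals
        if iv[0] <= peak_snapshot and (iv[1] is None or iv[1] > peak_snapshot)
    ]
    return sorted(live, key=lambda iv: iv[0])
```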

Keep this behavior speedscope-specific and reject --aggregate with
--allocation-timestamps, because aggregated captures discard the event
ordering that the export depends on. The patch also bumps the file
format/header metadata and adds command, reader/writer, and reporter
tests for both the exact-timestamp and fallback paths.
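
The flag rejection can be sketched with argparse; memray's real command wiring differs, this only illustrates the mutual-exclusion check:

```python
import argparse

def build_parser():
    # Illustrative flags only; memray's actual run command has more options.
    parser = argparse.ArgumentParser(prog="example-run")
    parser.add_argument("--aggregate", action="store_true")
    parser.add_argument("--allocation-timestamps", action="store_true")
    return parser

def validate(args):
    # Aggregated captures discard per-event ordering, so exact
    # per-allocation timestamps cannot be recorded alongside them.
    if args.aggregate and args.allocation_timestamps:
        raise SystemExit(
            "--allocation-timestamps cannot be combined with --aggregate"
        )
```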

@pablogsal force-pushed the speedscope-timestamps-fallback branch from 2118cc9 to 1daa4a2 on April 5, 2026 16:11
@pablogsal
Collaborator Author

pablogsal commented Apr 5, 2026

Record monotonic timestamps for allocation and deallocation events in
all-allocation capture files, surface the data through the reader
metadata and Python API, and add a CLI flag to enable the feature.

The CLI rejects --allocation-timestamps with aggregated captures since
that format does not preserve per-event ordering.

Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
Teach the transform reporter to emit sampled speedscope profiles for
bytes and allocation counts, preserving root-to-leaf stack order and
aggregating identical stacks.

When captures include exact allocation timestamps, order the sampled
stacks by the earliest contributing event so the exported profile tracks
the original execution order.

Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
Keep the normal transform flow for existing formats, but route
speedscope exports through temporal allocation records when a capture
lacks exact per-allocation timestamps.

The reporter uses snapshot times to order surviving intervals for leak
reports and the intervals contributing to the peak snapshot for
high-water-mark reports.

Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
@pablogsal force-pushed the speedscope-timestamps-fallback branch from 1daa4a2 to 31e477b on April 5, 2026 19:10
Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
@codecov-commenter

Codecov Report

❌ Patch coverage is 93.44262% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.51%. Comparing base (46ddac6) to head (4087875).

Files with missing lines Patch % Lines
src/memray/commands/transform.py 78.57% 9 Missing ⚠️
src/memray/reporters/transform.py 93.65% 8 Missing ⚠️
src/memray/_memray/record_writer.cpp 78.26% 5 Missing ⚠️
src/memray/_memray/tracking_api.cpp 85.71% 2 Missing ⚠️
tests/utils.py 93.54% 2 Missing ⚠️
src/memray/_memray/record_reader.cpp 94.11% 1 Missing ⚠️
tests/integration/test_main.py 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #899      +/-   ##
==========================================
+ Coverage   92.46%   92.51%   +0.04%     
==========================================
  Files          99       99              
  Lines       11726    12110     +384     
  Branches      426      461      +35     
==========================================
+ Hits        10843    11203     +360     
- Misses        883      907      +24     
Flag Coverage Δ
cpp 92.51% <93.44%> (+0.04%) ⬆️
python_and_cython 92.51% <93.44%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.


Contributor

@godlygeek godlygeek left a comment


I'm still looking through this one. I've reviewed the changes that add the allocation timestamp to every allocation record, but I haven't yet really reviewed the new report. I've got a few comments on the code from the first commit.

But mostly, I'm a bit confused by why we need the per-allocation timestamp. What is it that this gives us beyond what we can already get from our temporal format? Is it just to get finer granularity rather than sticking us with the default 10ms granularity?

Comment on lines +439 to +444
if (d_header.has_allocation_timestamps) {
    if (!writeVarint(record.timestamp_us - d_last.allocation_timestamp_us)) {
        return false;
    }
    d_last.allocation_timestamp_us = record.timestamp_us;
}
Contributor

Why use writeVarint directly instead of writeIntegralDelta? Is it just the mismatch between signed and unsigned? If so, I think we should just use a signed timestamp instead. Sure, we're using a monotonic clock and so the delta will never be negative, but that just cuts our range in half.

Looks like the range is still almost 300 millennia even if we cut it in half:

>>> 2**63 / 1000000 / 60 / 60 / 24 / 365
292471.208677536

so we probably don't need to care about discarding that last bit, and it makes our lives easier to be able to reuse the existing delta helpers.
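
For illustration, a zigzag-style varint is the usual way to make small signed deltas cheap on the wire; whether writeIntegralDelta uses exactly this scheme is an assumption, this just shows the idea:

```python
def zigzag(n: int) -> int:
    # Interleave signed values: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    # so that small deltas of either sign encode to few varint bytes.
    # Assumes n fits in a signed 64-bit range.
    return (n << 1) ^ (n >> 63)

def varint_bytes(u: int) -> bytes:
    # LEB128-style varint: 7 payload bits per byte, high bit set on
    # every byte except the last.
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)
```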

Comment on lines +428 to +438
if (!writeSimpleType(token)
    || (pointer_cache_index == -1 && !writeIntegralDelta(&d_last.data_pointer, record.address >> 3))
    || (allocator_id >= 8 && !writeSimpleType(record.allocator))
    || (d_header.native_traces
        && hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
        && !writeIntegralDelta(&d_last.native_frame_id, record.native_frame_id))
    || (hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
        && !writeVarint(record.size)))
{
    return false;
}
Contributor

To keep the diff smaller, I'd prefer:

Suggested change

if (!writeSimpleType(token)
    || (pointer_cache_index == -1 && !writeIntegralDelta(&d_last.data_pointer, record.address >> 3))
    || (allocator_id >= 8 && !writeSimpleType(record.allocator))
    || (d_header.native_traces
        && hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
        && !writeIntegralDelta(&d_last.native_frame_id, record.native_frame_id))
    || (hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
        && !writeVarint(record.size)))
{
    return false;
}

becomes:

bool success = writeSimpleType(token)
               && (pointer_cache_index != -1
                   || writeIntegralDelta(&d_last.data_pointer, record.address >> 3))
               && (allocator_id < 8 || writeSimpleType(record.allocator))
               && (!d_header.native_traces
                   || hooks::allocatorKind(record.allocator) == hooks::AllocatorKind::SIMPLE_DEALLOCATOR
                   || writeIntegralDelta(&d_last.native_frame_id, record.native_frame_id))
               && (hooks::allocatorKind(record.allocator) == hooks::AllocatorKind::SIMPLE_DEALLOCATOR
                   || writeVarint(record.size));
if (!success) {
    return false;
}

assert reader.metadata.has_allocation_timestamps is True

assert len(records) == 2
assert records[0].timestamp_us > 0
Contributor

Why greater than zero? This is a timestamp since the tracking session started, so it could technically be zero since there are no sleeps, right? We might want to put a time.sleep(0.01) before both the valloc and the free so that both of the assertions can be > instead of >=

const bool d_allocation_timestamps;
linker::SymbolPatcher d_patcher;
std::unique_ptr<BackgroundThread> d_background_thread;
const std::chrono::steady_clock::time_point d_monotonic_start{std::chrono::steady_clock::now()};
Contributor

Is there any way that we can synchronize this to refer to the exact same moment in time as the start_time in the metadata? I think the answer is "no", but if we could it would be very nice. As it is, we can sort of say what wall clock time an event happened at, but not exactly - even setting aside clock adjustments due to NTP or whatever, or DST transitions, we'll always be off by some fixed number of microseconds due to the delta between when we get the wall clock timestamp and when we get the monotonic clock timestamp.

Like I said, I don't think there's anything we can do about that, but if there was it'd be cool...
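
The anchoring in question can be sketched in Python for brevity: read both clocks back to back and carry the offset forward; the unavoidable skew is exactly the time that elapses between the two reads.

```python
import time

def capture_epoch():
    # Read the wall clock and the monotonic clock back to back.
    # Later monotonic timestamps can then be mapped onto approximate
    # wall-clock times; the mapping is off by however long elapsed
    # between these two reads, the fixed skew described above.
    wall_start = time.time()
    mono_start = time.monotonic()
    return wall_start, mono_start

def to_wall_clock(mono_ts, wall_start, mono_start):
    # Approximate wall-clock time of a monotonic timestamp.
    return wall_start + (mono_ts - mono_start)
```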

bool track_object_lifetimes,
bool has_allocation_timestamps)
{
    has_allocation_timestamps = file_format == FileFormat::ALL_ALLOCATIONS && has_allocation_timestamps;
Contributor

There's no need for this; it's simpler to just do...

Suggested change
has_allocation_timestamps = file_format == FileFormat::ALL_ALLOCATIONS && has_allocation_timestamps;

trace_python_allocators,
track_object_lifetimes);
track_object_lifetimes,
has_allocation_timestamps);
Contributor

and...

Suggested change
has_allocation_timestamps);
false);

But actually, since we're hardcoding it to false for AggregatingRecordWriter, why have it take this parameter at all?
