Add speedscope export timestamp fallback #899
pablogsal wants to merge 5 commits into bloomberg:main
Conversation
Force-pushed from 2118cc9 to 1daa4a2
See an example speedscope profile for the sqlite example by loading https://github.com/user-attachments/files/26490542/memray-speedscope-example.py.201148.speedscope.json in https://www.speedscope.app
Record monotonic timestamps for allocation and deallocation events in all-allocation capture files, surface the data through the reader metadata and Python API, and add a CLI flag to enable the feature. The CLI rejects --allocation-timestamps with aggregated captures since that format does not preserve per-event ordering. Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
Teach the transform reporter to emit sampled speedscope profiles for bytes and allocation counts, preserving root-to-leaf stack order and aggregating identical stacks. When captures include exact allocation timestamps, order the sampled stacks by the earliest contributing event so the exported profile tracks the original execution order. Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
Keep the normal transform flow for existing formats, but route speedscope exports through temporal allocation records when a capture lacks exact per-allocation timestamps. The reporter uses snapshot times to order surviving intervals for leak reports and the intervals contributing to the peak snapshot for high-water-mark reports. Signed-off-by: Pablo Galindo Salgado <pablogsal@gmail.com>
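For reference, the "sampled" speedscope layout these commits describe can be sketched roughly like this. The input tuple shape and the aggregation helper below are hypothetical simplifications, not memray's actual reader API; only the output JSON structure follows the speedscope file-format schema.

```python
from collections import defaultdict


def to_speedscope(allocations):
    """Aggregate identical stacks into one speedscope "sampled" profile.

    `allocations` is a list of (stack, size, timestamp_us) tuples, where
    `stack` is a root-to-leaf tuple of frame names (a hypothetical shape,
    not memray's real record type).
    """
    frames = {}  # frame name -> index into shared.frames
    agg = defaultdict(lambda: [0, float("inf")])  # stack -> [bytes, earliest ts]
    for stack, size, ts in allocations:
        entry = agg[stack]
        entry[0] += size
        entry[1] = min(entry[1], ts)

    samples, weights = [], []
    # Order aggregated stacks by the earliest contributing event, as the
    # commit message describes, so the profile tracks execution order.
    for stack, (size, _first_ts) in sorted(agg.items(), key=lambda kv: kv[1][1]):
        samples.append([frames.setdefault(f, len(frames)) for f in stack])
        weights.append(size)

    return {
        "$schema": "https://www.speedscope.app/file-format-schema.json",
        "shared": {"frames": [{"name": f} for f in frames]},
        "profiles": [{
            "type": "sampled",
            "name": "memray allocations (bytes)",
            "unit": "bytes",
            "startValue": 0,
            "endValue": sum(weights),
            "samples": samples,
            "weights": weights,
        }],
    }
```

Serializing the returned dict with `json.dumps` yields a file https://www.speedscope.app can open directly.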
Force-pushed from 1daa4a2 to 31e477b
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #899      +/-  ##
=========================================
+ Coverage   92.46%   92.51%   +0.04%
=========================================
  Files          99       99
  Lines       11726    12110    +384
  Branches      426      461     +35
=========================================
+ Hits        10843    11203    +360
- Misses        883      907     +24
```
godlygeek left a comment:
I'm still looking through this one. I've reviewed the changes to add the allocation timestamp to every allocation record, I haven't yet really reviewed the new report. I've got a few comments on the code from the first commit.
But mostly, I'm a bit confused by why we need the per-allocation timestamp. What is it that this gives us beyond what we can already get from our temporal format? Is it just to get finer granularity rather than sticking us with the default 10ms granularity?
```cpp
if (d_header.has_allocation_timestamps) {
    if (!writeVarint(record.timestamp_us - d_last.allocation_timestamp_us)) {
        return false;
    }
    d_last.allocation_timestamp_us = record.timestamp_us;
}
```
Why use writeVarint directly instead of writeIntegralDelta? Is it just the mismatch between signed and unsigned? If so, I think we should just use a signed timestamp instead. Sure, we're using a monotonic clock and so the delta will never be negative, but that just cuts our range in half.
Looks like the range is still almost 300 millennia even if we cut it in half:

```python
>>> 2**63 / 1000000 / 60 / 60 / 24 / 365
292471.208677536
```

so we probably don't need to care about discarding that last bit, and it makes our lives easier to be able to reuse the existing delta helpers.
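For context on the signed-versus-unsigned point: a common way to fit signed deltas into a varint is zigzag encoding, which interleaves negative and positive values so small magnitudes stay small on the wire. The sketch below is illustrative Python, not memray's actual `writeVarint`/`writeIntegralDelta` helpers; the function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """LEB128-style varint: 7 payload bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_signed_delta(prev: int, current: int) -> bytes:
    """Encode current - prev as a zigzag varint (0,-1,1,-2,... -> 0,1,2,3,...)."""
    delta = current - prev
    zigzag = (delta << 1) ^ (delta >> 63)  # arithmetic shift smears the sign bit
    return encode_varint(zigzag)
```

A monotonic timestamp delta is never negative, so zigzag only costs one bit of range, which is the "almost 300 millennia" trade-off discussed above.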
```cpp
if (!writeSimpleType(token)
    || (pointer_cache_index == -1 && !writeIntegralDelta(&d_last.data_pointer, record.address >> 3))
    || (allocator_id >= 8 && !writeSimpleType(record.allocator))
    || (d_header.native_traces
        && hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
        && !writeIntegralDelta(&d_last.native_frame_id, record.native_frame_id))
    || (hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
        && !writeVarint(record.size)))
{
    return false;
}
```
To keep the diff smaller, I'd prefer:

```diff
-if (!writeSimpleType(token)
-    || (pointer_cache_index == -1 && !writeIntegralDelta(&d_last.data_pointer, record.address >> 3))
-    || (allocator_id >= 8 && !writeSimpleType(record.allocator))
-    || (d_header.native_traces
-        && hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
-        && !writeIntegralDelta(&d_last.native_frame_id, record.native_frame_id))
-    || (hooks::allocatorKind(record.allocator) != hooks::AllocatorKind::SIMPLE_DEALLOCATOR
-        && !writeVarint(record.size)))
-{
-    return false;
-}
+bool success = writeSimpleType(token)
+               && (pointer_cache_index != -1
+                   || writeIntegralDelta(&d_last.data_pointer, record.address >> 3))
+               && (allocator_id < 8 || writeSimpleType(record.allocator))
+               && (!d_header.native_traces
+                   || hooks::allocatorKind(record.allocator) == hooks::AllocatorKind::SIMPLE_DEALLOCATOR
+                   || writeIntegralDelta(&d_last.native_frame_id, record.native_frame_id))
+               && (hooks::allocatorKind(record.allocator) == hooks::AllocatorKind::SIMPLE_DEALLOCATOR
+                   || writeVarint(record.size));
+if (!success) {
+    return false;
+}
```
```python
assert reader.metadata.has_allocation_timestamps is True

assert len(records) == 2
assert records[0].timestamp_us > 0
```
Why greater than zero? This is a timestamp since the tracking session started, so it could technically be zero since there are no sleeps, right? We might want to put a `time.sleep(0.01)` before both the valloc and the free so that both assertions can be `>` instead of `>=`.
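The timing concern can be illustrated standalone: at microsecond granularity, a timestamp taken immediately after the start can round down to zero, while a short sleep before each event makes the strict comparisons safe. A self-contained sketch using plain `time.monotonic_ns`, not memray's tracker:

```python
import time


def event_timestamp_us(start_ns: int) -> int:
    """Microseconds since tracking started, like the PR's allocation timestamps."""
    return (time.monotonic_ns() - start_ns) // 1_000


start = time.monotonic_ns()
time.sleep(0.01)                      # ensure the alloc lands strictly after t=0
alloc_ts = event_timestamp_us(start)  # stands in for the valloc record
time.sleep(0.01)                      # ensure the free lands strictly after the alloc
free_ts = event_timestamp_us(start)   # stands in for the free record

assert alloc_ts > 0                   # strict ">" is now safe
assert free_ts > alloc_ts
```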
```cpp
const bool d_allocation_timestamps;
linker::SymbolPatcher d_patcher;
std::unique_ptr<BackgroundThread> d_background_thread;
const std::chrono::steady_clock::time_point d_monotonic_start{std::chrono::steady_clock::now()};
```
Is there any way that we can synchronize this to refer to the exact same moment in time as the start_time in the metadata? I think the answer is "no", but if we could it would be very nice. As it is, we can sort of say what wall clock time an event happened at, but not exactly - even setting aside clock adjustments due to NTP or whatever, or DST transitions, we'll always be off by some fixed number of microseconds due to the delta between when we get the wall clock timestamp and when we get the monotonic clock timestamp.
Like I said, I don't think there's anything we can do about that, but if there was it'd be cool...
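One partial mitigation, sketched below in Python rather than the C++ the tracker uses: bracket the wall-clock read between two monotonic reads and keep the tightest bracket across a few attempts. The unknown offset between the two clocks is then bounded by half the bracket width; this narrows the fixed skew described above but, as the comment says, cannot eliminate it.

```python
import time


def paired_epoch_anchor(attempts: int = 10):
    """Estimate the wall-clock time of a monotonic reference point.

    Reads the monotonic clock on both sides of the wall-clock read and
    pairs the wall time with the midpoint of the bracket, keeping the
    tightest bracket seen. Returns (wall_ns, mono_ns) nanosecond anchors.
    """
    best = (float("inf"), None, None)
    for _ in range(attempts):
        m0 = time.monotonic_ns()
        wall = time.time_ns()
        m1 = time.monotonic_ns()
        if m1 - m0 < best[0]:          # keep the tightest bracket
            best = (m1 - m0, wall, (m0 + m1) // 2)
    _, wall_ns, mono_ns = best
    return wall_ns, mono_ns


wall_anchor, mono_anchor = paired_epoch_anchor()
# A later monotonic event timestamp t then maps to wall time as:
#   wall_anchor + (t - mono_anchor)
```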
```cpp
bool track_object_lifetimes,
bool has_allocation_timestamps)
{
    has_allocation_timestamps = file_format == FileFormat::ALL_ALLOCATIONS && has_allocation_timestamps;
```
There's no need for this; it's simpler to just do...

```diff
-has_allocation_timestamps = file_format == FileFormat::ALL_ALLOCATIONS && has_allocation_timestamps;
```
```diff
     trace_python_allocators,
-    track_object_lifetimes);
+    track_object_lifetimes,
+    has_allocation_timestamps);
```
and...

```diff
-has_allocation_timestamps);
+false);
```
But actually, since we're hardcoding it to false for AggregatingRecordWriter, why have it take this parameter at all?
Build the speedscope export around sampled stacks and make the ordering
source explicit instead of relying on the reader's iteration order.
When a capture has exact allocation timestamps, record them in the
non-aggregated file format, expose them through the reader metadata, and
sort each aggregated speedscope sample by the earliest allocation time
that contributed to it. This keeps the sampled profile aligned with the
real execution order without changing the existing high-water-mark and
leak aggregation logic.
When exact timestamps are not available, switch the speedscope path to
temporal allocation records and recover a stable approximation from the
memory snapshots we already store. Leak exports keep only intervals that
survive to the end of the run and order them by the first snapshot where
they appear. High-water-mark exports evaluate intervals at the peak
snapshot and order them by the first snapshot that contributes to that
peak.
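The fallback ordering described above can be sketched as follows. The interval tuple shape and the function are hypothetical simplifications of memray's temporal allocation records, shown only to make the leak versus high-water-mark selection concrete:

```python
def order_intervals_for_export(intervals, peak_index, mode):
    """Approximate event order from snapshot-granularity temporal records.

    Each interval is (first_seen, last_seen, nbytes) in snapshot indices,
    with last_seen = None for allocations that survive to the end of the
    run. `mode` is "leaks" or "hwm".
    """
    if mode == "leaks":
        # Leak exports keep only intervals that survive to the end.
        kept = [iv for iv in intervals if iv[1] is None]
    else:
        # High-water-mark exports keep intervals live at the peak snapshot.
        kept = [
            iv for iv in intervals
            if iv[0] <= peak_index and (iv[1] is None or iv[1] >= peak_index)
        ]
    # Order by the first snapshot where each interval appears, the stable
    # approximation of execution order the description mentions.
    return sorted(kept, key=lambda iv: iv[0])
```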
Keep this behavior speedscope-specific and reject --aggregate with
--allocation-timestamps, because aggregated captures discard the event
ordering that the export depends on. The patch also bumps the file
format/header metadata and adds command, reader/writer, and reporter
tests for both the exact-timestamp and fallback paths.