Fix TemporalBuffer overflow for long-running timeseries #799

Merged
SimonHeybrock merged 4 commits into main from fix-temporal-buffer-size-limits
Mar 19, 2026

Conversation

@SimonHeybrock (Member)

Summary

  • Production timeseries plots (Total Counts etc.) stop updating after ~2-3 hours because TemporalBuffer hits its hardcoded 10,000-element capacity. With FullHistoryExtractor (required_timespan=inf), timespan-based trimming drops nothing, causing a permanent ValueError on every subsequent update.
  • Replace the flat element-count limit with a 20 MB memory budget per buffer, computed from the actual bytes per element (including coordinate buffers). This gives ~7 days of 0-D scalar data at 1/sec, scaling down naturally for larger data types (see the sketch below).
  • When timespan trimming can't free enough space, fall back to ring-buffer behavior (drop oldest) instead of raising an error.

Fixes #798.
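
For illustration, a minimal sketch of how such a budget-based capacity could be computed, assuming scipp-style accessors (data.data.values.nbytes, data.coords, data.sizes) as used in the review snippets further down; MAX_MEMORY_BYTES and _compute_max_capacity are hypothetical names, not the actual implementation:

MAX_MEMORY_BYTES = 20 * 1024**2  # 20 MB budget per buffer

def _compute_max_capacity(data) -> int:
    # Account for the data values and every coordinate buffer.
    total_bytes = data.data.values.nbytes
    total_bytes += data.coords['time'].values.nbytes
    for coord in ('start_time', 'end_time'):
        if coord in data.coords:
            total_bytes += data.coords[coord].values.nbytes
    # Incoming single slices have no 'time' dim; otherwise normalize by
    # the number of time points to get bytes per element.
    if 'time' in data.dims:
        total_bytes /= data.sizes['time']
    return max(1, int(MAX_MEMORY_BYTES // total_bytes))

With four float64 buffers (data, time, start_time, end_time) an element costs 32 bytes, so the 20 MB budget holds about 655,000 elements, roughly 7.5 days at one sample per second, consistent with the ~7 days quoted above.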

Test plan

  • Existing temporal buffer tests pass (1194 dashboard tests)
  • New test for ring-buffer behavior with required_timespan=inf
  • Verify in production that timeseries plots survive beyond 3 hours

🤖 Generated with Claude Code

…city

TemporalBuffer had a hardcoded 10,000-element capacity limit. When
FullHistoryExtractor (required_timespan=inf) filled the buffer, trimming
dropped nothing (cutoff = -infinity keeps everything), causing a permanent
ValueError on every subsequent update. This made timeseries plots stop
updating after ~2-3 hours in production. Fixes #798.

Two changes:
- When timespan-based trimming doesn't free enough space, fall back to
  ring-buffer behavior: drop the minimum number of oldest elements to
  make room for new data.
- Replace the flat 10,000-element default with a 20 MB memory budget.
  Capacity is computed from actual bytes per element (including coordinate
  buffers), giving ~7 days of 0-D scalar data at 1/sec and proportionally
  less for larger data types.

Avoid an O(capacity) copy on every update when the buffer is full:
- _make_room drops at least 10% of capacity instead of 1 element,
  so the expensive VariableBuffer.drop() runs once per ~10% fills.
- _trim_to_timespan short-circuits for inf timespan, skipping the
  pointless cutoff computation that can never drop anything.
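
A rough sketch of what these two commits describe; _make_room, _trim_to_timespan, and VariableBuffer.drop() are named in the commit messages above, but the method bodies, the _max_capacity and _required_timespan attributes, and the drop(n) signature below are assumptions:

import math

class TemporalBuffer:  # sketch of the two methods only
    def _trim_to_timespan(self) -> None:
        # Short-circuit for required_timespan=inf: the cutoff would be
        # -infinity and could never drop anything.
        if math.isinf(self._required_timespan):
            return
        ...  # normal timespan-based trimming against the cutoff

    def _make_room(self, n_new: int) -> None:
        # Ring-buffer fallback: drop the oldest elements when timespan
        # trimming has not freed enough space. Dropping at least 10% of
        # capacity amortizes the O(capacity) copy in VariableBuffer.drop()
        # to roughly once per 10% of fills instead of once per element.
        shortfall = self._data_buffer.size + n_new - self._max_capacity
        if shortfall > 0:
            self._data_buffer.drop(max(shortfall, self._max_capacity // 10))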
@SimonHeybrock marked this pull request as ready for review March 16, 2026 11:20
@SimonHeybrock enabled auto-merge March 16, 2026 11:20
@github-project-automation Bot moved this to In progress in Development Board Mar 16, 2026
@SimonHeybrock moved this from In progress to Selected in Development Board Mar 16, 2026
@nvaytet self-assigned this Mar 18, 2026
-    # Calculate max_capacity from memory limit
+    # Calculate max_capacity from memory limit, accounting for all buffers
+    # (data + time coord + optional start_time/end_time coords)
     if 'time' in data.dims:
Member:

You are dividing by n_time everywhere; I think you can reduce duplication by doing something like

bytes_per_element = data.data.values.nbytes
bytes_per_element += data.coords['time'].values.nbytes
if 'start_time' in data.coords:
    bytes_per_element += data.coords['start_time'].values.nbytes
if 'end_time' in data.coords:
    bytes_per_element += data.coords['end_time'].values.nbytes

if 'time' in data.dims:
    bytes_per_element /= n_time

?

-    bytes_per_element = data.data.values.nbytes / data.sizes['time']
+    n_time = data.sizes['time']
+    bytes_per_element = data.data.values.nbytes / n_time
+    bytes_per_element += data.coords['time'].values.nbytes / n_time
Member:

Is the time coord always guaranteed to be there?

@SimonHeybrock (Member, Author):

Yes, otherwise the value cannot be extracted as a timeseries.

for t in range(1, capacity):
    buffer.add(make_single_slice([float(t)] * 2, float(t)))
assert buffer._data_buffer.size == capacity

Member:

Can we assert result.coords['time'].values[0] == 0.0 before adding one more, to make sure that this addition is what is responsible for dropping the oldest element?
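
The requested check might look something like this; buffer.get() is a hypothetical accessor standing in for however the test obtains result:

result = buffer.get()  # hypothetical accessor for the buffered timeseries
assert result.coords['time'].values[0] == 0.0  # oldest element still present
buffer.add(make_single_slice([float(capacity)] * 2, float(capacity)))
result = buffer.get()
assert result.coords['time'].values[0] > 0.0  # the overflow dropped the oldest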

- Refactor bytes-per-element calculation to eliminate duplication by
  accumulating total bytes first, then dividing by n_time at the end
- Add comment explaining that 'time' coord is always guaranteed to be present
- Add assertion in ring-buffer overflow test to verify buffer is full
  before triggering the overflow
@SimonHeybrock merged commit acd03bd into main Mar 19, 2026
4 checks passed
@SimonHeybrock deleted the fix-temporal-buffer-size-limits branch March 19, 2026 08:03
@github-project-automation Bot moved this from Selected to Done in Development Board Mar 19, 2026

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Errors in production dashboard about data exceeding buffer

2 participants