tsdb/agent: Checkpoint based on Series in Memory #17617

@kgeckhart

Proposal

The agent currently uses the same Checkpoint implementation as all other parts of Prometheus:

// Checkpoint creates a compacted checkpoint of segments in range [from, to] in the given WAL.
// It includes the most recent checkpoint if it exists.
// All series not satisfying keep, samples/tombstones/exemplars below mint and
// metadata that are not the latest are dropped.

The checkpoint serves three purposes for agent mode, plus a fourth that does not apply yet:

  1. Populating the agent db stripeSeries with known series and last sample timestamps on startup
  2. Populating the series caches in queue_manager on startup
  3. Pruning the series caches in queue_manager after a new checkpoint is created
  4. Providing the most recent metadata for a series (not applicable for agent mode yet, and might be dropped)

This is an incredibly small subset of the data persisted in a checkpoint, which includes: series that exist in the WAL, samples above mint, float and regular histogram samples above mint, exemplars above mint, and the latest metadata. To create a checkpoint with all these records, we re-read the current checkpoint plus all segments. This is a lot of overhead, given that all the data we require for the checkpoint is already in memory, between stripeSeries and the deleted series in the agent db.

I propose we introduce another checkpoint implementation, which could look something like:

type ActiveSeries interface {
    Ref() chunks.HeadSeriesRef
    Labels() labels.Labels
    LastSampleTimestamp() int64
}

// Checkpoint creates an unindexed checkpoint containing a record.RefSeries and
// a record.RefSample for each ActiveSeries, and a record.RefSeries for each
// recentlyDeleted series.
func Checkpoint(logger *slog.Logger, w *WL, seriesIter iter.Seq[ActiveSeries], recentlyDeleted []chunks.HeadSeriesRef)

Such a checkpoint could be driven by the data we currently have in memory, which would:

  1. Reduce the overhead of taking a checkpoint
  2. Reduce the overhead of queue_manager reading a checkpoint as checkpoints will be smaller
  3. Improve startup times/resource usage due to smaller checkpoint sizes

I did a quick implementation of this in Grafana Alloy, where it shrank a 214MB checkpoint down to 137MB (about 36% smaller, going by those two sizes), with the following improvements to creating and loading a checkpoint:

              │ old-create.txt │           new-create.txt           │
              │     sec/op     │   sec/op     vs base               │
Checkpoint-11     3477.6m ± 7%   913.3m ± 6%  -73.74% (p=0.002 n=6)

              │ old-create.txt │            new-create.txt            │
              │      B/op      │     B/op       vs base               │
Checkpoint-11   2717.25Mi ± 0%   11.52Mi ± 11%  -99.58% (p=0.002 n=6)

              │ old-create.txt  │           new-create.txt           │
              │    allocs/op    │ allocs/op   vs base                │
Checkpoint-11   34087723.5 ± 0%   325.0 ± 1%  -100.00% (p=0.002 n=6)

                │ baseline-load.txt │           new-load.txt            │
                │      sec/op       │   sec/op    vs base               │
LoadLargeWAL-11          4.195 ± 2%   1.105 ± 5%  -73.67% (p=0.002 n=6)

                │ baseline-load.txt │            new-load.txt             │
                │       B/op        │     B/op      vs base               │
LoadLargeWAL-11        2.001Gi ± 1%   1.204Gi ± 0%  -39.83% (p=0.002 n=6)

                │ baseline-load.txt │            new-load.txt            │
                │     allocs/op     │  allocs/op   vs base               │
LoadLargeWAL-11         35.22M ± 0%   30.76M ± 0%  -12.66% (p=0.002 n=6)
