tsdb/agent: Prevent unread segments from being truncated #17616

@kgeckhart

Description

Proposal

Today the tsdb/agent truncates the WAL segments based on the assumption,

// The lower two-thirds of segments should contain mostly obsolete samples.
which is not verified before truncation occurs.

Because of this, it is really hard to determine how much downtime a remote write configuration can tolerate, since that depends on both TruncateFrequency and the rate of incoming data. This leads to a much higher TruncateFrequency than is really necessary and much larger WALs that must be fully replayed on startup. Internally we run with a 15 minute interval, as we are okay with trading off downtime tolerance for less memory usage.

I propose that remote.Storage expose a way to subscribe to notifications when a segment changes. Subscribers would be called once all current queues have read past a segment (sample implementation).

Once segment notifications are working, the tsdb/agent could subscribe and truncate based on segments that have been fully read. At that point we could consider dropping the default TruncateFrequency, allowing for smaller WALs.
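
To make the idea concrete, here is a rough sketch of the bookkeeping such a subscription might need. Everything in it is hypothetical: `SegmentTracker`, `Subscribe`, and `SetSegment` are illustrative names, not existing Prometheus APIs, and it assumes each queue reports the segment it is currently reading. It also assumes every queue has reported at least once before the result is used for truncation, since a queue that has not reported yet is invisible to the tracker.

```go
package main

import "sync"

// SegmentTracker is a hypothetical helper (not part of Prometheus).
// It remembers which WAL segment each remote-write queue is currently
// reading and notifies subscribers when the lowest of those advances.
type SegmentTracker struct {
	mu          sync.Mutex
	current     map[string]int // queue name -> segment currently being read
	lowest      int            // lowest segment still in use; -1 until known
	subscribers []func(lowest int)
}

// NewSegmentTracker returns a tracker with no queues registered.
func NewSegmentTracker() *SegmentTracker {
	return &SegmentTracker{current: map[string]int{}, lowest: -1}
}

// Subscribe registers fn to be called with the new lowest in-use segment.
// Segments strictly below that value have been read by every known queue.
func (t *SegmentTracker) Subscribe(fn func(lowest int)) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.subscribers = append(t.subscribers, fn)
}

// SetSegment records that the named queue is now reading the given segment
// and notifies subscribers if the lowest in-use segment moved forward.
func (t *SegmentTracker) SetSegment(queue string, segment int) {
	t.mu.Lock()
	t.current[queue] = segment

	// Recompute the minimum across all queues that have reported.
	lowest := -1
	for _, s := range t.current {
		if lowest == -1 || s < lowest {
			lowest = s
		}
	}

	// Only notify on forward progress; copy the subscriber list so the
	// callbacks run without holding the lock.
	var subs []func(int)
	if lowest > t.lowest {
		t.lowest = lowest
		subs = append(subs, t.subscribers...)
	}
	t.mu.Unlock()

	for _, fn := range subs {
		fn(lowest)
	}
}
```

In this sketch the agent side would call `Subscribe` once and, in the callback, truncate only segments below the reported value instead of relying on the two-thirds heuristic.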

I'm not sure how much of this, if any, is applicable to the tsdb proper.
