Skip to content

Introduce event size metrics to event ingester#4817

Merged
JamesMurkin merged 10 commits intomasterfrom
nikola-jokic/redis-bytes-metrics
Apr 16, 2026
Merged

Introduce event size metrics to event ingester#4817
JamesMurkin merged 10 commits intomasterfrom
nikola-jokic/redis-bytes-metrics

Conversation

@nikola-jokic
Copy link
Copy Markdown
Contributor

@nikola-jokic nikola-jokic commented Apr 6, 2026

Add metrics to event ingester to show number of bytes each event type is taking up

This allows us to get an approximate view of what is using all the space in redis

@nikola-jokic nikola-jokic changed the title Nikola jokic/redis bytes metrics Introduce redis metrics analyzing event sizes Apr 6, 2026
Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
@nikola-jokic nikola-jokic force-pushed the nikola-jokic/redis-bytes-metrics branch from 3a91f34 to b1131f0 Compare April 6, 2026 14:17
@JamesMurkin JamesMurkin marked this pull request as ready for review April 16, 2026 13:10
@JamesMurkin JamesMurkin changed the title Introduce redis metrics analyzing event sizes Introduce event size metrics to event ingester Apr 16, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 16, 2026

Greptile Summary

This PR adds two new Prometheus CounterVec metrics (events_uncompressed_bytes_total and events_estimated_compressed_bytes_total) to the event ingester, gated behind a EventSizeMetricsEnabled config flag (defaulting to true). The estimated compressed size is derived by applying a batch-level compression ratio to each event's individual proto.Size, which is an acknowledged approximation.

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 suggestions.

No new P0/P1 issues found. The ratio-approximation and test-getter concerns were flagged in prior review rounds. The one new finding (typos in GetEventName() metric label values) is P2 and pre-exists this PR. The implementation is well-tested and the feature flag allows safe rollout.

pkg/armadaevents/events_util.go has pre-existing typos in GetEventName() that will appear as Prometheus label values once this PR ships.

Important Files Changed

Filename Overview
internal/eventingester/convert/conversions.go Gated behind eventSizeMetricsEnabled, records per-event uncompressed proto size and an estimated compressed size via batch-level compression ratio. Uses GetEventName() whose return values contain pre-existing typos now hardcoded into metric labels.
internal/common/ingest/metrics/metrics.go Adds two new CounterVec metrics (uncompressed and estimated-compressed event bytes) with queue/event_type labels, plus recording methods. Exposes raw *prometheus.CounterVec getters for test access (previously flagged).
internal/eventingester/convert/conversions_test.go Three new test cases cover happy-path recording, compression-failure suppression, and the disabled-flag path. Uses per-test prometheus.NewRegistry() instances to avoid conflicts.
internal/eventingester/ingester.go Passes config.Metrics.EventSizeMetricsEnabled through to the converter constructor — minimal, correct change.
internal/eventingester/configuration/types.go Adds EventSizeMetricsEnabled bool to MetricsConfig — straightforward addition.
config/eventingester/config.yaml Enables eventSizeMetricsEnabled: true by default in the shipped config.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Pulsar Message] --> B[EventConverter.Convert]
    B --> C[CompactEventSequences]
    C --> D[LimitSequencesByteSize]
    D --> E[clearCancellationReason]
    E --> F[proto.Marshal → bytes]
    F --> G{Marshal OK?}
    G -- No --> H[RecordPulsarMessageError\ncontinue]
    G -- Yes --> I[Compressor.Compress → compressedBytes]
    I --> J{Compress OK?}
    J -- No --> H
    J -- Yes --> K{eventSizeMetricsEnabled?}
    K -- No --> N[Append model.Event]
    K -- Yes --> L["ratio = len(compressedBytes) / len(bytes)"]
    L --> M["For each EventSequence_Event e:\nuncompressed = proto.Size(e)\nestimatedCompressed = Round(uncompressed × ratio)\nRecordEventUncompressedBytes / RecordEventEstimatedCompressedBytes"]
    M --> N
    N --> O[BatchUpdate]
Loading

Reviews (3): Last reviewed commit: "Merge branch 'master' into nikola-jokic/..." | Re-trigger Greptile

Comment thread internal/eventingester/convert/conversions_test.go Outdated
Comment on lines +145 to +151
func (m *Metrics) GetUncompressedEventBytesTotal() *prometheus.CounterVec {
return m.uncompressedEventBytesTotal
}

func (m *Metrics) GetEstimatedCompressedEventBytesTotal() *prometheus.CounterVec {
return m.estimatedCompressedEventBytesTotal
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Test-only getters on production type

GetUncompressedEventBytesTotal and GetEstimatedCompressedEventBytesTotal exist only so tests can call WithLabelValues and read values via testutil.ToFloat64. Exposing raw *prometheus.CounterVec fields from a production struct leaks internal implementation details. A cleaner alternative is to gather counters directly from the testRegistry passed to NewMetricsWithRegistry, using testutil.CollectAndCompare or testRegistry.Gather(), without needing any getters on Metrics.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +74 to +85
// Record per-event byte counters
ratio := 1.0
if len(bytes) > 0 {
ratio = float64(len(compressedBytes)) / float64(len(bytes))
}
for _, e := range es.Events {
eventType := e.GetEventName()
uncompressed := proto.Size(e)
estimatedCompressed := int(math.Round(float64(uncompressed) * ratio))
ec.metrics.RecordEventUncompressedBytes(queue, eventType, uncompressed)
ec.metrics.RecordEventEstimatedCompressedBytes(queue, eventType, estimatedCompressed)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Compression ratio applied to per-event proto sizes that don't sum to the batch wire size

proto.Size(e) returns the serialized size of each EventSequence_Event message in isolation. The marshaled EventSequence (bytes) also carries outer field tags, a repeated-field length prefix per event, and any outer fields on EventSequence itself — so ∑ proto.Size(e) is strictly less than len(bytes). The ratio is therefore computed from a different denominator than what the individual event sizes are scaled by, causing the per-event estimated compressed values to sum to noticeably less than len(compressedBytes). For a monitoring estimate this may be acceptable, but consider using the actual per-event fraction of the batch size (float64(proto.Size(e)) / total_events_size * len(compressedBytes)) for a more internally-consistent estimate.

@JamesMurkin JamesMurkin enabled auto-merge (squash) April 16, 2026 13:57
@JamesMurkin JamesMurkin merged commit 8abf222 into master Apr 16, 2026
27 of 29 checks passed
@JamesMurkin JamesMurkin deleted the nikola-jokic/redis-bytes-metrics branch April 16, 2026 16:30
dslear pushed a commit that referenced this pull request Apr 16, 2026
Add metrics to event ingester to show number of bytes each event type is
taking up

This allows us to get an approximate view of what is using all the space
in redis

---------

Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Co-authored-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: David Slear <david_slear@yahoo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants