Introduce event size metrics to event ingester#4817
Conversation
Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
3a91f34 to
b1131f0
Compare
Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Greptile SummaryThis PR adds two new Prometheus Confidence Score: 5/5Safe to merge; all remaining findings are P2 suggestions. No new P0/P1 issues found. The ratio-approximation and test-getter concerns were flagged in prior review rounds. The one new finding (typos in GetEventName() metric label values) is P2 and pre-exists this PR. The implementation is well-tested and the feature flag allows safe rollout. pkg/armadaevents/events_util.go has pre-existing typos in GetEventName() that will appear as Prometheus label values once this PR ships. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Pulsar Message] --> B[EventConverter.Convert]
B --> C[CompactEventSequences]
C --> D[LimitSequencesByteSize]
D --> E[clearCancellationReason]
E --> F[proto.Marshal → bytes]
F --> G{Marshal OK?}
G -- No --> H[RecordPulsarMessageError\ncontinue]
G -- Yes --> I[Compressor.Compress → compressedBytes]
I --> J{Compress OK?}
J -- No --> H
J -- Yes --> K{eventSizeMetricsEnabled?}
K -- No --> N[Append model.Event]
K -- Yes --> L["ratio = len(compressedBytes) / len(bytes)"]
L --> M["For each EventSequence_Event e:\nuncompressed = proto.Size(e)\nestimatedCompressed = Round(uncompressed × ratio)\nRecordEventUncompressedBytes / RecordEventEstimatedCompressedBytes"]
M --> N
N --> O[BatchUpdate]
Reviews (3): Last reviewed commit: "Merge branch 'master' into nikola-jokic/..." | Re-trigger Greptile |
| func (m *Metrics) GetUncompressedEventBytesTotal() *prometheus.CounterVec { | ||
| return m.uncompressedEventBytesTotal | ||
| } | ||
|
|
||
| func (m *Metrics) GetEstimatedCompressedEventBytesTotal() *prometheus.CounterVec { | ||
| return m.estimatedCompressedEventBytesTotal | ||
| } |
There was a problem hiding this comment.
Test-only getters on production type
GetUncompressedEventBytesTotal and GetEstimatedCompressedEventBytesTotal exist only so tests can call WithLabelValues and read values via testutil.ToFloat64. Exposing raw *prometheus.CounterVec fields from a production struct leaks internal implementation details. A cleaner alternative is to gather counters directly from the testRegistry passed to NewMetricsWithRegistry, using testutil.CollectAndCompare or testRegistry.Gather(), without needing any getters on Metrics.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| // Record per-event byte counters | ||
| ratio := 1.0 | ||
| if len(bytes) > 0 { | ||
| ratio = float64(len(compressedBytes)) / float64(len(bytes)) | ||
| } | ||
| for _, e := range es.Events { | ||
| eventType := e.GetEventName() | ||
| uncompressed := proto.Size(e) | ||
| estimatedCompressed := int(math.Round(float64(uncompressed) * ratio)) | ||
| ec.metrics.RecordEventUncompressedBytes(queue, eventType, uncompressed) | ||
| ec.metrics.RecordEventEstimatedCompressedBytes(queue, eventType, estimatedCompressed) | ||
| } |
There was a problem hiding this comment.
Compression ratio applied to per-event proto sizes that don't sum to the batch wire size
proto.Size(e) returns the serialized size of each EventSequence_Event message in isolation. The marshaled EventSequence (bytes) also carries outer field tags, a repeated-field length prefix per event, and any outer fields on EventSequence itself — so ∑ proto.Size(e) is strictly less than len(bytes). The ratio is therefore computed from a different denominator than what the individual event sizes are scaled by, causing the per-event estimated compressed values to sum to noticeably less than len(compressedBytes). For a monitoring estimate this may be acceptable, but consider using the actual per-event fraction of the batch size (float64(proto.Size(e)) / total_events_size * len(compressedBytes)) for a more internally-consistent estimate.
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
…/G-Research/armada into nikola-jokic/redis-bytes-metrics
Add metrics to event ingester to show number of bytes each event type is taking up This allows us to get an approximate view of what is using all the space in redis --------- Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com> Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com> Co-authored-by: JamesMurkin <jamesmurkin@hotmail.com> Signed-off-by: David Slear <david_slear@yahoo.com>
Add metrics to event ingester to show number of bytes each event type is taking up
This allows us to get an approximate view of what is using all the space in redis