Conversation
1919c37 to
af86a09
Compare
|
This is ready for review - the "WIP" in the title was a mistake |
|
Might we be able to have |
|
This made me think quite a bit about notification receipts, duplicate notification receipts and etcetera. I think you've made the right choice, but people using the metrics are going to need to understand that the statuses aren't "exclusive" (i.e. one notification will tally for both Would it be paranoid to have another metric here in parallel using |
Good shout, done. I've left it off
Also a good point, I've updated the descriptions on both
Hmmmmm... Maybe, not sure what we'd call it though. I suppose the name doesn't matter that much as long as the description makes sense. |
I think you're right, looking for "duplicate callback warning" log messages I only saw them from SES and firetext, and the firetext outcomes panel in the "Notify Summary" grafana dashboard has a little tooltip alluding to this. |
risicle
left a comment
There was a problem hiding this comment.
Cool. Up to you on the parallel metric.
Changes the instrumentation of `notification.deliver.duration` so that it does not fall back to the current time if `delivery_dt` is unset, and adds a new metric `notification.callback.duration` based on the timestamp at the moment we received the delivery receipt. This makes the delivery receipt metric more honest while giving us a backup metric if we decide we don't trust the delivery timestamps as reported by a provider.
https://trello.com/c/25A3gpXh/1358-replace-statsd-instrumentation-across-notify-with-otel
Probably best reviewed with whitespace changes hidden.
Adds the following metrics, which should hopefully replace the existing StatsD ones (the dashboards will be updated in a future PR).
The names are automatically normalised to follow the Prometheus naming convention by the OTel collector, so e.g.
provider.request.durationbecomesprovider_request_duration_seconds. I have deliberately chosen names that do not conflict with the existing metrics that go via the StatsD exporter, so we can have both stacks running in parallel.provider.request.duration {error.type, notification.type, provider.name}Duration of (HTTP) requests to providers
notification.sms.internationals {notification.status, notification.sms.country_code}Number of international text messages sent
No screenshot for this one, but the behaviour should be identical to the existing
notify_international_sms_totalmetric reported by StatsD.notification.send.duration {error.type, key.type, notification.type, provider.name}Elapsed time between notification creation and sending to provider
notification.deliver.duration {key.type, notification.status, notification.type, provider.name}Elapsed time between notification creation datetime and delivery datetime reported by provider