Welcome to the Skotel California.
SGOtel ("skotel") is a small HTTP service that receives SendGrid Event Webhook POSTs, verifies their ECDSA signatures, and republishes each event as OpenTelemetry logs (one record per event, full fidelity) and metrics (low-cardinality counters and histograms for dashboards).
SendGrid webhook events are discrete records that arrive asynchronously and sometimes hours apart. That maps cleanly onto OTel logs: one record per event, all fields preserved as attributes. It does not map cleanly onto traces, because there is no well-defined "end" to an email's lifecycle and events routinely arrive out of order.
Logs preserve every field for forensic queries ("why did this specific email
bounce?"). Metrics are derived in parallel for dashboards and alerts, but with
cardinality kept bounded (no email or sg_message_id in metric labels).

    SendGrid → POST /webhook → [verify ECDSA sig + timestamp window]
                                                 ↓
                                        [parse JSON array]
                                                 ↓
                                         [bounded channel] ── 200 OK back to SendGrid
                                                 ↓
                                        [publisher workers]
                                                 ↓
                               ┌─────────────────┴─────────────────┐
                               ↓                                   ↓
                     OTel Logs (per event)              OTel Metrics (counters)
                               └─────────────────┬─────────────────┘
                                                 ↓
                                     OTLP exporter (http/grpc)

The handler does verification and parsing synchronously (failures must surface to SendGrid as non-2xx) and then enqueues events to a bounded channel before returning 200. The publisher's worker goroutines drain the channel.
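
The enqueue step can be sketched as follows. This is a minimal illustration, not SGOtel's actual code: the `Event` type, the `enqueue` function, and both error values are hypothetical names, and the `done` channel stands in for something like the request context's `Done()` channel.

```go
package main

import (
	"errors"
	"fmt"
)

// Event stands in for one parsed SendGrid event (hypothetical type).
type Event map[string]any

var (
	// ErrQueueFull is returned in shed mode when the channel has no room.
	ErrQueueFull = errors.New("queue full")
	// ErrCancelled is returned in block mode when done fires first.
	ErrCancelled = errors.New("cancelled while waiting for queue room")
)

// enqueue pushes events onto the bounded channel.
// With shed=false it blocks until there is room or done is closed;
// with shed=true it fails fast so the handler can answer 503.
func enqueue(done <-chan struct{}, q chan<- Event, events []Event, shed bool) error {
	for _, ev := range events {
		if shed {
			select {
			case q <- ev:
			default:
				return ErrQueueFull
			}
			continue
		}
		select {
		case q <- ev:
		case <-done:
			return ErrCancelled
		}
	}
	return nil
}

func main() {
	q := make(chan Event, 2) // tiny queue to force the full case
	batch := []Event{{"event": "delivered"}, {"event": "open"}, {"event": "click"}}
	err := enqueue(nil, q, batch, true)
	fmt.Println(err, len(q)) // queue full 2
}
```

Note that this sketch can enqueue part of a batch before shedding. Because a 503 makes SendGrid re-POST the whole batch, a real implementation might prefer to check capacity for the full batch up front.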

| Field | Source |
|---|---|
| Timestamp | SendGrid timestamp (Unix seconds) |
| ObservedTimestamp | Receive time at SGOtel |
| Severity | bounce/dropped/spam_report → ERROR; deferred → WARN; everything else → INFO |
| EventName | sendgrid.<event> |
| Body | "<event> <email>" (email subject to redaction) |
| sendgrid.event | event type |
| sendgrid.event_id | sg_event_id |
| sendgrid.message_id | sg_message_id |
| sendgrid.smtp_id | smtp-id |
| sendgrid.email | recipient (see SGOTEL_REDACT_EMAIL) |
| sendgrid.category | category array |
| sendgrid.bounce.{reason,status,type} | bounce-only |
| sendgrid.url | click-only |
| sendgrid.useragent, sendgrid.ip | open/click |
| sendgrid.response, sendgrid.attempt | delivery/deferred |
| sendgrid.custom.<key> | any custom args attached at send time |

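
The Severity and EventName rows above amount to a small lookup. A minimal sketch, assuming hypothetical function names `severityFor` and `eventName`; it also matches `spamreport`, since SendGrid payloads appear to use that spelling while the table writes `spam_report`:

```go
package main

import "fmt"

// severityFor maps a SendGrid event type to a severity name,
// following the Severity row of the log record table.
func severityFor(event string) string {
	switch event {
	case "bounce", "dropped", "spam_report", "spamreport":
		return "ERROR"
	case "deferred":
		return "WARN"
	default:
		return "INFO"
	}
}

// eventName builds the EventName value, e.g. "sendgrid.bounce".
func eventName(event string) string {
	return "sendgrid." + event
}

func main() {
	for _, ev := range []string{"bounce", "deferred", "delivered"} {
		fmt.Println(eventName(ev), severityFor(ev))
	}
}
```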

| Metric | Type | Attributes |
|---|---|---|
| sendgrid.events.total | counter | event, category (first category only) |
| sendgrid.bounces.total | counter | type (hard/soft/blocked), status_class (2xx/4xx/5xx) |
| sendgrid.webhook.batch.size | histogram | (none) |
| sendgrid.webhook.requests.total | counter | result (ok / bad_signature / bad_payload / queue_full / …) |

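
One way to keep these attributes low-cardinality is to bucket values before they reach a metric label. The sketch below is an assumption about how that could work, not a description of SGOtel's code: `statusClass` and `firstCategory` are hypothetical helpers, the `unknown` fallback is invented for illustration, and the code assumes the bounce status looks like `5.1.1` or `550` and that `category` may decode from JSON as either a string or an array.

```go
package main

import "fmt"

// statusClass buckets a bounce status such as "5.1.1" or "550"
// by its leading digit. "unknown" is a hypothetical fallback.
func statusClass(status string) string {
	if status == "" {
		return "unknown"
	}
	switch status[0] {
	case '2':
		return "2xx"
	case '4':
		return "4xx"
	case '5':
		return "5xx"
	default:
		return "unknown"
	}
}

// firstCategory returns the first category from a decoded JSON value
// that is either a single string or an array of strings.
func firstCategory(raw any) string {
	switch v := raw.(type) {
	case string:
		return v
	case []any:
		if len(v) > 0 {
			if s, ok := v[0].(string); ok {
				return s
			}
		}
	}
	return ""
}

func main() {
	fmt.Println(statusClass("5.1.1"), statusClass("421"), statusClass(""))
	// 5xx 4xx unknown
	fmt.Println(firstCategory([]any{"billing", "invoices"}), firstCategory("welcome"))
	// billing welcome
}
```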
Cross-event latency (e.g., processed → delivered) is intentionally out of
scope; it requires state and is fragile under out-of-order delivery. Derive it
downstream with an OTel collector connector if you need it.
All knobs are environment variables. Standard OTel env vars
(OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, etc.) are honored
by the underlying exporters.

| Variable | Default | Notes |
|---|---|---|
| SGOTEL_LISTEN_ADDR | :8080 | Listen address. |
| SGOTEL_WEBHOOK_PATH | /webhook | Path for SendGrid POSTs. |
| SGOTEL_SENDGRID_PUBLIC_KEY | (required) | Base64 PKIX DER. Copy from SendGrid → Mail Settings → Signed Webhook. |
| SGOTEL_SIGNATURE_MAX_AGE | 5m | Reject signatures whose timestamp falls outside ± this window. Use 0 to disable. |
| SGOTEL_REDACT_EMAIL | none | One of none, hash (SHA-256 hex of lowercased address), drop. |
| SGOTEL_QUEUE_SIZE | 1024 | Buffered channel between handler and publishers. |
| SGOTEL_QUEUE_FULL_BEHAVIOR | block | block waits for room (and may delay the SendGrid 200); shed responds 503. |
| OTEL_SERVICE_NAME | sgotel | Standard OTel service name (identifies the relay process, not the upstream). All signals additionally carry the resource attribute messaging.system=sendgrid so backends can facet on it. |
| OTEL_EXPORTER_OTLP_PROTOCOL | http/protobuf | Or grpc. Per-signal overrides (..._LOGS_PROTOCOL, ..._METRICS_PROTOCOL) are honored. |
| OTEL_EXPORTER_OTLP_ENDPOINT | (SDK default) | Collector endpoint. |

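
The three SGOTEL_REDACT_EMAIL modes can be illustrated with a short sketch. The function name `redactEmail` and its return shape are hypothetical, and treating an unrecognized mode like `none` is an assumption made here for brevity:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// redactEmail applies a SGOTEL_REDACT_EMAIL-style mode to one address.
// The boolean reports whether the attribute should be emitted at all.
func redactEmail(mode, email string) (string, bool) {
	switch mode {
	case "hash":
		// SHA-256 hex of the lowercased address, as the table describes.
		sum := sha256.Sum256([]byte(strings.ToLower(email)))
		return hex.EncodeToString(sum[:]), true
	case "drop":
		return "", false
	default: // "none"; unknown modes fall through here (assumption)
		return email, true
	}
}

func main() {
	a, _ := redactEmail("hash", "Alice@Example.com")
	b, _ := redactEmail("hash", "alice@example.com")
	fmt.Println(a == b, len(a)) // true 64
}
```

Lowercasing before hashing means the same recipient hashes to the same value regardless of how the address was capitalized at send time, so hashed logs can still be grouped per recipient.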
In the happy path SendGrid only re-POSTs an event when it receives a non-2xx response. SGOtel returns 200 as soon as the event is enqueued, so SendGrid does not retry. The remaining theoretical duplication source (a captured payload replayed by a third party) is closed by the timestamp-window check on the signature, which is stateless and cheaper than any in-process dedup table.
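
The signature and timestamp-window check can be sketched with the standard library alone. This is an illustration under stated assumptions, not SGOtel's implementation: it assumes SendGrid signs SHA-256 over the timestamp string followed by the raw body, and sends a base64 ASN.1 signature plus the timestamp in the `X-Twilio-Email-Event-Webhook-Signature` and `X-Twilio-Email-Event-Webhook-Timestamp` headers. The names `parsePublicKey`, `verify`, and `roundTrip` are hypothetical, and `roundTrip` exists only to exercise the check with a throwaway key.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"crypto/x509"
	"encoding/base64"
	"errors"
	"fmt"
	"strconv"
	"time"
)

// parsePublicKey decodes a base64 PKIX DER ECDSA public key.
func parsePublicKey(b64 string) (*ecdsa.PublicKey, error) {
	der, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, err
	}
	pub, err := x509.ParsePKIXPublicKey(der)
	if err != nil {
		return nil, err
	}
	key, ok := pub.(*ecdsa.PublicKey)
	if !ok {
		return nil, errors.New("not an ECDSA public key")
	}
	return key, nil
}

// verify checks the timestamp window first (maxAge == 0 disables it),
// then the ECDSA signature over SHA-256(timestamp || body).
func verify(key *ecdsa.PublicKey, sigB64, ts string, body []byte, now time.Time, maxAge time.Duration) error {
	secs, err := strconv.ParseInt(ts, 10, 64)
	if err != nil {
		return errors.New("bad timestamp")
	}
	if maxAge > 0 {
		age := now.Sub(time.Unix(secs, 0))
		if age < -maxAge || age > maxAge {
			return errors.New("timestamp outside window")
		}
	}
	sig, err := base64.StdEncoding.DecodeString(sigB64)
	if err != nil {
		return errors.New("bad signature encoding")
	}
	digest := sha256.Sum256(append([]byte(ts), body...))
	if !ecdsa.VerifyASN1(key, digest[:], sig) {
		return errors.New("bad signature")
	}
	return nil
}

// roundTrip signs a payload with a throwaway key, using a timestamp
// ageSeconds in the past, and verifies it with a 5-minute window.
func roundTrip(ageSeconds int64, tamper bool) error {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	der, err := x509.MarshalPKIXPublicKey(&priv.PublicKey)
	if err != nil {
		return err
	}
	key, err := parsePublicKey(base64.StdEncoding.EncodeToString(der))
	if err != nil {
		return err
	}
	now := time.Now()
	ts := strconv.FormatInt(now.Unix()-ageSeconds, 10)
	body := []byte(`[{"event":"delivered"}]`)
	digest := sha256.Sum256(append([]byte(ts), body...))
	sig, err := ecdsa.SignASN1(rand.Reader, priv, digest[:])
	if err != nil {
		return err
	}
	if tamper {
		body = []byte(`[{"event":"bounce"}]`) // body no longer matches the signature
	}
	return verify(key, base64.StdEncoding.EncodeToString(sig), ts, body, now, 5*time.Minute)
}

func main() {
	fmt.Println(roundTrip(0, false))    // <nil>
	fmt.Println(roundTrip(3600, false)) // timestamp outside window
	fmt.Println(roundTrip(0, true))     // bad signature
}
```

Because the timestamp is part of the signed data, an attacker cannot refresh the timestamp on a captured payload without invalidating the signature, which is what lets the window check stand in for a dedup table.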
Duplication caused by SGOtel's own OTLP exporter retries is an OTLP-layer
concern handled at the collector or backend, not by sg_event_id.

    export SGOTEL_SENDGRID_PUBLIC_KEY="<base64 PKIX from SendGrid UI>"
    export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
    go run ./cmd/sgotel

Health check is at GET /healthz.

    go vet ./...
    go test -race ./...

Tests are unit + handler-integration only; no live SendGrid, no live OTel collector required. The handler test uses an in-memory sink so it exercises verification, parsing, queueing, and the request/result metrics path end-to-end.