Tight-Line/sgotel

SGOtel

Welcome to the Skotel California.

SGOtel ("skotel") is a small HTTP service that receives SendGrid Event Webhook POSTs, verifies their ECDSA signatures, and republishes each event as OpenTelemetry logs (one record per event, full fidelity) and metrics (low-cardinality counters and histograms for dashboards).

Why logs and metrics, not spans

SendGrid webhook events are discrete records that arrive asynchronously and sometimes hours apart. That maps cleanly onto OTel logs: one record per event, all fields preserved as attributes. It does not map cleanly onto traces, because there is no well-defined "end" to an email's lifecycle and events routinely arrive out of order.

Logs preserve every field for forensic queries ("why did this specific email bounce?"). Metrics are derived in parallel for dashboards and alerts, but with cardinality kept bounded (no email or sg_message_id in metric labels).

Architecture

SendGrid → POST /webhook → [verify ECDSA sig + timestamp window]
                                  ↓
                          [parse JSON array]
                                  ↓
                        [bounded channel] ── 200 OK back to SendGrid
                                  ↓
                       [publisher workers]
                                  ↓
                ┌─────────────────┴─────────────────┐
                ↓                                   ↓
        OTel Logs (per event)             OTel Metrics (counters)
                └─────────────────┬─────────────────┘
                                  ↓
                       OTLP exporter (http/grpc)

The handler does verification and parsing synchronously (failures must surface to SendGrid as non-2xx) and then enqueues events to a bounded channel before returning 200. The publisher's worker goroutines drain the channel.

Event → OTel mapping

Logs

| Field | Source |
| --- | --- |
| Timestamp | SendGrid timestamp (Unix seconds) |
| ObservedTimestamp | Receive time at SGOtel |
| Severity | bounce/dropped/spam_report → ERROR; deferred → WARN; everything else → INFO |
| EventName | sendgrid.<event> |
| Body | "<event> <email>" (email subject to redaction) |
| sendgrid.event | event type |
| sendgrid.event_id | sg_event_id |
| sendgrid.message_id | sg_message_id |
| sendgrid.smtp_id | smtp-id |
| sendgrid.email | recipient (see SGOTEL_REDACT_EMAIL) |
| sendgrid.category | category array |
| sendgrid.bounce.{reason,status,type} | bounce-only |
| sendgrid.url | click-only |
| sendgrid.useragent, sendgrid.ip | open/click |
| sendgrid.response, sendgrid.attempt | delivery/deferred |
| sendgrid.custom.<key> | any custom args attached at send time |
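The Severity and EventName rows can be expressed as two small functions. This is a sketch under assumptions: `severityFor` and `eventName` are hypothetical names, and severities are plain strings rather than OTel SDK constants. The event strings follow the table; the sketch also accepts `spamreport`, the spelling SendGrid's Event Webhook payloads appear to use.

```go
package main

import "fmt"

// severityFor mirrors the Severity row of the table:
// bounce, dropped, spam report -> ERROR; deferred -> WARN; anything else -> INFO.
func severityFor(event string) string {
	switch event {
	case "bounce", "dropped", "spam_report", "spamreport":
		return "ERROR"
	case "deferred":
		return "WARN"
	default:
		return "INFO"
	}
}

// eventName builds the EventName value in the form sendgrid.<event>.
func eventName(event string) string {
	return "sendgrid." + event
}

func main() {
	for _, e := range []string{"delivered", "deferred", "bounce"} {
		fmt.Println(eventName(e), severityFor(e))
	}
}
```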

Metrics

| Metric | Type | Attributes |
| --- | --- | --- |
| sendgrid.events.total | counter | event, category (first category only) |
| sendgrid.bounces.total | counter | type (hard/soft/blocked), status_class (2xx/4xx/5xx) |
| sendgrid.webhook.batch.size | histogram | (none) |
| sendgrid.webhook.requests.total | counter | result (ok / bad_signature / bad_payload / queue_full / …) |

Cross-event latency (e.g., processed → delivered) is intentionally out of scope; it requires state and is fragile under out-of-order delivery. Derive it downstream with an OTel collector connector if you need it.

Configuration

All knobs are environment variables. Standard OTel env vars (OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, etc.) are honored by the underlying exporters.

| Variable | Default | Notes |
| --- | --- | --- |
| SGOTEL_LISTEN_ADDR | :8080 | Listen address. |
| SGOTEL_WEBHOOK_PATH | /webhook | Path for SendGrid POSTs. |
| SGOTEL_SENDGRID_PUBLIC_KEY | (required) | Base64 PKIX DER. Copy from SendGrid → Mail Settings → Signed Webhook. |
| SGOTEL_SIGNATURE_MAX_AGE | 5m | Reject signatures whose timestamp falls outside ± this window. Use 0 to disable. |
| SGOTEL_REDACT_EMAIL | none | One of none, hash (SHA-256 hex of lowercased address), drop. |
| SGOTEL_QUEUE_SIZE | 1024 | Buffered channel between handler and publishers. |
| SGOTEL_QUEUE_FULL_BEHAVIOR | block | block waits for room (and may delay the SendGrid 200); shed responds 503. |
| OTEL_SERVICE_NAME | sgotel | Standard OTel service name (identifies the relay process, not the upstream). All signals additionally carry the resource attribute messaging.system=sendgrid so backends can facet on it. |
| OTEL_EXPORTER_OTLP_PROTOCOL | http/protobuf | Or grpc. Per-signal overrides (..._LOGS_PROTOCOL, ..._METRICS_PROTOCOL) are honored. |
| OTEL_EXPORTER_OTLP_ENDPOINT | (SDK default) | Collector endpoint. |

Why no sg_event_id dedup

In the happy path SendGrid only re-POSTs an event when it receives a non-2xx response. SGOtel returns 200 as soon as the event is enqueued, so SendGrid does not retry. The remaining theoretical duplication source (a captured payload replayed by a third party) is closed by the timestamp-window check on the signature, which is stateless and cheaper than any in-process dedup table.

Duplication caused by SGOtel's own OTLP exporter retries is an OTLP-layer concern handled at the collector or backend, not by sg_event_id.

Running locally

export SGOTEL_SENDGRID_PUBLIC_KEY="<base64 PKIX from SendGrid UI>"
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
go run ./cmd/sgotel

Health check is at GET /healthz.

Testing

go vet ./...
go test -race ./...

Tests are unit + handler-integration only; no live SendGrid, no live OTel collector required. The handler test uses an in-memory sink so it exercises verification, parsing, queueing, and the request/result metrics path end-to-end.

About

SendGrid-to-OpenTelemetry relay. Turn verified SendGrid web events into actionable logs and metrics
