This document details the metrics, dashboards, and alerting setup for the Django WebSocket service.
The service exposes operational metrics at the /metrics endpoint:
- active_connections (Gauge) – current number of live WebSocket sessions.
- total_messages (Counter) – cumulative messages received.
- error_count (Counter) – exceptions encountered in the WebSocket consumer.
These metrics are instrumented using the official Prometheus Python client and
automatically scraped by Prometheus (configured in docker/compose.yml).
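As a quick sanity check, the /metrics output can be read with a few lines of Python. The sketch below parses a hypothetical sample payload in the Prometheus text exposition format. The values are invented, and the names exposed by the real service may differ slightly from those listed above (for example, the client library may add a suffix to counter names). It handles unlabeled samples only.

```python
# Hypothetical sample of what /metrics might return; all values are invented.
SAMPLE = """\
# HELP active_connections Current number of live WebSocket sessions.
# TYPE active_connections gauge
active_connections 12.0
# HELP total_messages Cumulative messages received.
# TYPE total_messages counter
total_messages 3456.0
# HELP error_count Exceptions encountered in the WebSocket consumer.
# TYPE error_count counter
error_count 7.0
"""


def parse_metrics(text):
    """Parse unlabeled samples from Prometheus text exposition format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
print(metrics["active_connections"])
```

In practice the text would come from an HTTP GET against the service's /metrics endpoint rather than a hard-coded string.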
Dashboards are preconfigured and loaded under /etc/grafana/dashboards via
Docker Compose. They include:
- WebSocket Overview: Trends of connections, messages, and errors over time.
- Latency Heatmap: Distribution of processing latency per message.
- Resource Utilization: Host CPU, memory, and network usage.
Each dashboard consists of multiple panels that query Prometheus for:
# Number of open connections
active_connections
# Message rate per second
rate(total_messages[1m])
# Error rate per second
rate(error_count[1m])
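To make the rate() queries above more concrete, here is a simplified Python sketch of what a per-second rate over a window means for a counter. It is an approximation only: the real PromQL rate() also extrapolates to the window boundaries, which this sketch omits. The function name and sample data are hypothetical.

```python
def simple_rate(samples):
    """Approximate the per-second increase of a counter over a window.

    `samples` is a list of (unix_timestamp_seconds, counter_value) pairs
    sorted by time. Returns None when there is not enough data.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A lower value than before means the counter was reset (e.g. a
        # process restart), so the new value counts in full.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical total_messages samples scraped every 15 seconds over 1 minute.
window = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]
print(simple_rate(window))
```

With these samples the counter grows by 120 over 60 seconds, so the sketch reports 2 messages per second.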
Alerts are defined in Prometheus using simple rules:
- An alert fires when active_connections stays at 0 for 60s.
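The "for 60s" condition means a single zero reading is not enough: the value has to stay at 0 for the whole period. The Python sketch below illustrates that logic on a list of samples. It is a simplified model, not the Prometheus rule itself; the function name, the parameter, and the sample data are hypothetical.

```python
def alert_firing(samples, hold_seconds=60):
    """Return True if active_connections has been 0 for at least hold_seconds.

    `samples` is a list of (unix_timestamp_seconds, active_connections)
    pairs sorted by time; the last entry is the most recent reading.
    """
    zero_since = None
    for ts, value in samples:
        if value == 0:
            # Remember when the current run of zeros started.
            if zero_since is None:
                zero_since = ts
        else:
            # Any non-zero reading resets the pending period.
            zero_since = None
    if zero_since is None:
        return False
    return samples[-1][0] - zero_since >= hold_seconds


# Connections drop to 0 at t=15 and stay there until t=75.
print(alert_firing([(0, 5), (15, 0), (30, 0), (45, 0), (60, 0), (75, 0)]))
```

A brief recovery in the middle of the window restarts the clock, which helps avoid alerts on short blips.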
To update the dashboards:
- Modify or add JSON files under docker/grafana/dashboards/.
- Restart the Grafana container:
  docker-compose restart grafana
- Verify changes at http://localhost:3000.
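Before restarting Grafana, it can help to confirm that every dashboard file parses as JSON. The sketch below is one possible pre-flight check and is not part of the repository. The function name is hypothetical, and the check for a top-level "title" key is an assumption about what a usable dashboard file contains.

```python
import json
from pathlib import Path


def check_dashboards(directory):
    """Return a list of (filename, problem) for dashboard files that look broken."""
    problems = []
    for path in sorted(Path(directory).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append((path.name, f"invalid JSON: {exc.msg}"))
            continue
        # Assumption: a usable dashboard is a JSON object with a title.
        if not isinstance(data, dict) or "title" not in data:
            problems.append((path.name, "missing top-level 'title'"))
    return problems
```

Calling check_dashboards("docker/grafana/dashboards") from the repository root returns an empty list when every file passes.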

