From 4d2aadfd9148f56c4583b50d8b0348b7000f57d8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Hauke=20L=C3=B6ffler?=
Date: Tue, 13 Jan 2026 11:07:16 +0100
Subject: [PATCH 1/3] docs: add dashboards documentation page with BBE Probes explanation

---
 .../monitoring/dashboards.md | 26 +++++++++++++++++++
 .../monitoring/sidebar.json  |  4 +++
 2 files changed, 30 insertions(+)
 create mode 100644 src/multiplayer-servers/monitoring/dashboards.md

diff --git a/src/multiplayer-servers/monitoring/dashboards.md b/src/multiplayer-servers/monitoring/dashboards.md
new file mode 100644
index 00000000..9bf2e13f
--- /dev/null
+++ b/src/multiplayer-servers/monitoring/dashboards.md
@@ -0,0 +1,26 @@
+# Dashboards
+
+GameFabric provides predefined Grafana dashboards for monitoring your infrastructure.
+You can find these under "Dashboards" in your Grafana instance.
+
+## BBE Probes from Nodes
+
+This dashboard shows BlackBox Exporter (BBE) probe results from each of your assigned nodes to predefined targets, including major cloud providers (AWS, Azure, GCP) and DNS servers (such as 1.1.1.1 and 8.8.8.8).
+
+### Purpose
+
+Use this dashboard to quickly identify whether game server issues are caused by network connectivity problems to a particular cloud provider rather than bugs in your application code.
+
+### Interpreting the Dashboard
+
+- **Red sections** indicate the timespan during which a probe failed.
+- **Short probe failures** are usually nothing to worry about.
+- **Prolonged failures** to a single target (for example, a cloud provider your game doesn't use, or a backup DNS server) may have no impact on your game servers.
+- If probe failures to **multiple targets persist**, GameFabric automatically sets the status to degraded on [status.gamefabric.com](https://status.gamefabric.com).
+
+### Best Practices
+
+Nodes can occasionally experience network issues—100% reliability is not guaranteed. Game developers should implement their servers to be tolerant of network issues by:
+
+- Retrying failed connections
+- Gracefully terminating the game server after multiple connection attempts fail
diff --git a/src/multiplayer-servers/monitoring/sidebar.json b/src/multiplayer-servers/monitoring/sidebar.json
index abe83c9f..fcc7f187 100644
--- a/src/multiplayer-servers/monitoring/sidebar.json
+++ b/src/multiplayer-servers/monitoring/sidebar.json
@@ -7,6 +7,10 @@
       "text": "Introduction",
       "link": "/monitoring/introduction"
     },
+    {
+      "text": "Dashboards",
+      "link": "/monitoring/dashboards"
+    },
     {
       "text": "Audit Logs",
       "link": "/monitoring/auditlogs"

From e042f4e1167262a9b6c2e991c93abbaaf8bf1838 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Hauke=20L=C3=B6ffler?=
Date: Tue, 13 Jan 2026 11:40:08 +0100
Subject: [PATCH 2/3] docs: address review feedback on dashboards page

- Add warning callout explaining probe results are not causally consistent with network issues
- Clarify that probes only test specific routes and cloud services, not entire platforms
- Restructure Best Practices section for clarity
- Improve list introduction wording per style guidelines
---
 src/multiplayer-servers/monitoring/dashboards.md | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/src/multiplayer-servers/monitoring/dashboards.md b/src/multiplayer-servers/monitoring/dashboards.md
index 9bf2e13f..8c73d321 100644
--- a/src/multiplayer-servers/monitoring/dashboards.md
+++ b/src/multiplayer-servers/monitoring/dashboards.md
@@ -18,9 +18,18 @@ Use this dashboard to quickly identify whether game server issues are caused by
 - **Prolonged failures** to a single target (for example, a cloud provider your game doesn't use, or a backup DNS server) may have no impact on your game servers.
 - If probe failures to **multiple targets persist**, GameFabric automatically sets the status to degraded on [status.gamefabric.com](https://status.gamefabric.com).
 
+:::warning Probe results are not causally consistent with network issues
+Failing probes do not necessarily indicate network issues, and network issues may occur even when all probes succeed. Probes only test specific routes from nodes to predefined targets.
+
+The dashboard provides a limited view:
+
+- Only one public, global endpoint is probed per cloud provider. Regional routes may behave differently.
+- Probes target specific cloud services (for example, AWS S3), not the entire cloud platform. Other services on the same provider may be unaffected.
+:::
+
 ### Best Practices
 
-Nodes can occasionally experience network issues—100% reliability is not guaranteed. Game developers should implement their servers to be tolerant of network issues by:
+Full network reliability is not guaranteed. Nodes can occasionally experience network issues. To handle these issues, implement the following in your game servers:
 
-- Retrying failed connections
-- Gracefully terminating the game server after multiple connection attempts fail
+- Retry failed connections.
+- Gracefully terminate the game server after multiple connection attempts fail.

From 5eaff102332a893d0fed778ed454d5dc9e8a3c7a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Hauke=20L=C3=B6ffler?=
Date: Wed, 14 Jan 2026 10:00:12 +0100
Subject: [PATCH 3/3] docs: remove Best Practices section from dashboards page

Move network reliability guidance to production-workloads/requirements.md
where it fits better contextually (PR #103).
Addresses review comment from Ullaakut.
---
 src/multiplayer-servers/monitoring/dashboards.md | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/src/multiplayer-servers/monitoring/dashboards.md b/src/multiplayer-servers/monitoring/dashboards.md
index 8c73d321..fdd5c15a 100644
--- a/src/multiplayer-servers/monitoring/dashboards.md
+++ b/src/multiplayer-servers/monitoring/dashboards.md
@@ -26,10 +26,3 @@ The dashboard provides a limited view:
 - Only one public, global endpoint is probed per cloud provider. Regional routes may behave differently.
 - Probes target specific cloud services (for example, AWS S3), not the entire cloud platform. Other services on the same provider may be unaffected.
 :::
-
-### Best Practices
-
-Full network reliability is not guaranteed. Nodes can occasionally experience network issues. To handle these issues, implement the following in your game servers:
-
-- Retry failed connections.
-- Gracefully terminate the game server after multiple connection attempts fail.