Cleanup metrics dashboard and fix bugs#3519
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates PredBat’s metrics pipeline and dashboard so the /metrics_dashboard view is cleaner and richer (power breakdown, currency-aware cost display), while also expanding per-service API instrumentation across multiple integrations.
Changes:
- Replace the old “rate” chart with a horizontal “power breakdown” chart (battery/grid/load/PV) and adjust dashboard layout + relative fetch path.
- Extend metrics exported via
/metrics/json(currency symbol, new instantaneous power gauges,api_servicesaggregation) and emit the new snapshot values fromPredBat. - Add
record_api_call(...)instrumentation to additional API clients/components (Temperature, Ohme, Carbon, Axle, AlertFeed, GeCloud, Octopus).
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| apps/predbat/web_metrics_dashboard.py | Dashboard layout tweaks; replaces rate chart with power breakdown; switches API table to use aggregated api_services; uses relative fetch for reverse-proxy friendliness. |
| apps/predbat/temperature.py | Adds API-call metrics instrumentation for Open-Meteo temperature fetches. |
| apps/predbat/predbat_metrics.py | Adds currency symbol and new power gauges; adds api_services aggregation and includes new fields in /metrics/json. |
| apps/predbat/predbat.py | Emits updated metrics snapshot values (correct unit conversion for charge/discharge rates, new power gauges, currency symbol, PredBat savings). |
| apps/predbat/ohme.py | Adds API-call instrumentation for success/failure paths and wraps request in connection-error handling. |
| apps/predbat/octopus.py | Improves API-call instrumentation granularity (octopus_url) and records success for GraphQL calls. |
| apps/predbat/gecloud.py | Adds API-call instrumentation on success and common failure modes. |
| apps/predbat/carbon.py | Adds API-call instrumentation for Carbon API success and error cases. |
| apps/predbat/axle.py | Adds API-call instrumentation for Axle API success and error cases. |
| apps/predbat/alertfeed.py | Adds API-call instrumentation for AlertFeed download success and error cases. |
| "cost_today": _val(self.cost_today), | ||
| "savings_today_pvbat": _val(self.savings_today_pvbat), | ||
| "savings_today_actual": _val(self.savings_today_actual), | ||
| # API (labeled) | ||
| "api_requests_total": _labeled(self.api_requests_total), | ||
| "api_failures_total": _labeled(self.api_failures_total), | ||
| "api_last_success_timestamp": _labeled(self.api_last_success_timestamp), | ||
| "savings_today_predbat": _val(self.savings_today_predbat), | ||
| # API (pre-aggregated per service - avoids JS key-parsing complexity) | ||
| "api_services": _api_services(self), |
There was a problem hiding this comment.
to_dict() no longer includes the previously-exposed labeled API metric maps (api_requests_total, api_failures_total, api_last_success_timestamp). Since /metrics/json is a public endpoint, dropping these keys is a breaking change for any external consumers; consider keeping the old keys alongside the new api_services aggregate (or versioning the JSON schema / documenting the change).
| self.cost_today = _gauge("predbat_cost_today", "Cost today in currency units") | ||
| self.savings_today_pvbat = _gauge("predbat_savings_today_pvbat", "PV/Battery system savings today") | ||
| self.savings_today_actual = _gauge("predbat_savings_today_actual", "Actual savings today") | ||
| self.savings_today_pvbat = _gauge("predbat_savings_today_pvbat", "PV/Battery system savings yesterday") | ||
| self.savings_today_actual = _gauge("predbat_savings_today_actual", "Actual cost yesterday") | ||
| self.savings_today_predbat = _gauge("predbat_savings_today_predbat", "PredBat savings yesterday") |
There was a problem hiding this comment.
The cost/savings metric help text and units are ambiguous: predbat_cost_today / savings values are stored in minor units (e.g., pence/cents) in the core calculations, but the dashboard displays them as major units by dividing by 100. To avoid confusion for Prometheus users, update these metric descriptions to explicitly state the unit (minor currency units) and/or the required scaling.
| @@ -155,6 +157,7 @@ async def fetch_temperature_data(self): | |||
| await asyncio.sleep(sleep_time) | |||
| else: | |||
| self.failures_total += 1 | |||
| record_api_call("temperature", False, "server_error") | |||
| return None | |||
| except (aiohttp.ClientError, asyncio.TimeoutError) as e: | |||
| if attempt < max_retries - 1: | |||
| @@ -164,10 +167,12 @@ async def fetch_temperature_data(self): | |||
| else: | |||
| self.log("Warn: TemperatureAPI: Request failed after {} attempts: {}".format(max_retries, e)) | |||
| self.failures_total += 1 | |||
| record_api_call("temperature", False, "connection_error") | |||
| return None | |||
| except Exception as e: | |||
| self.log("Warn: TemperatureAPI: Unexpected error fetching temperature data: {}".format(e)) | |||
| self.failures_total += 1 | |||
| record_api_call("temperature", False, "connection_error") | |||
There was a problem hiding this comment.
record_api_call("temperature") is only recorded after await response.json() succeeds, but JSON/content-type decode failures will currently fall into the generic except Exception path and be counted as connection_error. Consider catching aiohttp.ContentTypeError/json.JSONDecodeError around response.json() and recording decode_error (and reserving connection_error for transport/timeouts).
No description provided.