FishVision is a standalone Docker Compose monitoring stack (Prometheus, Grafana, Loki, Tempo, Alertmanager + IRC relay). It currently monitors 3 bare-metal nodes via node_exporter but has no awareness of the actual services running on them (factory Laravel app, web-services, etc.).
Goal: Adapt FishVision to be the central observability hub for all Wetfish projects, with factory as the top priority. Additionally, create Kubernetes manifests so FishVision can be deployed into K8s clusters.
Key constraint: Factory's RKE2 cluster has only ~365m CPU headroom on 2-vCPU nodes — too tight to run a full monitoring stack in-cluster. FishVision will monitor factory externally.
File: prometheus/prometheus.yml
- Add factory staging node (45.76.235.77:9100) and prod node (104.156.237.105:9100) as scrape targets
- Add MySQL exporter targets (requires exporter deployment — see 1.2)
- Add Redis exporter targets
- Add nginx/PHP-FPM status endpoints if exposed
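A minimal sketch of how these jobs might look in `prometheus/prometheus.yml`. The job names and the exporter ports (9104 for mysqld_exporter, 9121 for redis_exporter — both upstream defaults) are assumptions to adjust once the sidecars from 1.2 exist:

```yaml
# Sketch only: job names and exporter ports are assumptions.
scrape_configs:
  - job_name: factory-nodes
    static_configs:
      - targets:
          - 45.76.235.77:9100      # staging node
          - 104.156.237.105:9100   # prod node
  - job_name: factory-mysql
    static_configs:
      - targets: [45.76.235.77:9104, 104.156.237.105:9104]
  - job_name: factory-redis
    static_configs:
      - targets: [45.76.235.77:9121, 104.156.237.105:9121]
```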
Add lightweight metric exporters as sidecars in factory's K8s base manifests:
- mysql-exporter sidecar in `k8s/services/factory/k8s/base/mysql.yaml` (~10m CPU, 32Mi RAM)
- redis-exporter sidecar in `k8s/services/factory/k8s/base/redis.yaml` (~10m CPU, 32Mi RAM)
- nginx stub_status + php-fpm status in `k8s/services/factory/k8s/base/web.yaml` (already has nginx, just enable the status endpoint)
- Update network policies in `k8s/infrastructure/network-policies/factory-allow.yaml` to allow external scraping on the exporter ports
Total additional resource cost: ~30m CPU, ~96Mi RAM (well within headroom)
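As one example, the redis-exporter sidecar could be a container fragment like the following (image tag and resource limits are illustrative assumptions; requests match the ~10m/32Mi budget above):

```yaml
# Hypothetical sidecar for k8s/services/factory/k8s/base/redis.yaml.
# Added alongside the existing redis container in the pod spec.
containers:
  - name: redis-exporter
    image: oliver006/redis_exporter:v1.62.0   # pin a real tag before use
    args: ["--redis.addr=redis://localhost:6379"]
    ports:
      - name: metrics
        containerPort: 9121
    resources:
      requests: {cpu: 10m, memory: 32Mi}
      limits: {cpu: 50m, memory: 64Mi}
```

The mysql-exporter sidecar would follow the same shape on port 9104.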
File: prometheus/alert.rules.yml
- Add `FactoryMySQLDown` — MySQL exporter unreachable
- Add `FactoryRedisDown` — Redis exporter unreachable
- Add `FactoryAppDown` — web pod unreachable
- Add `FactoryHighMySQLConnections` — connection count threshold
- Add `FactoryHighRedisMemory` — Redis memory threshold
- Add `FactoryPodRestarts` — Kubernetes pod restart count (via kube-state-metrics or custom)
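One of these rules, sketched in Prometheus rule-file syntax; the `factory-mysql` job label is an assumption that must match whatever job name the scrape config uses:

```yaml
# Sketch for prometheus/alert.rules.yml; severity label is illustrative.
groups:
  - name: factory
    rules:
      - alert: FactoryMySQLDown
        expr: up{job="factory-mysql"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Factory MySQL exporter unreachable on {{ $labels.instance }}"
```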
Files: grafana/provisioning/datasources/datasources.yml and grafana/provisioning/dashboards/dashboards.yml
- Auto-provision Prometheus, Loki, Tempo datasources
- Factory dashboard JSON created (`grafana/dashboards/factory.json`)
- Andon alert observability dashboard added (`grafana/dashboards/andon-alert-observability.json`)
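A sketch of the datasource provisioning file, assuming the Compose service names `prometheus`, `loki`, and `tempo` resolve on the stack's network (ports are the upstream defaults):

```yaml
# grafana/provisioning/datasources/datasources.yml (sketch)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
```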
File: prometheus/prometheus.yml
- The prod-node target (149.28.239.165:9100) already covers web-services infrastructure
- Add Traefik metrics endpoint if exposed
- web-services-k8s already has its own kube-prometheus-stack in-cluster
- Add Prometheus federation or remote_write from web-services-k8s Prometheus → FishVision Prometheus for centralized view
- Add scrape config for web-services-k8s Prometheus federation endpoint
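The federation scrape job could look like this; the target hostname is a placeholder for however the in-cluster Prometheus ends up being exposed, and the `match[]` selector should be narrowed to the series actually needed:

```yaml
# Sketch: federate from web-services-k8s Prometheus into FishVision.
- job_name: web-services-k8s-federation
  honor_labels: true
  metrics_path: /federate
  params:
    'match[]':
      - '{job=~".+"}'   # assumption: pull everything; narrow in practice
  static_configs:
    - targets: ['<web-services-k8s-prometheus-host>:9090']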
File: prometheus/alert.rules.yml
- Add alert group for web-services targets
Create K8s manifests following factory's Kustomize pattern so FishVision can be deployed in K8s when resources allow.
Directory: k8s/base/
- `namespace.yaml` — `monitoring` namespace
- `prometheus.yaml` — Deployment + ConfigMap + PVC + Service
- `alertmanager.yaml` — Deployment + ConfigMap + PVC + Service
- `grafana.yaml` — Deployment + PVC + Service + provisioning ConfigMaps
- `loki.yaml` — Deployment + ConfigMap + PVC + Service
- `tempo.yaml` — Deployment + ConfigMap + PVC + Service
- `irc-relay.yaml` — Deployment + ConfigMap + Service
- `ingress.yaml` — Ingress for Grafana/Prometheus UIs
- `kustomization.yaml`
Directories: k8s/overlays/{dev,staging,prod}/
- Environment-specific hostnames, storage classes, resource limits
- Image tag overrides
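A sketch of one overlay's `kustomization.yaml`; the image tag and patch filename are illustrative:

```yaml
# k8s/overlays/prod/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: monitoring
resources:
  - ../../base
images:
  - name: grafana/grafana
    newTag: "11.2.0"        # assumption: pin whatever version is current
patches:
  - path: resource-limits.yaml   # hypothetical env-specific limits patch
```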
- Fix alert rule description mismatches (descriptions say "70%" but the thresholds are 80%/90%)
- Pin image versions (currently `prom/prometheus:latest`)
- Add Grafana provisioning instead of default admin/admin with no datasources
- Promtail added to `docker-compose.yml`, collecting container and host logs to Loki
- Configuration at `promtail/promtail-config.yaml`
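A minimal `promtail/promtail-config.yaml` sketch, assuming Docker's default json-file log driver and the Compose service name `loki`:

```yaml
# Sketch only: paths assume /var/lib/docker is mounted into the
# Promtail container; adjust mounts and labels to the real stack.
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: containers
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          __path__: /var/lib/docker/containers/*/*-json.log
```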
| File | Action |
|---|---|
| `prometheus/prometheus.yml` | Edit — add factory + web-services scrape targets |
| `prometheus/alert.rules.yml` | Edit — add factory-specific + web-services alert rules, fix descriptions |
| `docker-compose.yml` | Edit — pin image versions, add Promtail service |
| `grafana/provisioning/datasources/datasources.yml` | Create — auto-provision datasources |
| `grafana/provisioning/dashboards/dashboards.yml` | Create — dashboard provider config |
| `grafana/dashboards/factory.json` | Create — factory dashboard |
| `k8s/base/*.yaml` | Create — Kubernetes base manifests |
| `k8s/overlays/{dev,staging,prod}/` | Create — environment overlays |
| File | Action |
|---|---|
| `k8s/services/factory/k8s/base/mysql.yaml` | Edit — add mysql-exporter sidecar |
| `k8s/services/factory/k8s/base/redis.yaml` | Edit — add redis-exporter sidecar |
| `k8s/services/factory/k8s/base/web.yaml` | Edit — enable nginx stub_status |
| `k8s/infrastructure/network-policies/factory-allow.yaml` | Edit — allow metrics scraping |
- `docker compose up -d` — all services healthy
- Prometheus targets page (localhost:9090/targets) — all targets UP
- Grafana (localhost:3000) — datasources auto-provisioned, factory dashboard loads
- Trigger test alert — verify IRC relay receives it
- For factory changes: apply to staging cluster, verify exporters respond on metrics ports
- `kubectl apply -k k8s/overlays/dev/` — verify K8s manifests are valid
- Phase 1.1 + 1.3 + 1.4 (FishVision configs — no cross-repo deps)
- Phase 4.1 (fix existing issues while we're in the configs)
- Phase 1.2 (factory repo changes — can be a separate PR)
- Phase 2 (web-services monitoring)
- Phase 3 (K8s manifests)
- Phase 4.2 (Promtail)