VDuchauffour/homelab

An opinionated Kubernetes-based homelab

All the configuration and manifests for running my homelab on Kubernetes.

Architecture Overview

Internet → Scaleway Proxy (Pangolin + Gerbil + Traefik + CrowdSec) → Homelab K8s Cluster (Traefik Ingress)

Selected services are exposed to the Internet through a reverse proxy hosted on a Scaleway instance. Traffic is tunneled from the proxy to the cluster using Pangolin with WireGuard (Gerbil). Everything else stays on the local network, secured with mkcert-issued TLS certificates.

Key Technologies

| Component | Technology |
| --- | --- |
| Container Orchestration | Kubernetes |
| Package Management | Helmfile + Helm |
| Storage | Rancher Local Path, OpenEBS ZFS-LocalPV (configs), NFS CSI (shared media) |
| Database | CloudNativePG (PostgreSQL) + Barman Cloud Plugin (backups to RustFS) |
| Ingress | Traefik |
| TLS | cert-manager (mkcert CA for local, Let's Encrypt for public) |
| GPU | Intel Device Plugins (iGPU/QSV) |
| Monitoring | kube-prometheus-stack |
| External Proxy | Pangolin + Gerbil + Traefik (Scaleway) |
| Security | CrowdSec (WAF + AppSec + host firewall bouncer) |
| Secret Management | Scaleway CLI (via vals ref+scw:// provider) |
| Backup | Restic (app configs to RustFS) + Barman Cloud Plugin (PostgreSQL) + Pangolin DB (Scaleway S3) + CrowdSec DB (Scaleway S3) |
| IaC | Terraform |

Project Structure

homelab/
├── kubernetes/
│   ├── apps/         # User-facing application workloads
│   ├── infra/        # Cluster-wide infrastructure services
│   └── cluster/      # Shared resources (namespaces, storage classes, certificates, CNPG definitions)
│
└── infra/
    └── modules/
        └── scaleway-proxy/   # Terraform for the external reverse proxy
  • kubernetes/apps/ — Each subdirectory is a user-facing application deployed via Helmfile (e.g. Jellyfin, Sonarr, n8n).
  • kubernetes/infra/ — Cluster infrastructure services: storage backends, database operator, cert-manager, monitoring, GPU plugins. Also deployed via Helmfile.
  • kubernetes/cluster/ — Cluster-level definitions that don't belong to a single app: namespaces, storage classes, certificate issuers, CloudNativePG cluster and database CRs, persistent volumes, restic backups.
  • infra/ — Terraform modules for resources outside the cluster (currently the Scaleway reverse proxy).

Getting Started

Prerequisites

Environment Configuration

This project uses direnv to manage environment variables. Helmfile relies on vals to inject these variables into Helm values at deploy time.

  1. Install direnv following the official installation guide

  2. Copy the example environment file:

cp .envrc.example .envrc
  3. Edit .envrc and fill in the required values:
vim .envrc
  4. Allow direnv to load the environment:
direnv allow

Environment Variables

| Variable | Description | Example |
| --- | --- | --- |
| HOME_DIR | Home directory of the user on the host machine | /home/user |
| EMAIL_ADDRESS | Your email address used for various services | user@example.com |
| PUBLIC_DOMAIN_NAME | Your homelab public domain name | example.com |
| LOCAL_DOMAIN_NAME | Your homelab internal domain name | home.arpa |
| SINGLE_NODE_NAME | Name of the single Kubernetes node | k8s-node |
| PANGOLIN_PROXY_IP | Public IP of the Scaleway proxy instance (for CoreDNS split-horizon) | 163.172.x.x |
| TRAEFIK_CLUSTER_IP | ClusterIP of the Traefik service (for CoreDNS split-horizon) | 10.43.x.x |
| WAKATIME_API_KEY | API key for WakaTime / Wakapi (used by wakatime-exporter) | (keep secure) |
| PIHOLE_ADMIN_PASSWORD | Admin password for Pi-hole web UI | (keep secure) |
| JELLYFIN_API_KEY | API key for Jellyfin (used by jellyfin-move script) | (keep secure) |

Once configured, these variables will be automatically loaded whenever you enter the project directory.
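
For reference, a filled-in .envrc is simply a list of exports. The values below are the placeholders from the table above and must be replaced with your own; secrets should never be committed:

export HOME_DIR="/home/user"
export EMAIL_ADDRESS="user@example.com"
export PUBLIC_DOMAIN_NAME="example.com"
export LOCAL_DOMAIN_NAME="home.arpa"
export SINGLE_NODE_NAME="k8s-node"
export PANGOLIN_PROXY_IP="163.172.x.x"
export TRAEFIK_CLUSTER_IP="10.43.x.x"
export WAKATIME_API_KEY="..."        # keep secure
export PIHOLE_ADMIN_PASSWORD="..."   # keep secure
export JELLYFIN_API_KEY="..."        # keep secure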

Secret Management (Scaleway CLI + vals)

The homelab uses Scaleway Secret Manager via the vals tool's ref+scw:// provider. This eliminates hardcoded secrets in Helm charts and provides a centralized, auditable secret store.

For detailed Scaleway CLI usage and vals provider documentation, see the scaleway-secrets skill: skill scaleway-secrets
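
In practice, Helm values never contain the secret itself, only a vals reference that is resolved at deploy time. A rough sketch (the secret name is illustrative, and the exact ref+scw:// path syntax should be checked against the vals provider documentation):

someApp:
  apiKey: ref+scw://someapp-api-key   # resolved by vals from Scaleway Secret Manager at deploy time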

Kubernetes — Applications

Available Apps

| App | Description | Deployed via |
| --- | --- | --- |
| autoscan | Webhook-based media scanner bridge for Sonarr/Radarr/Lidarr to Jellyfin | Helmfile |
| backrest | Web UI for restic backup management | Helmfile |
| bazarr | Subtitle management for media | Helmfile |
| bentopdf | Privacy-first PDF toolkit with client-side processing | Helmfile |
| changedetection-io | Website change detection and monitoring | Helmfile |
| dozzle | Real-time log viewer for containers | Helmfile |
| filebrowser | Web-based file manager | Helmfile |
| flood | Modern web UI for qBittorrent | Helmfile |
| dynacat | Personal dashboard | Helmfile |
| headlamp | Kubernetes web UI | Helmfile |
| home-assistant | Home automation platform | Helmfile |
| helm-dashboard | Helm charts management UI | Helmfile |
| immich | Self-hosted photo and video management | Helmfile |
| it-tools | Collection of handy online tools for developers | Helmfile |
| jellyfin | Media server with live TV | Helmfile |
| jellyseerr | Media request management | Helmfile |
| kan | Kanban board | Helmfile |
| lidarr | Music collection manager | Helmfile |
| linkding | Minimal bookmark manager | Helmfile |
| linkwarden | Collaborative bookmark manager | Helmfile |
| memos | Lightweight self-hosted memo hub | Helmfile |
| medialyze | Self-hosted media library analyzer | Helmfile |
| mlflow | ML experiment tracking and model registry | Helmfile |
| music-assistant | Personal music streaming hub with multi-room player management | Helmfile |
| musicbrainz-picard | Cross-platform music tagger powered by MusicBrainz | Helmfile |
| n8n | Workflow automation platform | Helmfile |
| navidrome | Web-based music collection server and streamer | Helmfile |
| gotify | Simple server for sending and receiving messages | Helmfile |
| papra | Self-hosted document management with OCR | Helmfile |
| pgadmin | PostgreSQL management UI | Helmfile |
| peanut | Dashboard for Network UPS Tools (NUT) monitoring | Helmfile |
| privatebin | Encrypted pastebin for sharing secrets | Helmfile |
| prowlarr | Indexer manager for arr suite | Helmfile |
| protonmail-bridge | ProtonMail SMTP/IMAP bridge for local mail relay | Helmfile |
| qbittorrent-vpn | BitTorrent client with VPN | Helmfile |
| radarr | Movie collection manager | Helmfile |
| sonarr | TV show collection manager | Helmfile |
| scrutiny | Hard drive S.M.A.R.T health monitoring | Helmfile |
| collabora | Collabora Online document editor for Nextcloud | Helmfile |
| nextcloud | Cloud storage and collaboration with Collabora Online integration | Helmfile |
| seafile | File management and sharing | Helmfile |
| slskd | Soulseek client for music sharing | Helmfile |
| stirling-pdf | PDF manipulation toolkit | Helmfile |
| tdarr | Media transcoding optimizer | Helmfile |
| tracearr | Real-time monitoring for Plex, Jellyfin, and Emby | Helmfile |
| uptime-kuma | Uptime monitoring dashboard | Helmfile |
| wakapi | Self-hosted WakaTime-compatible coding statistics | Helmfile |
| wallos | Personal subscription and expense tracker | Helmfile |
| warden | Service health monitoring and alerting | Helmfile |
| zfdash | ZFS monitoring dashboard | Helmfile |

App-Specific Notes

Headlamp

Generate an admin token to log in:

kubectl create token headlamp-admin -n kube-system

OpenCode & OpenChamber

Both apps share the same host directories for projects and worktrees. Their PersistentVolumes point at these host locations:

  • $HOME_DIR/projects — project working directory
  • $HOME_DIR/worktrees — git worktrees

Each app has its own config PVC (ZFS-backed) but shares the workdir and worktree PVs.

You will need to add your SSH keys for GitHub and your git config.

Nextcloud

Enable the External Storage app to browse local media files (mounted at /media):

kubectl exec deployment/nextcloud -n nextcloud -c nextcloud -- php occ app:enable files_external

Then configure a Local mount in Settings → Administration → External storage pointing to /media.
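
The same mount can also be created from the command line with occ (a sketch, assuming the standard files_external backend identifiers):

kubectl exec deployment/nextcloud -n nextcloud -c nextcloud -- \
  php occ files_external:create /media local null::null -c datadir=/media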

Kubernetes — Infrastructure

Available Infrastructure Tools

| Tool | Description | Deployed via |
| --- | --- | --- |
| cert-manager | TLS certificate automation | Helmfile |
| cloudnative-pg | PostgreSQL operator for K8s | Helmfile |
| intel-device-plugins | GPU and device plugin for Intel hardware | Helmfile |
| kube-prometheus-stack | Monitoring and alerting stack | Helmfile |
| local-path-provisioner | Local hostpath storage provisioner | Helmfile |
| rustfs | High-performance S3-compatible object storage | Helmfile |
| nfs-csi-driver | NFS CSI driver for RWX volumes | Helmfile |
| nfs-server | In-cluster NFS servers for shared media and files | Helmfile |
| node-feature-discovery | Hardware feature discovery | Helmfile |
| openebs | Container-native storage solution | Helmfile |
| pangolin-newt | Newt tunnel client for external access via Pangolin | Helmfile |
| pihole | Network-wide ad blocking DNS sinkhole | Helmfile |
| plugin-barman-cloud | Backup plugin for CloudNativePG (WAL archiving + base backups to RustFS) | Helmfile |
| nut-exporter | Prometheus exporter for Network UPS Tools (NUT) metrics | Helmfile |
| qbittorrent-exporter | Prometheus exporter for qBittorrent metrics | Helmfile |
| smartctl-exporter | Prometheus exporter for disk S.M.A.R.T. health metrics | Helmfile |
| traefik | Ingress controller and reverse proxy | Helmfile |
| wakatime-exporter | Prometheus exporter for WakaTime / Wakapi coding statistics | Helmfile |

Storage

The cluster uses three storage backends optimized for different use cases:

| StorageClass | Backend | Access Mode | Use Case |
| --- | --- | --- | --- |
| local-path | Rancher Local Path | RWO | App data |
| zfs-vm-pool-dynamic | OpenEBS ZFS-LocalPV | RWO | App configs, databases |
| nfs-tank-media | NFS CSI | RWX | Shared NFS storage (media, filebrowser) |

All storage classes use the Retain reclaim policy to prevent accidental data loss. You can use the openebs kubectl plugin to help manage the storage volumes.

Local Path

The Rancher Local Path provisioner uses local storage on each node.

ZFS-LocalPV (App Configs)

OpenEBS ZFS-LocalPV provides high-performance local storage backed by ZFS on the vm-pool dataset.

  • StorageClass: zfs-vm-pool-dynamic
  • Features: Compression (lz4), snapshots, dynamic provisioning
  • Location: /vm-pool/<pvc-uuid> on the single node

An example PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-config
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: zfs-vm-pool-dynamic
  resources:
    requests:
      storage: 1Gi

NFS Storage (Shared Tank)

ReadWriteMany (RWX) volumes are served by in-cluster NFS servers, following the OpenEBS NFS provisioning pattern.

Pod → NFS CSI Driver → NFS Server Pod → hostPath (/mnt/tank/media or /mnt/tank/share)

Two NFS server instances export different ZFS datasets:

  • Media NFS Server (nfs-server-media.nfs-server-media.svc.cluster.local): Exports /mnt/tank/media

  • Share NFS Server (nfs-server-share.nfs-server-share.svc.cluster.local): Exports /mnt/tank/share

  • StorageClass: nfs-tank-media (dynamic), nfs-media-library (static media), nfs-share-library (static share)

Static PVs use share subpaths (e.g. /, /photos/immich) to scope access per app.

Static PVs are defined in kubernetes/cluster/persistent-volumes/media.yaml.
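
A static PV for one of those subpaths looks roughly like this (names, capacity, and the exact share path are illustrative; the real definitions live in media.yaml):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: immich-photos
spec:
  capacity:
    storage: 1Ti                       # nominal; NFS does not enforce capacity
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-media-library
  csi:
    driver: nfs.csi.k8s.io
    volumeHandle: nfs-server-media/photos/immich   # must be unique per PV
    volumeAttributes:
      server: nfs-server-media.nfs-server-media.svc.cluster.local
      share: /photos/immich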

Important: Any pod mounting an NFS media volume with fsGroup must set fsGroupChangePolicy: OnRootMismatch in podSecurityContext. Without it, Kubernetes performs a recursive ownership walk of the entire NFS share on every pod start, which fails on large media libraries. For pods where fsGroup differs from the NFS root owner (1000), use supplementalGroups instead of fsGroup.
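
In chart values this typically looks like the following sketch:

podSecurityContext:
  fsGroup: 1000
  # Prevents a recursive chown of the whole NFS share on every pod start
  fsGroupChangePolicy: OnRootMismatch
  # If the pod's group must differ from the NFS root owner (1000),
  # drop fsGroup and grant access via supplementalGroups instead (see note above)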

Database (PostgreSQL)

All postgres databases run in a CloudNativePG cluster. The cluster uses dynamically provisioned ZFS-backed PVCs (zfs-vm-pool-dynamic).

Backups are handled by the Barman Cloud Plugin with WAL archiving and daily base backups to RustFS (cnpg-backups bucket). Configuration is in kubernetes/cluster/cloudnative-pg/backup.yaml.

You can use the kubectl plugin for cnpg to manage the cluster:

# Cluster status (includes backup info)
kubectl cnpg status cnpg-cluster0 -n cnpg-clusters

# List databases
kubectl get database -n cnpg-clusters

# Check backups
kubectl get backup -n cnpg-clusters
kubectl get scheduledbackup -n cnpg-clusters
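
Databases themselves are declared with the CloudNativePG Database CRD; a minimal example (database and owner names are illustrative):

apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: myapp
  namespace: cnpg-clusters
spec:
  name: myapp            # database name inside PostgreSQL
  owner: myapp           # role that owns the database
  cluster:
    name: cnpg-cluster0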

TLS Certificates

Local network services use TLS certificates issued by a mkcert CA through cert-manager. The CA secret and ClusterIssuer are defined in kubernetes/cluster/certificates/.

To recreate the CA secret (e.g. after a fresh cluster install):

CERT_DIR=$(mkcert -CAROOT)
kubectl create secret tls mkcert-ca-key-pair -n cert-manager --cert=$CERT_DIR/rootCA.pem --key=$CERT_DIR/rootCA-key.pem
kubectl apply -f kubernetes/cluster/certificates/mkcert-ca-issuer.yaml
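
The ClusterIssuer itself is essentially a cert-manager CA issuer pointing at that secret (a sketch of mkcert-ca-issuer.yaml):

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: mkcert-ca
spec:
  ca:
    secretName: mkcert-ca-key-pair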

Adding the following annotation to an ingress will make cert-manager automatically:

  • Create a Certificate resource
  • Issue a leaf cert signed by your mkcert CA
  • Manage renewals automatically
  • Maintain the TLS Secret
ingress:
  enabled: true
  className: traefik
  annotations:
    kubernetes.io/ingress.class: traefik
    cert-manager.io/cluster-issuer: mkcert-ca

Public-facing services get their TLS from Traefik (Let's Encrypt) on the Scaleway proxy via Pangolin — see External Access.

GPU

Intel iGPU is available for hardware transcoding (Jellyfin, Tdarr) via the Intel Device Plugins operator.

kubectl get gpudeviceplugins
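
Workloads consume the iGPU by requesting the resource exposed by the GPU plugin, roughly:

resources:
  limits:
    gpu.intel.com/i915: 1   # shared iGPU slot exposed by the Intel GPU device plugin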

Host-level diagnostics:

  • intel_gpu_top (from intel-gpu-tools) — live GPU usage
  • vainfo (from libva-utils) — supported VAAPI profiles
  • clinfo (from clinfo) — OpenCL availability

More info here.

Monitoring

The cluster runs kube-prometheus-stack for monitoring and alerting. Configuration is in kubernetes/infra/kube-prometheus-stack/.

Grafana Dashboards

Dashboards are provisioned automatically via the Grafana Helm chart's dashboards section in kubernetes/infra/kube-prometheus-stack/values.yaml. The chart downloads the JSON from the upstream URL at deploy time and creates a ConfigMap that the Grafana sidecar picks up.
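
An entry in that section takes roughly the following shape (the URL is a placeholder, and a matching dashboardProviders entry must also exist):

grafana:
  dashboards:
    default:
      qbittorrent:
        url: https://<upstream-dashboard-url>/dashboard.json
        datasource: Prometheus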

| Dashboard | Source | Datasource |
| --- | --- | --- |
| qBittorrent | prometheus-qbittorrent-exporter | Prometheus |
| Wakapi | wakatime_exporter | Prometheus |
| PeaNUT | Grafana-for-PeaNUT | InfluxDB |

The PeaNUT dashboard requires an InfluxDB datasource, provisioned via a sidecar Secret in kubernetes/infra/kube-prometheus-stack/manifests/secret.yaml.

Backups (Restic)

All app config PVCs are backed up weekly to RustFS using Restic. A single custom Helm chart (kubernetes/cluster/backups/charts/restic-backups/) manages all 21 backup CronJobs.

The orchestrated backup pattern scales down the app, runs a restic backup in an inner Job, and scales back up via an EXIT trap. All backups run on Sundays with staggered 5-minute intervals starting at 3:00 AM. Retention is 2 weekly snapshots.
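
Conceptually, each backup job runs something like the following (simplified sketch; the real logic lives in the restic-backups chart):

# Stop the app so the config volume is quiescent
kubectl scale deployment/<app> -n <namespace> --replicas=0
# Guarantee the app comes back up, even if the backup fails
trap 'kubectl scale deployment/<app> -n <namespace> --replicas=1' EXIT
# Back up the mounted config volume to the RustFS repository, then apply retention
restic backup /data
restic forget --keep-weekly 2 --prune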

cd kubernetes/cluster/backups
helmfile apply

To trigger a backup manually:

kubectl create job --from=cronjob/restic-backup-<app> manual-backup-<app> -n <namespace>

To add a new app backup, add an entry to kubernetes/cluster/backups/values.yaml and redeploy.

External Access

The homelab is exposed to the Internet through Pangolin hosted on a Scaleway instance. The infrastructure is provisioned with Terraform (infra/modules/scaleway-proxy/) and the services run in Docker on the instance. Exposed resources are managed declaratively via a Pangolin blueprint.

Architecture

Internet → Traefik (80/443) → Pangolin → Gerbil (WireGuard) → Newt (in-cluster) → K8s services

Components

  • Pangolin — Tunnel management platform with a web dashboard for configuring exposed services.
  • Gerbil — WireGuard-based tunnel controller. Manages encrypted tunnels between the proxy and homelab.
  • Traefik — Reverse proxy with automatic HTTPS via Let's Encrypt. Handles TLS termination and routing.
  • CrowdSec — Collaborative behavior detection engine with WAF (AppSec), Traefik bouncer plugin, and host firewall bouncer for SSH protection.
  • Newt — Tunnel client running in-cluster (kubernetes/infra/pangolin-newt/). Connects to Gerbil and routes traffic to K8s services.

Blueprint (Declarative Resource Config)

All externally-accessible services are defined in a blueprint file (kubernetes/infra/pangolin-newt/manifests/blueprint.yaml). The blueprint is a ConfigMap that Newt reads on startup, declaratively configuring domains, auth settings, and healthchecks.

  • Public (no auth): jellyfin, jellyseerr, immich
  • Admin-only (SSO whitelist): all other apps and infra tools

To add a new externally-accessible service, add a resource block to the blueprint (maintaining alphabetical order by resource key) and redeploy:

# Deploy blueprint + Newt together
cd kubernetes/infra/pangolin-newt
helmfile apply

# Or update the blueprint only
make blueprint-update

Note: sso-roles cannot include "Admin" (reserved by Pangolin). Use whitelist-users with email addresses instead. The blueprint YAML lives inside a ConfigMap literal block — indentation errors are silently ignored, so always validate rendered output with vals eval.

CoreDNS (Split-Horizon DNS)

Custom CoreDNS configuration (kubernetes/cluster/coredns/coredns-custom.yaml) implements split-horizon DNS so that in-cluster traffic to *.<public-domain> resolves to the internal Traefik ClusterIP instead of hairpinning through the Internet.

Exception: pangolin.<public-domain> resolves to the Scaleway proxy IP so the Newt tunnel client can reach Pangolin directly (otherwise it would get Traefik's IP via the wildcard rule and fail TLS handshake).

vals eval -f kubernetes/cluster/coredns/coredns-custom.yaml | kubectl apply -f -
kubectl rollout restart deployment coredns -n kube-system

Requires environment variables: PANGOLIN_PROXY_IP and TRAEFIK_CLUSTER_IP (see Environment Variables).
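
One way to express this split-horizon in CoreDNS looks roughly like the following (domain and IPs are placeholders, not the exact contents of coredns-custom.yaml; the more specific pangolin zone wins over the wildcard zone):

pangolin.example.com:53 {
    template IN A {
        answer "{{ .Name }} 60 IN A 163.172.x.x"
    }
}
example.com:53 {
    template IN A {
        answer "{{ .Name }} 60 IN A 10.43.x.x"
    }
}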

Scaleway Proxy (Terraform)

You'll need a Scaleway config file.

cd ./infra/modules/scaleway-proxy

cp ./terraform.tfvars.example ./terraform.tfvars
# edit the file with the relevant values
# generate pangolin_secret with: openssl rand -base64 48

terraform init
terraform plan
terraform apply

Once the instance is ready, run:

cd "$HOME_DIR/pangolin"
docker compose up -d

Then navigate to https://pangolin.<domain>/auth/initial-setup to complete the initial setup. The setup token is printed in the Pangolin container logs (docker compose logs pangolin).

After Terraform apply, also deploy the CoreDNS config (see above) and the Newt blueprint to complete the external access setup.

CrowdSec (WAF + Host Protection)

The Scaleway proxy runs CrowdSec for multi-layer security, deployed automatically via Terraform/cloud-init:

  • Traefik bouncer plugin — Inspects all incoming HTTP/HTTPS requests via the crowdsec-bouncer-traefik-plugin. Banned IPs get a ban page. AppSec (virtual patching) is enabled for WAF-level protection.
  • Syslog monitoring — CrowdSec reads /var/log/auth.log and /var/log/syslog to detect SSH brute-force and other host-level attacks.
  • Host firewall bouncer — crowdsec-firewall-bouncer-iptables runs on the host and adds iptables rules to DROP banned IPs at the network level (covers SSH and any non-HTTP traffic).

Bouncer API keys are pre-registered via BOUNCER_KEY_* environment variables in the CrowdSec container, so no manual key generation is needed after deploy.

Useful commands (SSH into the proxy):

docker exec crowdsec cscli metrics                              # View bouncer stats
docker exec crowdsec cscli decisions list                       # List active bans
docker exec crowdsec cscli decisions add --ip <IP> -d 1m --type ban  # Manually ban an IP (1 min)
systemctl status crowdsec-firewall-bouncer                      # Host bouncer status

Pangolin DB Backup

Pangolin's PostgreSQL database is backed up weekly to a Scaleway Object Storage bucket (<instance_name>-pangolin-backups), managed via Terraform. Backups are retained for 30 days (S3 lifecycle rule).

Cron (Sunday 3am) → backup-db.sh → docker exec pg_dump → gzip → aws s3 cp → Scaleway S3

The backup script and cron job are deployed via cloud-init. Credentials reuse the same Scaleway API keys as Traefik's DNS challenge.

Manual backup (SSH into the proxy):

/home/<username>/pangolin/backup-db.sh

Restore on a new instance — after terraform apply + docker compose up -d:

# List available backups
export AWS_ACCESS_KEY_ID="<scaleway_access_key>"
export AWS_SECRET_ACCESS_KEY="<scaleway_secret_key>"
aws s3 ls s3://<instance_name>-pangolin-backups/pangolin/ --endpoint-url https://s3.fr-par.scw.cloud

# Download the latest dump
aws s3 cp s3://<instance_name>-pangolin-backups/pangolin/pangolin-db-<timestamp>.sql.gz /tmp/ --endpoint-url https://s3.fr-par.scw.cloud

# Stop Pangolin (keep Postgres running)
docker stop pangolin gerbil traefik crowdsec

# Drop and recreate the database (required — restoring into an existing DB causes FK/PK conflicts)
docker exec postgres psql -U <pangolin_pg_user> -d postgres -c "DROP DATABASE pangolin;"
docker exec postgres psql -U <pangolin_pg_user> -d postgres -c "CREATE DATABASE pangolin OWNER <pangolin_pg_user>;"

# Restore
gunzip -c /tmp/pangolin-db-<timestamp>.sql.gz | docker exec -i postgres psql -U <pangolin_pg_user> -d pangolin

# Fix Gerbil public key — the new instance generates a fresh WireGuard keypair,
# but the restored DB still has the old exit node's public key.
# Derive the current public key from gerbil's private key and update the DB:
NEW_PUBKEY=$(cat ~/pangolin/config/key | wg pubkey)
docker exec postgres psql -U <pangolin_pg_user> -d pangolin \
  -c "UPDATE \"exitNodes\" SET \"publicKey\" = '$NEW_PUBKEY' WHERE \"exitNodeId\" = 1;"

# Restart all services
docker compose -f ~/pangolin/compose.yaml down && docker compose -f ~/pangolin/compose.yaml up -d

Why the Gerbil key fix is needed: Gerbil generates and saves its WireGuard private key at ~/pangolin/config/key on first start (--generateAndSaveKeyTo). On a new instance, this key is different from the one stored in the backup. Gerbil sends its public key to Pangolin's config API, but Pangolin matches exit nodes by public key — if it doesn't match, Gerbil receives an empty config with no CIDR address, causing a crash loop. Traefik also fails because it shares Gerbil's network namespace (network_mode: service:gerbil).

Check backup logs: cat /var/log/pangolin-backup.log

CrowdSec DB Backup

CrowdSec's SQLite database and Web UI data are backed up weekly to the same Scaleway S3 bucket under a crowdsec/ prefix. The script briefly stops the containers for a consistent snapshot, creates compressed archives, uploads to S3, and restarts via an EXIT trap.

Cron (Sunday 3am) → backup-crowdsec.sh → docker stop → tar + gzip → aws s3 cp → docker start → Scaleway S3

Manual backup (SSH into the proxy):

/home/<username>/pangolin/backup-crowdsec.sh

Check backup logs: cat /var/log/crowdsec-backup.log

Destroy

cd ./infra/modules/scaleway-proxy
terraform destroy

Note: Scaleway allows duplicate DNS records. After terraform destroy and a fresh terraform apply, check for stale A records (apex @ and wildcard *) pointing to the old instance IP. Duplicates will cause ACME certificate issuance to fail intermittently. Remove them manually from the Scaleway console.

Utility Scripts

The scripts/ directory contains utility scripts for managing the homelab cluster.

Media Library Management

The jellyfin/ package provides a script to move completed downloads from qBittorrent to the organized media library while preserving manually identified metadata (TMDB/IMDB IDs).

make jellyfin-move           # Move files and update Jellyfin
make jellyfin-move-dry-run   # Preview moves without executing

The script moves files via kubectl exec in the Jellyfin pod, then uses the Jellyfin API to:

  1. Capture ProviderIds (TMDB/IMDB) from the old item before moving
  2. Delete the old library entry to prevent duplicates
  3. Trigger a library scan and wait for the new item to appear
  4. Copy the ProviderIds to the new item, preserving manual identifications

Files are moved from /media/downloads/<category>/ to /media/videos/<category>/ (movies, shows, documentaries, stand-up, tv-programs, tv-shows).

Warden Monitoring Tools

The warden/ package provides Kubernetes-to-Warden monitoring integration with automatic discovery, deployment verification, and cleanup capabilities.

Quick Start:

make warden-seed          # Seed app monitors
make warden-seed-cleanup  # Seed with cleanup
make warden-delete        # Delete all monitors
make warden-compare       # Compare K8s vs Warden

See scripts/warden/README.md for detailed documentation.

Useful Commands

Update Jellyfin (helm release + rollout restart):

make jellyfin-update

Update Pangolin blueprint (apply ConfigMap + restart Newt):

make blueprint-update

Generate a strong password:

pwgen -scyn 32 1
or
openssl rand -base64 32

Acknowledgments
