All the configuration and manifests for running my homelab on Kubernetes.
Internet → Scaleway Proxy (Pangolin + Gerbil + Traefik + CrowdSec) → Homelab K8s Cluster (Traefik Ingress)
Selected services are exposed to the Internet through a reverse proxy hosted on a Scaleway instance. Traffic is tunneled from the proxy to the cluster using Pangolin with WireGuard (Gerbil). Everything else stays on the local network, secured with mkcert-issued TLS certificates.
| Component | Technology |
|---|---|
| Container Orchestration | Kubernetes |
| Package Management | Helmfile + Helm |
| Storage | Rancher Local Path, OpenEBS ZFS-LocalPV (configs), NFS CSI (shared media) |
| Database | CloudNativePG (PostgreSQL) + Barman Cloud Plugin (backups to RustFS) |
| Ingress | Traefik |
| TLS | cert-manager (mkcert CA for local, Let's Encrypt for public) |
| GPU | Intel Device Plugins (iGPU/QSV) |
| Monitoring | kube-prometheus-stack |
| External Proxy | Pangolin + Gerbil + Traefik (Scaleway) |
| Security | CrowdSec (WAF + AppSec + host firewall bouncer) |
| Secret Management | Scaleway CLI (via vals ref+scw:// provider) |
| Backup | Restic (app configs to RustFS) + Barman Cloud Plugin (PostgreSQL) + Pangolin DB (Scaleway S3) + CrowdSec DB (Scaleway S3) |
| IaC | Terraform |
homelab/
├── kubernetes/
│ ├── apps/ # User-facing application workloads
│ ├── infra/ # Cluster-wide infrastructure services
│ └── cluster/ # Shared resources (namespaces, storage classes, certificates, CNPG definitions)
│
└── infra/
└── modules/
└── scaleway-proxy/ # Terraform for the external reverse proxy
- `kubernetes/apps/` — Each subdirectory is a user-facing application deployed via Helmfile (e.g. Jellyfin, Sonarr, n8n).
- `kubernetes/infra/` — Cluster infrastructure services: storage backends, database operator, cert-manager, monitoring, GPU plugins. Also deployed via Helmfile.
- `kubernetes/cluster/` — Cluster-level definitions that don't belong to a single app: namespaces, storage classes, certificate issuers, CloudNativePG cluster and database CRs, persistent volumes, restic backups.
- `infra/` — Terraform modules for resources outside the cluster (currently the Scaleway reverse proxy).
- A Kubernetes cluster
- Helm + Helmfile + helm-git plugin
- kubectl
- direnv
- vals
- Terraform (only needed for the external reverse proxy)
This project uses direnv to manage environment variables. Helmfile relies on vals to inject these variables into Helm values at deploy time.
- Install direnv following the official installation guide.
- Copy the example environment file:

  ```bash
  cp .envrc.example .envrc
  ```

- Edit `.envrc` and fill in the required values:

  ```bash
  vim .envrc
  ```

- Allow direnv to load the environment:

  ```bash
  direnv allow
  ```

| Variable | Description | Example |
|---|---|---|
| `HOME_DIR` | Home directory of the user on the host machine | `/home/user` |
| `EMAIL_ADDRESS` | Your email address used for various services | `user@example.com` |
| `PUBLIC_DOMAIN_NAME` | Your homelab public domain name | `example.com` |
| `LOCAL_DOMAIN_NAME` | Your homelab internal domain name | `home.arpa` |
| `SINGLE_NODE_NAME` | Name of the single Kubernetes node | `k8s-node` |
| `PANGOLIN_PROXY_IP` | Public IP of the Scaleway proxy instance (for CoreDNS split-horizon) | `163.172.x.x` |
| `TRAEFIK_CLUSTER_IP` | ClusterIP of the Traefik service (for CoreDNS split-horizon) | `10.43.x.x` |
| `WAKATIME_API_KEY` | API key for WakaTime / Wakapi (used by wakatime-exporter) | (keep secure) |
| `PIHOLE_ADMIN_PASSWORD` | Admin password for the Pi-hole web UI | (keep secure) |
| `JELLYFIN_API_KEY` | API key for Jellyfin (used by the jellyfin-move script) | (keep secure) |
Once configured, these variables will be automatically loaded whenever you enter the project directory.
The homelab uses Scaleway Secret Manager via the vals tool's ref+scw:// provider. This eliminates hardcoded secrets in Helm charts and provides a centralized, auditable secret store.
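In practice this means a Helm value can reference a secret by a `ref+scw://` URI and Helmfile resolves it through vals at deploy time. The snippet below is only an illustration: the key `adminToken`, the secret name, and the exact `ref+scw://` path format are placeholders to check against the vals scw provider documentation.

```yaml
# Hypothetical Helm values snippet: vals resolves the ref+scw:// reference at deploy time,
# so the plaintext secret never lives in the repo. Secret name and path format are illustrative only.
someApp:
  adminToken: ref+scw://someapp-admin-token
```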
For detailed Scaleway CLI usage and vals provider documentation, see the scaleway-secrets skill: skill scaleway-secrets
Apps
Generate an admin token to log in:
```bash
kubectl create token headlamp-admin -n kube-system
```

Both apps share the same host directories for projects and worktrees. PVs will look at these locations:
- `$HOME_DIR/projects` — project working directory
- `$HOME_DIR/worktrees` — git worktrees
Each app has its own config PVC (ZFS-backed) but shares the workdir and worktree PVs.
You will need to add your SSH keys for GitHub and your git config.
Enable the External Storage app to browse local media files (mounted at /media):
```bash
kubectl exec deployment/nextcloud -n nextcloud -c nextcloud -- php occ app:enable files_external
```

Then configure a Local mount in Settings → Administration → External storage pointing to `/media`.
Infrastructure Tools
The cluster uses three storage backends optimized for different use cases:
| StorageClass | Backend | Access Mode | Use Case |
|---|---|---|---|
| `local-path` | Rancher Local Path | RWO | App data |
| `zfs-vm-pool-dynamic` | OpenEBS ZFS-LocalPV | RWO | App configs, databases |
| `nfs-tank-media` | NFS CSI | RWX | Shared NFS storage (media, filebrowser) |
All storage uses the `Retain` reclaim policy to prevent accidental data loss. You can use the kubectl plugin for openebs to help manage the storage volumes.
Rancher Local Path utilizes the local storage in each node.
OpenEBS ZFS-LocalPV provides high-performance local storage backed by ZFS on the vm-pool dataset.
- StorageClass: `zfs-vm-pool-dynamic`
- Features: Compression (lz4), snapshots, dynamic provisioning
- Location: `/vm-pool/<pvc-uuid>` on the single node
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-config
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: zfs-vm-pool-dynamic
  resources:
    requests:
      storage: 1Gi
```

The NFS CSI driver provides ReadWriteMany (RWX) volumes via an in-cluster NFS server, following the OpenEBS NFS provisioning pattern.
Pod → NFS CSI Driver → NFS Server Pod → hostPath (/mnt/tank/media or /mnt/tank/share)
Two NFS server instances export different ZFS datasets:
- Media NFS Server (`nfs-server-media.nfs-server-media.svc.cluster.local`): exports `/mnt/tank/media`
- Share NFS Server (`nfs-server-share.nfs-server-share.svc.cluster.local`): exports `/mnt/tank/share`
- StorageClasses: `nfs-tank-media` (dynamic), `nfs-media-library` (static media), `nfs-share-library` (static share)
Static PVs use share subpaths (e.g. /, /photos/immich) to scope access per app.
Static PVs are defined in kubernetes/cluster/persistent-volumes/media.yaml.
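For illustration, a static PV scoped to one subpath of the media export might look like the sketch below. Names, capacity, and the exact CSI attributes are placeholders; the real definitions are in kubernetes/cluster/persistent-volumes/media.yaml.

```yaml
# Illustrative static PV scoped to a photos subpath of the media export.
# Names, sizes, and volumeHandle are placeholders; see media.yaml for the real ones.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: immich-photos
spec:
  capacity:
    storage: 1Ti
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-media-library
  csi:
    driver: nfs.csi.k8s.io
    volumeHandle: nfs-server-media/photos/immich
    volumeAttributes:
      server: nfs-server-media.nfs-server-media.svc.cluster.local
      share: /photos/immich
```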
Important: Any pod mounting an NFS media volume with fsGroup must set fsGroupChangePolicy: OnRootMismatch in podSecurityContext. Without it, Kubernetes performs a recursive ownership walk of the entire NFS share on every pod start, which fails on large media libraries. For pods where fsGroup differs from the NFS root owner (1000), use supplementalGroups instead of fsGroup.
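A minimal sketch of what that looks like in a chart's pod security context (the values key name and group IDs are illustrative):

```yaml
# Illustrative podSecurityContext for a pod mounting an NFS media volume.
# OnRootMismatch skips the recursive chown when the share root already has the expected group.
podSecurityContext:
  fsGroup: 1000
  fsGroupChangePolicy: OnRootMismatch

# If the pod's group must differ from the NFS root owner (1000), prefer supplementalGroups:
# podSecurityContext:
#   supplementalGroups: [1000]
```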
All postgres databases run in a CloudNativePG cluster. The cluster uses dynamically provisioned ZFS-backed PVCs (zfs-vm-pool-dynamic).
Backups are handled by the Barman Cloud Plugin with WAL archiving and daily base backups to RustFS (cnpg-backups bucket). Configuration is in kubernetes/cluster/cloudnative-pg/backup.yaml.
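Databases are declared with CloudNativePG's declarative Database CRD; a minimal sketch, with the app name as a placeholder (the real CRs live under kubernetes/cluster/):

```yaml
# Illustrative Database CR: CloudNativePG creates a database named "myapp",
# owned by the "myapp" role, inside the shared cluster.
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: myapp
  namespace: cnpg-clusters
spec:
  name: myapp
  owner: myapp
  cluster:
    name: cnpg-cluster0
```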
You can use the kubectl plugin for cnpg to manage the cluster:
```bash
# Cluster status (includes backup info)
kubectl cnpg status cnpg-cluster0 -n cnpg-clusters

# List databases
kubectl get database -n cnpg-clusters

# Check backups
kubectl get backup -n cnpg-clusters
kubectl get scheduledbackup -n cnpg-clusters
```

Local network services use TLS certificates issued by a mkcert CA through cert-manager. The CA secret and ClusterIssuer are defined in `kubernetes/cluster/certificates/`.
To recreate the CA secret (e.g. after a fresh cluster install):
```bash
CERT_DIR=$(mkcert -CAROOT)
kubectl create secret tls mkcert-ca-key-pair -n cert-manager --cert=$CERT_DIR/rootCA.pem --key=$CERT_DIR/rootCA-key.pem
kubectl apply -f kubernetes/cluster/certificates/mkcert-ca-issuer.yaml
```

Adding the following annotation to an ingress will make cert-manager automatically:
- Create a Certificate resource
- Issue a leaf cert signed by your mkcert CA
- Manage renewals automatically
- Maintain the TLS Secret
```yaml
ingress:
  enabled: true
  className: traefik
  annotations:
    kubernetes.io/ingress.class: traefik
    cert-manager.io/cluster-issuer: mkcert-ca
```

Public-facing services get their TLS from Traefik (Let's Encrypt) on the Scaleway proxy via Pangolin — see External Access.
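For reference, the `mkcert-ca` ClusterIssuer referenced by the annotation above is a standard cert-manager CA issuer backed by the secret created earlier; a minimal sketch (the actual manifest is kubernetes/cluster/certificates/mkcert-ca-issuer.yaml):

```yaml
# Sketch of a cert-manager CA ClusterIssuer backed by the mkcert root CA secret.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: mkcert-ca
spec:
  ca:
    secretName: mkcert-ca-key-pair
```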
Intel iGPU is available for hardware transcoding (Jellyfin, Tdarr) via the Intel Device Plugins operator.
```bash
kubectl get gpudeviceplugins
```

Host-level diagnostics:
- `intel_gpu_top` (from `intel-gpu-tools`) — live GPU usage
- `vainfo` (from `libva-utils`) — supported VAAPI profiles
- `clinfo` (from `clinfo`) — OpenCL availability
More info here.
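To let a workload use the iGPU, its container requests the `gpu.intel.com/i915` resource advertised by the device plugin; a minimal sketch (exact chart values vary per app):

```yaml
# Illustrative container resource request for the Intel iGPU exposed by the device plugin.
resources:
  requests:
    gpu.intel.com/i915: "1"
  limits:
    gpu.intel.com/i915: "1"
```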
The cluster runs kube-prometheus-stack for monitoring and alerting. Configuration is in kubernetes/infra/kube-prometheus-stack/.
Dashboards are provisioned automatically via the Grafana Helm chart's dashboards section in kubernetes/infra/kube-prometheus-stack/values.yaml. The chart downloads the JSON from the upstream URL at deploy time and creates a ConfigMap that the Grafana sidecar picks up.
| Dashboard | Source | Datasource |
|---|---|---|
| qBittorrent | prometheus-qbittorrent-exporter | Prometheus |
| Wakapi | wakatime_exporter | Prometheus |
| PeaNUT | Grafana-for-PeaNUT | InfluxDB |
The PeaNUT dashboard requires an InfluxDB datasource, provisioned via a sidecar Secret in kubernetes/infra/kube-prometheus-stack/manifests/secret.yaml.
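The relevant part of values.yaml follows the standard Grafana chart convention for downloaded dashboards; a hedged sketch, with placeholder URLs and folder names (see the real file for exact entries):

```yaml
# Illustrative grafana.dashboards entry in kube-prometheus-stack values.
# Assumes a matching dashboardProviders entry named "default"; the URL is a placeholder.
grafana:
  dashboards:
    default:
      qbittorrent:
        url: https://example.com/path/to/qbittorrent-dashboard.json
        datasource: Prometheus
```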
All app config PVCs are backed up weekly to RustFS using Restic. A single custom Helm chart (kubernetes/cluster/backups/charts/restic-backups/) manages all 21 backup CronJobs.
The orchestrated backup pattern scales down the app, runs a restic backup in an inner Job, and scales back up via an EXIT trap. All backups run on Sundays with staggered 5-minute intervals starting at 3:00 AM. Retention is 2 weekly snapshots.
```bash
cd kubernetes/cluster/backups
helmfile apply
```

To trigger a backup manually:

```bash
kubectl create job --from=cronjob/restic-backup-<app> manual-backup-<app> -n <namespace>
```

To add a new app backup, add an entry to `kubernetes/cluster/backups/values.yaml` and redeploy.
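Since the chart is custom, its values schema is defined in that file; a purely hypothetical entry might look like the following (key names are illustrative, copy an existing entry for the real schema):

```yaml
# Hypothetical entry for the custom restic-backups chart; key names are illustrative only.
apps:
  - name: jellyfin
    namespace: jellyfin
    pvc: jellyfin-config
    deployment: jellyfin   # scaled to 0 while the inner backup Job runs, restored by the EXIT trap
```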
The homelab is exposed to the Internet through Pangolin hosted on a Scaleway instance. The infrastructure is provisioned with Terraform (infra/modules/scaleway-proxy/) and the services run in Docker on the instance. Exposed resources are managed declaratively via a Pangolin blueprint.
Internet → Traefik (80/443) → Pangolin → Gerbil (WireGuard) → Newt (in-cluster) → K8s services
- Pangolin — Tunnel management platform with a web dashboard for configuring exposed services.
- Gerbil — WireGuard-based tunnel controller. Manages encrypted tunnels between the proxy and homelab.
- Traefik — Reverse proxy with automatic HTTPS via Let's Encrypt. Handles TLS termination and routing.
- CrowdSec — Collaborative behavior detection engine with WAF (AppSec), Traefik bouncer plugin, and host firewall bouncer for SSH protection.
- Newt — Tunnel client running in-cluster (`kubernetes/infra/pangolin-newt/`). Connects to Gerbil and routes traffic to K8s services.
All externally-accessible services are defined in a blueprint file (kubernetes/infra/pangolin-newt/manifests/blueprint.yaml). The blueprint is a ConfigMap that Newt reads on startup, declaratively configuring domains, auth settings, and healthchecks.
- Public (no auth): jellyfin, jellyseerr, immich
- Admin-only (SSO whitelist): all other apps and infra tools
To add a new externally-accessible service, add a resource block to the blueprint (maintaining alphabetical order by resource key) and redeploy:
```bash
# Deploy blueprint + Newt together
cd kubernetes/infra/pangolin-newt
helmfile apply

# Or update the blueprint only
make blueprint-update
```

Note: `sso-roles` cannot include "Admin" (reserved by Pangolin). Use `whitelist-users` with email addresses instead. The blueprint YAML lives inside a ConfigMap literal block — indentation errors are silently ignored, so always validate rendered output with `vals eval`.
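The blueprint schema is Pangolin's own, so treat the following as an indication of shape only; every key and value below is a placeholder to be checked against an existing entry in blueprint.yaml:

```yaml
# Purely illustrative resource entry; key names are hypothetical and must be matched
# against an existing resource in blueprint.yaml (kept in alphabetical order by key).
resources:
  myapp:
    name: My App
    full-domain: myapp.example.com                 # placeholder public domain
    protocol: http
    targets:
      - hostname: myapp.myapp.svc.cluster.local    # in-cluster Service reached via Newt
        port: 80
    auth:
      sso-enabled: true
      whitelist-users:                             # per the note above, not sso-roles: [Admin]
        - user@example.com
```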
Custom CoreDNS configuration (kubernetes/cluster/coredns/coredns-custom.yaml) implements split-horizon DNS so that in-cluster traffic to *.<public-domain> resolves to the internal Traefik ClusterIP instead of hairpinning through the Internet.
Exception: pangolin.<public-domain> resolves to the Scaleway proxy IP so the Newt tunnel client can reach Pangolin directly (otherwise it would get Traefik's IP via the wildcard rule and fail TLS handshake).
```bash
vals eval -f kubernetes/cluster/coredns/coredns-custom.yaml | kubectl apply -f -
kubectl rollout restart deployment coredns -n kube-system
```

Requires environment variables: `PANGOLIN_PROXY_IP` and `TRAEFIK_CLUSTER_IP` (see Environment Variables).
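A rough sketch of the split-horizon idea is shown below. The real rules live in coredns-custom.yaml; the ConfigMap name, data key, and the use of the CoreDNS template plugin are assumptions here, and the domain and IPs stand in for PUBLIC_DOMAIN_NAME, PANGOLIN_PROXY_IP, and TRAEFIK_CLUSTER_IP.

```yaml
# Rough sketch only; see kubernetes/cluster/coredns/coredns-custom.yaml for the real config.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  public-domain.server: |
    example.com:53 {
        errors
        cache 30
        # Exception: Newt must reach Pangolin at the real proxy IP
        template IN A example.com {
            match "^pangolin\.example\.com\.$"
            answer "{{ .Name }} 60 IN A 163.172.0.1"
            fallthrough
        }
        # Everything else under the public domain resolves to Traefik's ClusterIP
        template IN A example.com {
            answer "{{ .Name }} 60 IN A 10.43.0.1"
        }
    }
```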
You'll need a Scaleway config file.
```bash
cd ./infra/modules/scaleway-proxy
cp ./terraform.tfvars.example ./terraform.tfvars
# edit the file with the relevant values
# generate pangolin_secret with: openssl rand -base64 48
terraform init
terraform plan
terraform apply
```

Once the instance is ready, run:
cd "$HOME_DIR/pangolin"
docker compose up -dThen navigate to https://pangolin.<domain>/auth/initial-setup to complete the initial setup. The setup token is printed in the Pangolin container logs (docker compose logs pangolin).
After Terraform apply, also deploy the CoreDNS config (see above) and the Newt blueprint to complete the external access setup.
The Scaleway proxy runs CrowdSec for multi-layer security, deployed automatically via Terraform/cloud-init:
- Traefik bouncer plugin — Inspects all incoming HTTP/HTTPS requests via the `crowdsec-bouncer-traefik-plugin`. Banned IPs get a ban page. AppSec (virtual patching) is enabled for WAF-level protection.
- Syslog monitoring — CrowdSec reads `/var/log/auth.log` and `/var/log/syslog` to detect SSH brute-force and other host-level attacks.
- Host firewall bouncer — `crowdsec-firewall-bouncer-iptables` runs on the host and adds iptables rules to DROP banned IPs at the network level (covers SSH and any non-HTTP traffic).
Bouncer API keys are pre-registered via BOUNCER_KEY_* environment variables in the CrowdSec container, so no manual key generation is needed after deploy.
Useful commands (SSH into the proxy):
```bash
docker exec crowdsec cscli metrics                                   # View bouncer stats
docker exec crowdsec cscli decisions list                            # List active bans
docker exec crowdsec cscli decisions add --ip <IP> -d 1m --type ban  # Manually ban an IP (1 min)
systemctl status crowdsec-firewall-bouncer                           # Host bouncer status
```

Pangolin's PostgreSQL database is backed up weekly to a Scaleway Object Storage bucket (`<instance_name>-pangolin-backups`), managed via Terraform. Backups are retained for 30 days (S3 lifecycle rule).
Cron (Sunday 3am) → backup-db.sh → docker exec pg_dump → gzip → aws s3 cp → Scaleway S3
The backup script and cron job are deployed via cloud-init. Credentials reuse the same Scaleway API keys as Traefik's DNS challenge.
Manual backup (SSH into the proxy):
```bash
/home/<username>/pangolin/backup-db.sh
```

Restore on a new instance — after `terraform apply` + `docker compose up -d`:
```bash
# List available backups
export AWS_ACCESS_KEY_ID="<scaleway_access_key>"
export AWS_SECRET_ACCESS_KEY="<scaleway_secret_key>"
aws s3 ls s3://<instance_name>-pangolin-backups/pangolin/ --endpoint-url https://s3.fr-par.scw.cloud
# Download the latest dump
aws s3 cp s3://<instance_name>-pangolin-backups/pangolin/pangolin-db-<timestamp>.sql.gz /tmp/ --endpoint-url https://s3.fr-par.scw.cloud
# Stop Pangolin (keep Postgres running)
docker stop pangolin gerbil traefik crowdsec
# Drop and recreate the database (required — restoring into an existing DB causes FK/PK conflicts)
docker exec postgres psql -U <pangolin_pg_user> -d postgres -c "DROP DATABASE pangolin;"
docker exec postgres psql -U <pangolin_pg_user> -d postgres -c "CREATE DATABASE pangolin OWNER <pangolin_pg_user>;"
# Restore
gunzip -c /tmp/pangolin-db-<timestamp>.sql.gz | docker exec -i postgres psql -U <pangolin_pg_user> -d pangolin
# Fix Gerbil public key — the new instance generates a fresh WireGuard keypair,
# but the restored DB still has the old exit node's public key.
# Derive the current public key from gerbil's private key and update the DB:
NEW_PUBKEY=$(cat ~/pangolin/config/key | wg pubkey)
docker exec postgres psql -U <pangolin_pg_user> -d pangolin \
-c "UPDATE \"exitNodes\" SET \"publicKey\" = '$NEW_PUBKEY' WHERE \"exitNodeId\" = 1;"
# Restart all services
docker compose -f ~/pangolin/compose.yaml down && docker compose -f ~/pangolin/compose.yaml up -d
```

Why the Gerbil key fix is needed: Gerbil generates and saves its WireGuard private key at `~/pangolin/config/key` on first start (`--generateAndSaveKeyTo`). On a new instance, this key is different from the one stored in the backup. Gerbil sends its public key to Pangolin's config API, but Pangolin matches exit nodes by public key — if it doesn't match, Gerbil receives an empty config with no CIDR address, causing a crash loop. Traefik also fails because it shares Gerbil's network namespace (`network_mode: service:gerbil`).
Check backup logs: `cat /var/log/pangolin-backup.log`
CrowdSec's SQLite database and Web UI data are backed up weekly to the same Scaleway S3 bucket under a crowdsec/ prefix. The script briefly stops the containers for a consistent snapshot, creates compressed archives, uploads to S3, and restarts via an EXIT trap.
Cron (Sunday 3am) → backup-crowdsec.sh → docker stop → tar + gzip → aws s3 cp → docker start → Scaleway S3
Manual backup (SSH into the proxy):
```bash
/home/<username>/pangolin/backup-crowdsec.sh
```

Check backup logs: `cat /var/log/crowdsec-backup.log`
```bash
cd ./infra/modules/scaleway-proxy
terraform destroy
```

Note: Scaleway allows duplicate DNS records. After `terraform destroy` and a fresh `terraform apply`, check for stale A records (apex `@` and wildcard `*`) pointing to the old instance IP. Duplicates will cause ACME certificate issuance to fail intermittently. Remove them manually from the Scaleway console.
The scripts/ directory contains utility scripts for managing the homelab cluster.
The jellyfin/ package provides a script to move completed downloads from qBittorrent to the organized media library while preserving manually identified metadata (TMDB/IMDB IDs).
```bash
make jellyfin-move          # Move files and update Jellyfin
make jellyfin-move-dry-run  # Preview moves without executing
```

The script moves files via kubectl exec in the Jellyfin pod, then uses the Jellyfin API to:
- Capture ProviderIds (TMDB/IMDB) from the old item before moving
- Delete the old library entry to prevent duplicates
- Trigger a library scan and wait for the new item to appear
- Copy the ProviderIds to the new item, preserving manual identifications
Files are moved from /media/downloads/<category>/ to /media/videos/<category>/ (movies, shows, documentaries, stand-up, tv-programs, tv-shows).
The warden/ package provides Kubernetes-to-Warden monitoring integration with automatic discovery, deployment verification, and cleanup capabilities.
Quick Start:
```bash
make warden-seed          # Seed app monitors
make warden-seed-cleanup  # Seed with cleanup
make warden-delete        # Delete all monitors
make warden-compare      # Compare K8s vs Warden
```

See `scripts/warden/README.md` for detailed documentation.
Update Jellyfin (helm release + rollout restart):
```bash
make jellyfin-update
```

Update Pangolin blueprint (apply ConfigMap + restart Newt):

```bash
make blueprint-update
```

Generate a strong password:

```bash
pwgen -scyn 32 1
```

or

```bash
openssl rand -base64 32
```

- Some charts are inspired by rtomik's helm-charts repository