Self-hosting and managing your setup through GitOps. That is the journey I've started, using @onedr0p's excellent cluster template.
This repo uses Talos Linux and Flux to fully declaratively manage a Kubernetes cluster at home.
Note: For an in-depth write-up of my current homelab hardware, check out my blog at fhoekstra.eu
| Role | Model | CPU | RAM | Storage |
|---|---|---|---|---|
| K8s control plane, K8s workloads | kube (Rock 5B+) | RK3588 (4x A76 + 4x A55) | 24GB LPDDR5 | 32GB microSD for read-only root and boot, PLP SSDs for Talos and Ceph |
| K8s control plane, K8s workloads | kube (Rock 5B+) | RK3588 (4x A76 + 4x A55) | 24GB LPDDR5 | 32GB microSD for read-only root and boot, PLP SSDs for Talos and Ceph |
| K8s control plane, K8s workloads | kube (Rock 5B+) | RK3588 (4x A76 + 4x A55) | 24GB LPDDR5 | 32GB microSD for read-only root and boot, PLP SSDs for Talos and Ceph |
| Cluster-external NFS (backups) | Raspberry Pi 4B | | 4GB | 1TB SATA-via-USB3 Samsung QLC 870 Evo |
- Cloudflare external ingress
- k8s_gateway for exposing DNS to LAN
- Authelia + lldap auth stack for external users
- Tailscale for remote private access
- Data:
  - CloudNativePG for relational databases
  - Rook Ceph for cluster-internal storage
  - VolSync for backups to, and automatic restores from, NFS and/or OVHcloud object storage
  - versitygw for access to NFS via an S3-compatible interface (for database backups)
- Observability:
  - VictoriaMetrics
  - Grafana
  - VictoriaLogs
- Karakeep bookmark manager and RSS reader
- FoundryVTT with an integrated SFTPGo container for remote filesystem access for the game admin
- ownCloud Infinite Scale (oCIS) for cloud-native file browsing and sharing
- Continuwuity Matrix server
If the cluster is still running, reset the nodes to maintenance mode. You can skip this step if you booted from install media.
task talos:reset
Then wait for machinestatus to be maintenance on all nodes:
talosctl get machinestatus -n <IP> --insecure
Then bootstrap Talos:
task bootstrap:talos
Wait for machinestatus to be running on all nodes. Because the talosconfig exists now, the IP and insecure flags can be skipped:
talosctl get machinestatus -w
Then bootstrap the apps with helmfile and watch the operators (flux and volsync) take over to bring everything back up:
task bootstrap:apps
The only app that may need some manual intervention after that is Tailscale, if the token has expired.
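The steps above could be chained into one script, roughly as sketched below. This is a sketch under assumptions, not part of the repo: the `NODES` addresses are hypothetical placeholders, `wait_for_stage` and `rebuild_cluster` are names made up for illustration, and detecting the stage by searching the plain `talosctl get machinestatus` output for the stage name is an assumption about that output's format.

```shell
#!/bin/sh
# Hypothetical node IPs (documentation range); replace with your own.
NODES="192.0.2.11 192.0.2.12 192.0.2.13"

# Poll one node until the machinestatus output contains the wanted stage.
# Any extra arguments (e.g. --insecure) are passed through to talosctl.
wait_for_stage() {
  stage="$1"; node="$2"; shift 2
  while :; do
    # A node that is not reachable yet makes talosctl fail; keep polling.
    out=$(talosctl get machinestatus -n "$node" "$@" 2>/dev/null || true)
    if printf '%s\n' "$out" | grep -qw "$stage"; then
      echo "$node: $stage"
      return 0
    fi
    sleep "${POLL_INTERVAL:-5}"
  done
}

# Run the documented sequence: reset, wait, bootstrap Talos, wait, bootstrap apps.
rebuild_cluster() {
  task talos:reset || return 1   # skip this line if booted from install media
  for n in $NODES; do
    wait_for_stage maintenance "$n" --insecure
  done
  task bootstrap:talos || return 1
  for n in $NODES; do
    wait_for_stage running "$n"  # talosconfig exists now, so no --insecure
  done
  task bootstrap:apps || return 1
}

# Usage: source this file, then run: rebuild_cluster
```

The polling loop has no timeout, so a node that never reaches the expected stage will block the script; interrupt it and inspect that node by hand.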