This repository contains all of the configuration and documentation of my Kubernetes homelab.
The purpose of my homelab is to learn production patterns at scale and to have fun experimenting with cloud-native technologies. I work with Kubernetes professionally, and my homelab is where I can break things, learn from mistakes, and understand infrastructure deeply, without the pressure of production incidents.
Self-hosting applications forces me to think about the entire lifecycle: backup strategies, disaster recovery, security hardening, GitOps workflows, and operational excellence. It's one thing to deploy an app - it's another to maintain it reliably and recover from failures.
- Everything is code - Infrastructure, applications, and configurations
- Zero plaintext secrets in Git - Sensitive data lives in Azure Key Vault; static sensitive configs are SOPS-encrypted
- Immutable infrastructure - Talos Linux enforces declarative configuration
- Full automation - FluxCD handles deployments from Git
- 10-minute recovery - Entire cluster can be rebuilt from this repo
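I use Talos Linux to provision my Kubernetes nodes. Talos is minimal, immutable, and API-driven (no SSH access). It provides production-grade security out of the box, and because nodes are configured only through a declarative machine config, there is no shell in which to make ad-hoc changes that would cause configuration drift.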
I run a single 6-node cluster with high availability:
| Cluster | Description |
|---|---|
| lothbrok | Production cluster. 3 control plane nodes (HA with kube-vip) + 3 worker nodes. Runs all infrastructure components and applications. Fully provisioned from code via FluxCD GitOps. Can be destroyed and rebuilt in 10 minutes. |
Control Plane:
- 3 nodes for high availability
- kube-vip provides virtual IP for API server
- etcd quorum is 2 of 3 members, so the cluster tolerates the loss of one control plane node
- Any single control plane node can fail and the cluster stays operational
Workers:
- 3 nodes for application workloads
- Dedicated to running pods
- Resource isolation from control plane
Network:
- Flannel CNI (via Talos)
- MetalLB for LoadBalancer services (L2 mode)
- 4 segmented IP pools for different workload types
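The pool layout above could be declared with MetalLB's `IPAddressPool` and `L2Advertisement` resources. The sketch below is illustrative only: it shows two of the four pools, and the address ranges are hypothetical placeholders, not the ranges used in this cluster.

```yaml
# Sketch only: address ranges are hypothetical; adjust to your own LAN.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: ingress
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.200-192.168.1.209   # hypothetical range for the ingress controller
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: reserved
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.230-192.168.1.239   # hypothetical range
  autoAssign: false                 # only handed out when a Service asks for this pool
---
# L2 mode: announce addresses from the listed pools via ARP/NDP.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - ingress
    - reserved
```

A Service can then request a specific pool with the `metallb.universe.tf/address-pool` annotation, which is one way to keep workload types on separate ranges.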
3-Phase Deployment Pipeline:
- Phase 1: infrastructure (deploys controllers)
- Phase 2: infrastructure-secrets (syncs from Azure Key Vault)
- Phase 3: infrastructure-config (applies configs with substitution)
Result: Fully automated, zero manual intervention
Why 3 phases?
- Prevents race conditions (configs needing secrets that don't exist yet)
- Proper dependency ordering (operators and their CRDs are installed before the custom resources that use them)
- Clean separation of concerns (controllers vs. secrets vs. configs)
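This kind of ordering maps onto the `dependsOn` field of Flux `Kustomization` resources. The following is a minimal sketch, not a copy of this repo's manifests: the three names come from the pipeline above, while the paths, intervals, and the `cluster-secrets` Secret name are assumptions made for illustration.

```yaml
# Sketch only: paths, intervals, and "cluster-secrets" are hypothetical.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/controllers   # hypothetical path
  prune: true
  wait: true                           # Ready only once the controllers are healthy
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-secrets
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure             # wait for phase 1
  interval: 10m
  path: ./infrastructure/secrets       # hypothetical path
  prune: true
  wait: true
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure-config
  namespace: flux-system
spec:
  dependsOn:
    - name: infrastructure-secrets     # wait for phase 2
  interval: 10m
  path: ./infrastructure/config        # hypothetical path
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  postBuild:
    substituteFrom:
      - kind: Secret
        name: cluster-secrets          # hypothetical Secret synced in phase 2
```

With `wait: true` on the earlier phases, a dependent Kustomization is not applied until the resources of the one before it report healthy, which is what prevents configs from referencing secrets that do not exist yet.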
All six nodes run as virtual machines on Proxmox VE 8.x:
| Node | Role | vCPU | RAM | Disk | Purpose |
|---|---|---|---|---|---|
| talos-cp-01 | Control Plane | 4 | 6 GB | 32 GB | etcd, kube-apiserver, scheduler, controller-manager |
| talos-cp-02 | Control Plane | 4 | 6 GB | 32 GB | etcd, kube-apiserver, scheduler, controller-manager |
| talos-cp-03 | Control Plane | 4 | 6 GB | 32 GB | etcd, kube-apiserver, scheduler, controller-manager |
| talos-worker-01 | Worker | 4 | 8 GB | 100 GB | Application workloads |
| talos-worker-02 | Worker | 4 | 8 GB | 100 GB | Application workloads |
| talos-worker-03 | Worker | 4 | 8 GB | 100 GB | Application workloads |
Physical Hardware:
- Proxmox Host: Lenovo ThinkCentre M90t, 48 GB RAM, NVIDIA GeForce GTX 1660 Super GPU
- Storage: local storage
| Name | Description |
|---|---|
| Linkding | Self-hosted bookmark manager with tagging and full-text search |
| pgAdmin | Web-based PostgreSQL database management tool |
Everything needed to run the cluster and deploy applications:
| Name | Description |
|---|---|
| Talos Linux | Immutable, API-driven Kubernetes OS. No SSH, minimal attack surface, production-grade security |
| FluxCD | GitOps operator. Watches Git, applies changes automatically. 3-phase pipeline for dependency management |
| External Secrets Operator | Syncs secrets from Azure Key Vault to Kubernetes. Zero secrets in Git. ~2,880 API calls/month (about $0.01) |
| cert-manager | Automated certificate management. Let's Encrypt integration via HTTP-01 and DNS-01 challenges. 30-day auto-renewal |
| Renovate | Automated dependency updates. Tracks 9 Helm releases and 15+ container images. Weekly scans with GitOps validation |
| MetalLB | LoadBalancer implementation for bare-metal. L2 mode. 4 IP pools: ingress, services, database, reserved |
| Traefik | Cloud-native ingress controller. Automatic routing based on Ingress resources. Gets IP from MetalLB |
| Cloudflare Tunnel | Zero-trust access without port forwarding. Secure tunnel from cluster to Cloudflare edge |
| SOPS | Encrypted secrets in Git. Used for non-dynamic sensitive configs |
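As an illustration of how External Secrets Operator pulls a value from Azure Key Vault, here is a hedged sketch of an `ExternalSecret`. The resource names, namespace, store name, and Key Vault secret name are all hypothetical, and the 15-minute refresh interval is an assumption. Depending on the installed ESO version, the `apiVersion` may be `external-secrets.io/v1` instead.

```yaml
# Sketch only: every name below is hypothetical.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: linkding-credentials
  namespace: linkding
spec:
  refreshInterval: 15m            # assumed; how often ESO re-reads Key Vault
  secretStoreRef:
    kind: ClusterSecretStore
    name: azure-key-vault         # hypothetical store configured with the azurekv provider
  target:
    name: linkding-credentials    # Kubernetes Secret that ESO creates and keeps in sync
  data:
    - secretKey: password         # key inside the Kubernetes Secret
      remoteRef:
        key: secret/linkding-password   # hypothetical Key Vault secret name
```

For scale, a single secret refreshed every 15 minutes results in 4 × 24 × 30 = 2,880 reads in a 30-day month. That happens to match the figure in the table, but the actual interval and number of secrets in this cluster may differ.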
| Name | Description |
|---|---|
| Prometheus | Metrics collection and alerting |
| Grafana | Dashboards and visualization |
| Name | Description |
|---|---|
| CloudNativePG | PostgreSQL operator for Kubernetes. Production-grade database management |
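With CloudNativePG, a PostgreSQL cluster is declared as a single `Cluster` resource and the operator handles provisioning and failover. A minimal sketch follows; the name, namespace, instance count, and storage size are assumptions for illustration, not the values used in this repo.

```yaml
# Sketch only: name, namespace, instance count, and size are hypothetical.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres
  namespace: database
spec:
  instances: 3        # one primary plus two replicas
  storage:
    size: 10Gi        # uses the default StorageClass unless one is specified
```

The operator exposes the primary through a Service named `<cluster-name>-rw` (here `postgres-rw`), which applications can use as their connection host.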