
Homelab

A 6-node Kubernetes cluster across Proxmox VMs and bare-metal Intel NUCs, deployed with Kubespray and managed end-to-end by Flux.

kubernetes · proxmox · kubespray · flux · homelab

A self-hosted platform running in my basement. The goal was to run things the same way I would at work — GitOps, OIDC everywhere, secrets that never touch git — but on hardware I own and look after alone.

Topology

Three control-plane nodes virtualised on a Proxmox host, three bare-metal Intel NUC workers, all on a dedicated lab VLAN behind an OPNsense router.

| Component | Version / config |
| --- | --- |
| Kubernetes | v1.34.3, deployed via Kubespray v2.30.0 (kubeadm under the hood) |
| Container runtime | containerd 2.2.1 |
| CNI | Calico (VXLAN, MTU 1450) |
| kube-proxy | IPVS mode (`strict_arp: true`) |
| Load balancing | MetalLB in L2 mode |
| Ingress | Envoy Gateway via the Gateway API |
| Certificates | cert-manager + Cloudflare DNS-01 |
| OS | Ubuntu 24.04 across all nodes |
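For context, the MetalLB side of that table reduces to two small objects in L2 mode. This is a sketch only: the pool name and address range are placeholders, not the cluster's actual values.

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lab-pool            # placeholder name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.50.240-192.168.50.250   # placeholder range on the lab VLAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - lab-pool
```

With `strict_arp: true` set in kube-proxy's IPVS config, MetalLB's speaker can answer ARP for these addresses without the node interfering.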

GitOps with Flux

Every cluster-side change is driven from a Git repo. Flux reconciles four layered Kustomizations with explicit dependsOn ordering — the apply sequence is fixed, so I don't have to think about it:

Repo layout:

```text
clusters/homelab/         # Flux entry point — Kustomization CRs
infrastructure/
  sources/                # HelmRepository definitions
  controllers/            # HelmReleases for the cluster's runtime pieces
  configs/                # post-install config (Gateway, IPAddressPool, RBAC, Kyverno policies)
apps/                     # application workloads
.sops.yaml                # encryption rules for secrets
```
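One layer of that dependsOn chain might look like the sketch below. The names, path, and interval are illustrative, not lifted from the real repo.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra-configs      # hypothetical layer name
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/configs
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infra-controllers   # controllers must reconcile before configs apply
```

Because each layer declares its predecessor, Flux serialises the apply order regardless of which Kustomization it reconciles first.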

Flux also runs image update automation — it watches ImagePolicy resources across the cluster and commits updated image tags back to the repo automatically.
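A minimal sketch of that automation, assuming Flux's image-reflector APIs at `v1beta2`; the registry, image name, and semver range are placeholders.

```yaml
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: my-app             # hypothetical workload
  namespace: flux-system
spec:
  image: harbor.example.com/library/my-app   # placeholder registry path
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: my-app
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: my-app
  policy:
    semver:
      range: ">=1.0.0"     # placeholder constraint
```

An ImageUpdateAutomation object (not shown) then commits the tags these policies select back to the repo.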

Secrets — Vault + VSO, SOPS for the rest

Workload secrets are sourced from HashiCorp Vault running on the cluster. The current pattern uses the Vault Secrets Operator — each namespace that needs secrets gets a VaultAuth and one or more VaultStaticSecret CRs, which VSO reconciles into native Kubernetes Secret objects on a continuous sync loop. I wrote about migrating to this from a CronJob-based pattern if you want the full before/after.
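A sketch of that per-namespace pattern; the namespace, Vault role, mount, and paths are placeholders rather than the cluster's real values.

```yaml
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: vault-auth
  namespace: my-app        # hypothetical namespace
spec:
  method: kubernetes
  mount: kubernetes
  kubernetes:
    role: my-app           # placeholder Vault role
    serviceAccount: default
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: my-app-secrets
  namespace: my-app
spec:
  vaultAuthRef: vault-auth
  mount: kv                # placeholder KV mount
  type: kv-v2
  path: my-app/config      # placeholder secret path
  destination:
    name: my-app-secrets   # the native Secret VSO writes
    create: true
  refreshAfter: 60s
```

VSO re-reads the Vault path on the `refreshAfter` interval and keeps the destination Secret in sync.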

Vault itself runs on the cluster, which is the obvious trade-off: if Vault is down, workloads that depend on secret sync can't start. At this scale it's an acceptable risk — the cluster is stable and Vault's HA mode would add complexity I don't need yet.

For secrets that VSO can't reach — Flux itself, HelmRelease values, anything without a namespace to target — secrets are encrypted in git with SOPS and age. The cluster holds the age private key as a sops-age Secret in flux-system, and Flux's kustomize-controller decrypts on the fly during reconciliation.
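Roughly what that wiring looks like, as a sketch: a creation rule in `.sops.yaml` (the age recipient is a placeholder) plus the decryption stanza a Flux Kustomization needs to use the `sops-age` Secret.

```yaml
# .sops.yaml — placeholder age recipient, illustrative path pattern
creation_rules:
  - path_regex: .*\.sops\.ya?ml$
    encrypted_regex: ^(data|stringData)$
    age: age1examplepublickeyplaceholder
---
# Fragment of a Flux Kustomization spec enabling SOPS decryption
spec:
  decryption:
    provider: sops
    secretRef:
      name: sops-age       # holds the age private key in flux-system
```

Only the `data`/`stringData` fields are encrypted, so the manifests stay diffable in git.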

Authentication — OIDC via Authentik

The API server accepts client certificates (the default break-glass kubeconfig) and OIDC tokens via Authentik. kubectl uses kubelogin as an exec credential plugin to drive the authorization-code flow in a browser; group claims map to RBAC ClusterRoles via two simple bindings:

| Authentik role | Kubernetes group | RBAC |
| --- | --- | --- |
| app-kubernetes-admin | oidc:admin | cluster-admin |
| app-kubernetes-user | oidc:user | view |
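The kubectl side might be wired like this; the issuer URL, client ID, and binding name are placeholders standing in for the Authentik provider's real values.

```yaml
# kubeconfig user entry driving kubelogin's browser flow
users:
  - name: oidc
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: kubectl
        args:
          - oidc-login
          - get-token
          - --oidc-issuer-url=https://auth.example.com/application/o/kubernetes/  # placeholder
          - --oidc-client-id=kubernetes                                           # placeholder
---
# One of the two group bindings from the table above
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oidc-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: Group
    name: oidc:admin
    apiGroup: rbac.authorization.k8s.io
```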

The OIDC flags are configured via Kubespray group vars, so the configuration moves with the cluster.
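Those group vars would look roughly like this. The variable names come from Kubespray's `k8s-cluster.yml`; the issuer URL and claim names are assumptions.

```yaml
# Kubespray group vars (sketch) — apiserver OIDC flags rendered from these
kube_oidc_auth: true
kube_oidc_url: https://auth.example.com/application/o/kubernetes/  # placeholder issuer
kube_oidc_client_id: kubernetes                                    # placeholder
kube_oidc_username_claim: email                                    # assumed claim
kube_oidc_groups_claim: groups
kube_oidc_groups_prefix: "oidc:"   # yields the oidc:admin / oidc:user groups above
```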

Provisioning chain

The cluster doesn't bootstrap itself. A separate provisioning repo handles everything below the API server:

| Tool | Purpose |
| --- | --- |
| Packer | Build the Ubuntu 24.04 Proxmox VM template |
| Terraform | Create Proxmox VMs and manage Vault, Authentik, Harbor, GitHub, Okta resources |
| Kubespray | Bootstrap and upgrade the Kubernetes control plane and workers |
| Flux | Take over once the cluster is up — everything else is GitOps |

There's also Kyverno running as an admission controller with a handful of cluster-wide policies: no latest tags, no privilege escalation, required labels, required resource requests. Not exhaustive, but enough to catch the lazy mistakes.
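One of those policies might look like the sketch below, modelled on Kyverno's standard disallow-latest-tag sample rather than copied from the cluster.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce   # reject, don't just audit
  rules:
    - name: require-image-tag
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must use an explicit tag, not :latest."
        pattern:
          spec:
            containers:
              - image: "!*:latest"   # any tag except latest
```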

Why not cloud?

Partly cost — I didn't want a recurring cloud bill for something I run at home. But mostly I wanted something physical I could actually work on, upgrade over time, and break without consequences. There's a difference between reading about NIC offloading issues and having to fix one because your node just fell off the network.

Notable gotchas

Wildcard DNS + ndots: 5. A *.sperring.io record pointing at the reverse proxy combined with Kubernetes' default ndots: 5 and the node's search domain caused pods to resolve external hostnames through the wildcard, returning the wrong IP for anything with fewer than five dots. Fixed by pointing kubelet at a clean /etc/kubernetes/resolv.conf with no search domains.
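The fix amounts to a KubeletConfiguration fragment pointing at a minimal resolv.conf; the nameserver IP below is a placeholder.

```yaml
# KubeletConfiguration fragment — upstream DNS for CoreDNS comes from
# this file instead of the node's search-domain-laden resolv.conf
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
resolvConf: /etc/kubernetes/resolv.conf
---
# /etc/kubernetes/resolv.conf — nameserver only, deliberately no "search" line
# nameserver 192.168.50.1   # placeholder resolver IP
```

With no search domains inherited, a five-dot-threshold lookup for an external name can no longer be expanded into something the wildcard record matches.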

Proxmox e1000e NIC hangs. The onboard Intel I219 NIC on the Proxmox host hangs under load and takes the whole node off the network with it. Mitigated by disabling TX/RX offloading (ethtool -K nic0 tso off gso off gro off), made persistent via the pve-base Ansible role.
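A sketch of how the pve-base role might persist that fix; the unit name and interface are assumptions, and the actual role may do this differently.

```yaml
# Ansible tasks (illustrative) — run ethtool at boot via a oneshot unit
- name: Install NIC offload-disable unit
  ansible.builtin.copy:
    dest: /etc/systemd/system/disable-nic-offload.service
    content: |
      [Unit]
      Description=Disable TX/RX offloading on nic0
      After=network.target

      [Service]
      Type=oneshot
      ExecStart=/usr/sbin/ethtool -K nic0 tso off gso off gro off

      [Install]
      WantedBy=multi-user.target

- name: Enable the offload-disable unit
  ansible.builtin.systemd:
    name: disable-nic-offload.service
    enabled: true
    daemon_reload: true
```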

Stack

Kubernetes, Flux, Calico, MetalLB, cert-manager, Vault, VSO, Authentik, Kyverno, Prometheus, Grafana, Loki.