Switching to Kubernetes
And you may ask yourself, "How do I work this?"
And you may ask yourself, "Where is that large home server?"
Once upon a time I had a Mac mini. It was hooked up to the TV (because we only had the one) and it ran Plex. It was fine.
Later, my new spouse and I moved across the country into a house. I decided that I should get a server because I was going to be a big-time consultant and I figured I would need a staging environment. A Dell T30, picked up on super sale, arrived soon after.
The server sat, ignored, while we suffered through the first few years of one baby, then two babies.
Later, we moved to our forever house and I found Home Assistant. I picked up a Raspberry Pi 4. All was good.
Except it kind of sucked? A 1GB Pi 4 is pretty limited in what it can practically run. Home Assistant ran mostly OK but anything else was beyond its capabilities. To eBay!
Oooh, shiny hardware
Over the past four years I've accumulated a modest menagerie of hardware:
- Hypnotoad, a HP 800 G3 mini
- Crushinator, a HP 800 G3 SFF
- Morbo, another HP 800 G3 SFF
- Lrrr, a Dell Wyse 5070 thin client
- Roberto, another Dell Wyse 5070
- Nibbler, a Lenovo M80s Gen 3 SFF
- Shed, another Dell Wyse 5070 (such a boring name)
- A pack of roving Dell Wyse 3040 thin clients
- The original Pi 4
The T30, sadly, imploded when I tried to install a video card and fried the motherboard. Its name was Kodos and it was a good box.
Software, take 1 through N
As I was acquiring hardware I was also acquiring software to run on it, and I developed a somewhat esoteric way of deploying that software. The first interesting version was a self-deploying Docker container: it would get passed the Docker socket and run Compose, deciding on the fly what to deploy based on the hostname of the machine.
This was fine, but it proved too much for the 3040s, which have fragile 8GB eMMC drives.
A later version moved the script to my laptop and used Ansible to push Docker Compose files out to all the machines.
Fine. Fiddly, but fine.
Software, take N + 1
Xe Iaso is a person that I've been following online for years. Recently they went through a homelab transformation, where for Reasons they decided to switch away from NixOS. After trying various things, much to everyone's chagrin, they settled on Kubernetes running on Talos.
Talos seemed to be exactly what I wanted: an immutable, hardened OS designed for one thing and one thing only, Kubernetes.
Much like Xe, I had resisted Kubernetes at home for a long time. Too complex. Too much overhead. Just too much.
Taking another look at that hardware list, though, I do actually have a somewhat Kubernetes-shaped problem. I want to treat my hardware as respected but mostly interchangeable pets.
My deployment script was sophisticated but had no ability to just put something somewhere else automatically. It was entirely static, so when something needed to move I would have to restore a backup to the new target and manually redeploy at least part of the world in order to get the ingress set up properly.
Kubernetes takes care of that stuff for me. I don't have to think about where any random workload runs and I don't have to think about migrating it somewhere else if the node falls over. DNS, SSL certificates, backups: it all just happens in the background.
What's it look like?
After a couple of weeks of futzing around and daydreaming I settled on this software stack:
- Kubernetes (obviously)
- Talos Linux driven by Talhelper
- Helm for off the shelf components, driven by Helmfile
- Longhorn for fast replicated storage
- 1Password Operator for secrets management
- Tailscale Operator for private ingress and a subnet router to poke at services and pods directly
- ingress-nginx for internal and external access to services
- MetalLB to give local services a stable virtual IP address
- cert-manager for automatic Let's Encrypt certificates for services
- external-dns to drive DNS updates for services
- Keel.sh for automatic image updates
- Reloader to reload deployments when linked secrets and configs update
- EMQX as the MQTT server for some of my IoT devices (mostly Zigbee)
- CloudNativePG for PostgreSQL databases
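Most of the networking pieces above meet in each service's Ingress object. Here's roughly what one looks like (the hostname, issuer name, and port are illustrative): ingress-nginx routes the traffic, cert-manager spots the annotation and goes off to fetch a certificate, and external-dns publishes the hostname without being asked.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jellyfin
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt  # issuer name is illustrative
spec:
  ingressClassName: nginx
  rules:
    - host: jellyfin.example.com                 # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: jellyfin
                port:
                  number: 8096
  tls:
    - hosts:
        - jellyfin.example.com
      secretName: jellyfin-tls                   # cert-manager creates this Secret
```

MetalLB works one layer down: it hands the ingress controller's LoadBalancer Service a stable IP on the LAN so there's a single address to point at.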
The next thing to decide was how to divide up the hardware into control plane and worker nodes. Here's what I have so far:
- Three (3) control plane nodes: hypnotoad, crushinator, and lrrr
- Seven (7) local worker nodes: hypnotoad, crushinator, nibbler, shed, and three Wyse 3040s hosting Z-Wave sticks
- One (1) cloud worker node
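In Talhelper terms that split is just a flag per node. A trimmed, sanitized slice of what a talconfig.yaml looks like (the addresses and disks here are placeholders):

```yaml
clusterName: homelab                   # illustrative name
endpoint: https://192.168.1.10:6443    # placeholder cluster endpoint
nodes:
  - hostname: hypnotoad
    ipAddress: 192.168.1.11            # placeholder address
    controlPlane: true                 # control plane *and* worker; see below
    installDisk: /dev/sda
  - hostname: nibbler
    ipAddress: 192.168.1.12            # placeholder address
    controlPlane: false
    installDisk: /dev/nvme0n1
```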
You might notice that several nodes are doing double duty.
Splitting the control plane off to dedicated nodes makes sense when you have a fleet of hundreds of machines in a data center. I don't have that.
A small VM running on Lrrr is the only dedicated control plane node, and that's only because Lrrr also hosts my Unifi and Omada network controllers and I haven't worked up the gumption to move those from Proxmox LXCs to k8s workloads.
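Letting the other control plane nodes pull double duty is a one-line change: Talos taints them by default, but it exposes a cluster option to allow regular workloads on them. Something like this in the config patch:

```yaml
cluster:
  allowSchedulingOnControlPlanes: true  # let hypnotoad and crushinator run normal workloads too
```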
Hypnotoad, Crushinator, and Nibbler are general compute. Nibbler has an Nvidia Tesla P4 GPU, which is not particularly impressive but fun to play with. Both Hypnotoad and Nibbler have iGPUs capable of running many simultaneous Jellyfin streams. Crushinator is a VM taking up most of its host, which also serves as a backup NAS for non-media data.
Shed lives in the shed and is connected to a bunch of USB devices, including two SDR radios, a Z-Wave stick, and an RS232-to-USB adapter for the generator.
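Workloads tied to those devices can't float around the cluster, so they get pinned. Here's roughly how I'd pin a Z-Wave JS UI instance to shed; the device path and mount point are hypothetical, yours will differ:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zwave-shed
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zwave-shed
  template:
    metadata:
      labels:
        app: zwave-shed
    spec:
      nodeSelector:
        kubernetes.io/hostname: shed            # pin to the node holding the stick
      containers:
        - name: zwave-js-ui
          image: zwavejs/zwave-js-ui:latest
          securityContext:
            privileged: true                    # needed for raw access to the USB device
          volumeMounts:
            - name: zwave-stick
              mountPath: /dev/zwave             # hypothetical mount point
      volumes:
        - name: zwave-stick
          hostPath:
            path: /dev/serial/by-id/usb-0658_0200-if00  # hypothetical device path
            type: CharDevice
```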
Morbo is running TrueNAS and has no connection to Kubernetes except that some stuff running in k8s uses NFS shares. It's also the backup target for Longhorn and for the script I use to back up Talos' etcd database.
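Pointing Longhorn at Morbo is a single Helm value; something like this in the longhorn chart's values (the export path is illustrative):

```yaml
defaultSettings:
  backupTarget: nfs://morbo.local:/mnt/tank/longhorn-backups  # illustrative NFS export on Morbo
```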
Self-hosting in the Cloud
Talos has a neat built-in feature that they call KubeSpan: a WireGuard mesh network between all of the nodes in the cluster that uses a hosted discovery service to exchange public keys.
Essentially, you can flip a single option in your Talos configs and have all of your nodes meshed, with a bonus option to send all internal cluster traffic over the WireGuard interface. The discovery service never sees private data, just hashes. It's really cool.
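The config for that is pleasingly small; this is the shape of it:

```yaml
machine:
  network:
    kubespan:
      enabled: true
      advertiseKubernetesNetworks: true  # the bonus option: cluster traffic over WireGuard
cluster:
  discovery:
    enabled: true                        # exchange public keys via the hosted service
```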
I'm using KubeSpan to put one of my nodes on a VPS to get a public IP without exposing my home ISP connection directly. After initial setup I was able to change the firewall to block all inbound ports other than 80, 443, and the KubeSpan UDP port.
To actually serve public traffic I installed a separate instance of ingress-nginx that only runs on the cloud node. This instance is set up to directly expose the cloud node's public IP, which gets picked up by external-dns automatically.
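That second instance gets its own ingress class and a nodeSelector so it only ever lands on the VPS. Roughly, in its Helm values (the node label is one I made up for illustration):

```yaml
controller:
  kind: DaemonSet                        # one controller pod per matching node
  hostNetwork: true                      # bind 80/443 directly on the node's public IP
  dnsPolicy: ClusterFirstWithHostNet     # keep cluster DNS working with hostNetwork
  nodeSelector:
    example.com/cloud: "true"            # hypothetical label on the cloud node
  ingressClassResource:
    name: nginx-public                   # separate class so internal Ingresses stay internal
  service:
    enabled: false                       # no LoadBalancer; hostNetwork serves traffic directly
```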
I'm still trying to decide if this single node is enough or if I should get really clever and use a proxy running on Fly to get a public anycast IP.
Ok, but what's running?
Learning how Kubernetes works has been great and this process filled in quite a few gaps in my understanding, but it probably wouldn't have been worth the effort without hosting something useful.
Currently I'm hosting a couple of external production workloads:
- this blog
- VMSave
- a handful of very small websites
I'm also running a bunch of homeprod services:
- Home Assistant
- Whisper and Piper, the speech-to-text and text-to-speech components of the Home Assistant voice pipeline
- four (4) instances of Z-Wave JS UI, one per RF "zone" (this house has wacky RF behavior)
- two (2) instances of Zigbee2MQTT, one in each RF zone that has Zigbee devices
- Genmon keeps tabs on our whole home standby generator
- A Minecraft server for me and my kids
- Paperless-ngx stores and indexes important documents
- Ultrafeeder puts the planes flying overhead on a map
- iCloudPD-web for effortless iCloud photo backups
- Jellyfin, an indexer and server for TV shows, movies, and music
- Sonarr, Radarr, Prowlarr and SABnzbd form the core of our media acquisition system
- Jellyseerr makes requesting new media easy for the other people who live in the house
- Calibre Web Automated is an amazing tool that serves all of my eBooks to my Kobo eReader
- Ollama and Open Web UI for dinking around with local LLMs
- Homer to keep track of all of the above, set as my browser homepage
Left To Do
There are a few things left on the to-do list. Roberto is hooked up to a webcam that watches my 3D printer, and I haven't touched it yet because it's connected via Wi-Fi, which Talos doesn't support at all.
I also haven't touched the Raspberry Pi, mostly for the same reason. The Pi is serving as a gateway between a Wi-Fi SD card that lives in my CPAP machine and the rest of the network, so that I can scrape the data off without having to pull the card or futz with my laptop's Wi-Fi every day.
The Wi-Fi SD card, you see, only exists as an access point. It cannot be put into a mode that connects to another network. The Pi has a USB Wi-Fi adapter connected to the card's network and the built-in Wi-Fi connected to the home network, with nginx in between serving as a proxy. I don't think this is something that I really want or need to move into k8s.
I want to set up some sort of central authn/authz system for the homeprod services. The current fashion seems to be Pocket ID but I haven't been able to get it working reliably.
I'm thinking about setting up a small ActivityPub server to play around with.
A photo viewer like Immich might be cool to set up.
Overkill?
Of course this is overkill. This could probably all live on a single Wyse 5070 with a couple of big hard drives attached.
I think it's been worth it to use Kubernetes in anger. I'm really enjoying the ability to deploy whatever I want to the cluster without having to think about where it runs, where it stores data, etc.
I've also learned a ton, corrected a bunch of preconceived notions, and already improved reliability for a few things that affect household acceptance in big ways.