Cluster Attempt #2


This project concluded months ago, and this is more a postmortem than a formal writeup. Future projects will be written up more properly.

For my second cluster, I tried something much more sane: 4 ThinkCentre M715q Tiny mini PCs. Having four functional, decently-specced computers helps immensely, and it worked out of the box aside from Ryzen C-state freezes with Linux. I also added my media server and my workstation to the cluster to provide more types of hardware I would have to learn to orchestrate for.
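For anyone hitting the same Ryzen idle freezes: the usual mitigations are either a BIOS setting or kernel parameters that keep the CPU out of deep C-states. The snippet below is a hedged sketch of the kernel-parameter route, not necessarily the exact fix used here; the right combination varies by board and BIOS version.

```shell
# /etc/default/grub — illustrative workaround for Ryzen idle freezes.
# If your BIOS exposes "Power Supply Idle Control", setting it to
# "Typical Current Idle" is often enough on its own.
GRUB_CMDLINE_LINUX_DEFAULT="quiet idle=nomwait processor.max_cstate=5"

# Then regenerate the grub config and reboot:
#   sudo update-grub        # Debian/Ubuntu
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # Fedora/RHEL-style
```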

There’s significantly less to say about this project compared to the ongoing headache of my first cluster attempt. I handled bootstrapping via Kubespray, got in the weeds learning to use kubectl, writing manifests, breaking and un-breaking services, Helm templating, Rancher, Longhorn (Longhorn is VERY cool btw), the whole nine yards.
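For context, a Kubespray bootstrap roughly follows the flow below. This is a generic sketch of the upstream workflow, not my exact invocation; the inventory name `mycluster` is a placeholder.

```shell
# Hedged sketch of the standard Kubespray workflow.
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
pip install -r requirements.txt   # pulls in ansible and friends

# Copy the sample inventory and point it at your nodes
cp -r inventory/sample inventory/mycluster
# edit inventory/mycluster/hosts.yaml with your control-plane and worker IPs

# Run the playbook against the whole cluster
ansible-playbook -i inventory/mycluster/hosts.yaml --become cluster.yml
```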

Kubernetes has a wonderful ecosystem and I truly do miss using it, but I will not be using it to run my homelab again unless the hardware makeup of my lab changes significantly. If you have experience with Kubernetes, you can probably intuit the reasons why I transitioned away from it, but for the sake of documentation I will elaborate:

  • High availability is a liability in a small, unstable environment. In theory, the goal of high availability is that if one host goes down, the others can compensate until it’s back. However, this is only true in stable environments. Having a fallback control plane is only useful until someone plugs in a hair dryer, tripping the breaker and knocking the entire cluster out in one fell swoop. Self-healing is cool until you realize your CNI is misconfigured in a specific way where the nodes try to re-establish connections so aggressively that they keep downing each other’s interfaces (behavior I did not even know was possible until I experienced it firsthand).
  • CNIs are cool for an actual production setup, but not terribly useful at home. There are so many bells and whistles; if you love having levers to pull and buttons to press, you will be satisfied forever. However, the amount of work it takes to get a stable configuration on anything bigger than a test cluster at home is incredibly high. That being said, one piece of advice: if you’re running Kubernetes at home, spring for Cilium over Calico. From my experience, Calico is easier to get running, but Cilium, while it makes you touch every aspect of your networking from the jump, has a much easier interface to learn.
  • Kubernetes has too much overhead. You need multiple computers (or VMs) to get standard k8s working, you have to deploy controllers, you have to deploy routers, and if the end goal of all of that is a password manager and a media server, the entire exercise will feel like using a chainsaw to cut a piece of paper.
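If you do take the Cilium route, the Helm install is the usual entry point. A hedged sketch, assuming an existing cluster; the version and values flags you need (kube-proxy replacement, tunneling mode, etc.) depend on your environment, so check the Cilium docs rather than copying this verbatim:

```shell
# Illustrative Cilium install via Helm — values are placeholders.
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=false

# Verify the agent pods come up on every node:
kubectl -n kube-system get pods -l k8s-app=cilium
```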

That all being said, it’s a great learning experience. By daily-driving your services on k8s, you are forced to engage with its intricacies and debug things for yourself. It doesn’t replace learning Kubernetes concepts from proper courses, but it gives you practical experience those courses rarely can.

Also (not that it matters), using it feels cool as hell. Fixing services is usually a dance of editing yaml and spamming kubectl commands. I love Linux for numerous reasons, but something about typing out commands at rapid speed feels incredibly good. If Linux is the operating system for people who love to type, Kubernetes is the orchestration software for those same people.
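The dance usually looks something like this. The service name `myapp` and pod name are placeholders for whatever is broken on a given day:

```shell
# A typical fix-the-service loop; names are hypothetical.
kubectl -n myapp get pods                       # what state is everything in?
kubectl -n myapp describe pod myapp-abc123      # events: why is it pending/crashing?
kubectl -n myapp logs myapp-abc123 --previous   # logs from the last crashed container
kubectl -n myapp edit deployment myapp          # poke the yaml in place
kubectl -n myapp rollout status deployment/myapp
```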

I do not regret the experience at all even after nuking my cluster, as it was a very valuable learning experience. Not only do you learn a lot about Kubernetes, but you have to learn a lot about networking in general just to keep your cluster operating. It is a headache, and you will probably not run it forever (unless you just don’t have a larger computer to run Docker on, for some reason), but it is entirely worth the trials and tribulations.