My external cluster runs on 3 different dedicated servers (most from SoYouStart.com.) I have 3 machines since the Kubernetes control plane needs 3 or more to be able to have a quorum and be able to handle any one machine going down. If one machine goes down, then the other two maintain a majority and can agree on the state of the cluster.
I randomly encountered issues where the Kubernetes control plane of Rancher UI would crash and restart. While this cluster didn’t really matter, it still annoyed me and wanted to figure it out.
I narrowed it down to one single host and documented the steps I took to resolve this issue which seems to be have been caused by one machine using HDDs and all other hosts using SSDs.
Continue reading “From a random Kubernetes control plane crash to a new RAID array”