You were supposed to bring balance to Kubernetes, Rancher, not destroy it
et tu? Rancher?
I’ve been maintaining my own dedicated servers for around 7 years now as a way to learn and improve skills and have a place to run my different web sites, mail servers, even this blog. Over the years the hardware has changed and I’ve moved from hosting Rails applications directly on the OS to Docker and finally Kubernetes. I’ve learned a lot of skills that eventually helped me in my professional career at my job that it’s definitely been worth it, but maintaining this server has had its massive pain points where I’ve just had to walk away and leave stuff broken for days until I finally fix the issues.
I selected Rancher several years ago (at least 3 or 4 years ago I’d estimate) when I finally moved to Kubernetes. I liked how it automatically provisioned my clusters, managed networking, and provided a nice UI. It was also reasonably recommended by the internet. Things worked reasonably well, but after adopting Rancher and Kubernetes, every 6-12 months I’d end up having something massively break and I’d have to rebuild the entire Kubernetes cluster painstakingly and many times I’d tell myself if it broke, then I’d just swear off Rancher entirely, but it never happened because I eventually got everything working.
After upgrading to Rancher v2.6.3 that just launched yesterday and finding that all my clusters were removed from Rancher, I hit my breaking point.
After I’ve had time to run my home lab for a while, I’ve started switching to a more up to date Linux distribution (instead of RancherOS.) I’m currently testing Ubuntu Server which leverages Systemd. Systemd-networkd is responsible for managing the network interface configuration and it differs in behavior compared to NetworkManager enough that we need to update the Home Lab Bridge CNI to handle it.
Previously the CNI was creating a bridge network adapter when the first container started up, but this causes problems with systemd because resolved (the DNS resolver component) was eventually failing to make DNS queries and networkd was duplicating IP addresses on both eth0 (the actual uplink adapter) and on cni0 because we were copying it over.
Long ago, I installed Longhorn onto my Kubernetes cluster using Helm 2. Then eventually Helm 3 was released and helm 2to3 was made available. However, I was not able to use helm 2to3 for whatever reason because Rancher didn’t deploy Tiller in the way that this CLI expected. Additionally, Rancher did not provide an upgrade mechanism to handle this. Eventually Rancher 2.6 was released which entirely dropped Helm 2 support and I was stuck with a cluster where Longhorn was deployed, but not managed by a working Helm installation.
This blog post outlines how you can recover Longhorn and upgrade it to Helm 3 without deleting all your volumes. This guide isn’t specific to Rancher 2.6.
Kubernetes automatically exposes certain services as a port on your host and may unintentionally exposing private services. Here’s how to fix that
I recently responded to the Log4j vulnerability. If you’re not aware, Log4j is a very popular Java logging library used in many Java applications. There was a vulnerability where malicious actors could remotely take control of your computer by submitting a specially crafted request parameter that gets directly logged to log4j.
This situation was not ideal since I was running several Java applications on my servers, thus I decided to use Nmap to port scan my dedicated server to see what ports were open. I ended up finding a number of ports I didn’t expect because several of Kubernetes Service instances were being mapped as node ports.
In this post, I outline the problem with Kubernete’s default strategy for services and how to avoid exposing ports that you don’t need.
Over the past year I helped a few people pick mortgages while buying their homes by helping them visualize different mortgage options from different companies. In a seller’s market, like where I live, you only get a few days to pick from a number of different mortgages that all offer different fees, points, and interest rates that all influence the monthly rate that you pay.
Given all this data, how do you compare the difference options and decide which one to go with? The lowest monthly rate isn’t always the best option.
In previous posts, I leveraged the MACvlan CNI to provide the networking to forward packets between containers and the rest of my network, however I ran into several issues rooted from the fact that MACvlan traffic bypasses several parts of the host’s IP stack including conntrack and IPTables. This conflicted with how Kubernetes expects to handle routing and meant we had to bypass and modify IPTables chains to get it to work.
While I got it to work, there was simply too much wire bending involved and I wanted to investigate alternatives to see if anything was able to fit my requirements better. Let’s consider the bridge CNI.
In the previous post (DHCP IPAM), we successfully got our containers running with macvlan + DHCP. I additionally installed MetalLB and everything seemingly worked, however when I tried to retroactively add this to my existing Kubernetes home lab cluster already running Calico, I was not able to access the Metallb service. All connections were timing out.
A quick Wireshark packet capture of the situation exposed this problem:
The SYN packet from my computer made it to the container (LB IP 1918.104.22.168), but the responding SYN/ACK packet that came back had a source address of 192.168.2.76 (the pod’s network interface.) This wouldn’t work because my computer ignored it because it didn’t belong to an active flow.
In the previous post, we end up abusing subnets and routing to get Calico to exist on the correct subnet, but what if we could get rid of Calico’s duplicate IPAM system and just depend on our existing DHCP server to handle reservations? In this post, we’re going to prototype a cluster that uses DHCP + layer 2 Linux bridging to avoid the complications outlined in Part 3.
This avoids overlapping IPAM problems with the previous solution and means that the DHCP server already running on my network would be responsible for handing out IP addresses directly to the containers.
In my previous post series, I described how I installed my Kubernetes Home Lab using Calico and MetalLB. This worked great up until I started installing smart home software that expected to be able to do local network discovery. For example, Home Assistant and my Sonos control software both attempted to do subnet local discovery using mDNS or broadcast packets. This did not work because the pods were running on a 192.168.4.0/24 subnet, but all of my physical devices were on 192.168.2.0/24.
This prevented Home Assistant from discovering any devices and had to be fixed.
Next up in the series, we’re going to manually configure all of the network settings to get our flat network home lab. Our flat network should not use any packet encapsulation with all pods and services fully routable to and from the existing network.
Detailed in the previous post, I want a so-called flat network because packet encapsulation tunnels IP packets inside of other IP packets and creates a separate IP network that runs on-top of my existing network.) I wanted all nodes, pods, and services to be fully routable on my home network. Additionally, I had several Sonos speakers and other smart-home devices that I wanted to be control from my k8s cluster which required pods that ran on the same subnet as my other software.
Install CNI Plugin
The CNI (Container Network Interface) plugin is responsible configuring the network adapter that each Kubernetes pod has. Since each pod usually gets a separate network namespace isolated from the host’s main network adapter, without it, no pod could make any network calls. For more information, check out cni.dev or the K8s documentation.