I recently setup a Kubernetes cluster home lab and wanted to do it the hard-way and share it with you. I setup a home lab so I could run my smart home software and learn more about different Kubernetes networking technologies.
This blog post is broken up into several sections. Feel free to skip directly to the section that applies to you.
When I started I had a few things already:
I was already using Rancher as a UI to manage my Kubernetes clusters on my dedicated servers
I wanted a fully flat network, that means no packet encapsulation. Packet encapsulation tunnels IP packets inside of other IP packets and creates a separate IP network that runs on-top of my existing network.) I wanted all nodes, pods, and services to be fully routable on my home network. Additionally, I had several Sonos speakers and other smart-home devices that I wanted to be control from my k8s cluster which required pods that ran on the same IP network.
Docker Desktop and WSL2 are both great for development Docker projects where you use the Docker CLI, but when you try to run Kubernetes you’ll quickly run into networking issues. WSL2 and Docker Desktop can’t expose services to the rest of your network very easily because they use NAT’d network adapters. (GitHub microsoft/WSL#4150) This means you can’t expose nodes or pods as devices on the network, they will always be NAT’d to the host’s IP address. This failed my requirement.
Next up in the series, we’re going to manually configure all of the network settings to get our flat network home lab. Our flat network should not use any packet encapsulation with all pods and services fully routable to and from the existing network.
Detailed in the previous post, I want a so-called flat network because packet encapsulation tunnels IP packets inside of other IP packets and creates a separate IP network that runs on-top of my existing network.) I wanted all nodes, pods, and services to be fully routable on my home network. Additionally, I had several Sonos speakers and other smart-home devices that I wanted to be control from my k8s cluster which required pods that ran on the same subnet as my other software.
Install CNI Plugin
The CNI (Container Network Interface) plugin is responsible configuring the network adapter that each Kubernetes pod has. Since each pod usually gets a separate network namespace isolated from the host’s main network adapter, without it, no pod could make any network calls. For more information, check out cni.dev or the K8s documentation.
In my previous post series, I described how I installed my Kubernetes Home Lab using Calico and MetalLB. This worked great up until I started installing smart home software that expected to be able to do local network discovery. For example, Home Assistant and my Sonos control software both attempted to do subnet local discovery using mDNS or broadcast packets. This did not work because the pods were running on a 192.168.4.0/24 subnet, but all of my physical devices were on 192.168.2.0/24.
This prevented Home Assistant from discovering any devices and had to be fixed.
In the previous post, we end up abusing subnets and routing to get Calico to exist on the correct subnet, but what if we could get rid of Calico’s duplicate IPAM system and just depend on our existing DHCP server to handle reservations? In this post, we’re going to prototype a cluster that uses DHCP + layer 2 Linux bridging to avoid the complications outlined in Part 3.
This avoids overlapping IPAM problems with the previous solution and means that the DHCP server already running on my network would be responsible for handing out IP addresses directly to the containers.
In the previous post (DHCP IPAM), we successfully got our containers running with macvlan + DHCP. I additionally installed MetalLB and everything seemingly worked, however when I tried to retroactively add this to my existing Kubernetes home lab cluster already running Calico, I was not able to access the Metallb service. All connections were timing out.
A quick Wireshark packet capture of the situation exposed this problem:
The SYN packet from my computer made it to the container (LB IP 19126.96.36.199), but the responding SYN/ACK packet that came back had a source address of 192.168.2.76 (the pod’s network interface.) This wouldn’t work because my computer ignored it because it didn’t belong to an active flow.
In previous posts, I leveraged the MACvlan CNI to provide the networking to forward packets between containers and the rest of my network, however I ran into several issues rooted from the fact that MACvlan traffic bypasses several parts of the host’s IP stack including conntrack and IPTables. This conflicted with how Kubernetes expects to handle routing and meant we had to bypass and modify IPTables chains to get it to work.
While I got it to work, there was simply too much wire bending involved and I wanted to investigate alternatives to see if anything was able to fit my requirements better. Let’s consider the bridge CNI.
After I’ve had time to run my home lab for a while, I’ve started switching to a more up to date Linux distribution (instead of RancherOS.) I’m currently testing Ubuntu Server which leverages Systemd. Systemd-networkd is responsible for managing the network interface configuration and it differs in behavior compared to NetworkManager enough that we need to update the Home Lab Bridge CNI to handle it.
Previously the CNI was creating a bridge network adapter when the first container started up, but this causes problems with systemd because resolved (the DNS resolver component) was eventually failing to make DNS queries and networkd was duplicating IP addresses on both eth0 (the actual uplink adapter) and on cni0 because we were copying it over.
Previously in my Home Lab series, I described how my home lab Kubernetes clusters runs with a DHCP CNI–all pods get an IP address on the same layer 2 network as the rest of my home and an IP from DHCP. This enabled me to run certain software that needed this like Home Assistant which wanted to be able to do mDNS and send broadcast packets to discover device.
However, not all pods actually needed to be on the same layer 2 network and lead to a few situations where I ran out of IP addresses on the DHCP server and couldn’t connect any new devices until reservations expired:
I also had a circular dependency where the main VLAN told clients to use a DNS server that was running in Kubernetes. If I had to reboot the cluster, my Kubernetes cluster could get stuck starting because it tried to query a DNS server that wasn’t started yet (For simplicity, I use DHCP for everything instead of static config).
In this post, I explain how I built a new home lab cluster with K3s and used Multus to run both Calico and my custom Bridge+DHCP CNI so that only pods that need layer 2 access get access.