In my previous post, I outlined some challenges that I've encountered with Rancher. As part of acting on the feedback to that post, I ended up having to rebuild one of my clusters, and I took the opportunity to try out RKE2 and K3s for my home lab. In this home lab, I use a custom CNI based on the official Bridge and DHCP IPAM CNIs (Read more) so that my smart home software (HomeAssistant) can communicate with other devices on the same Layer 2 domain.
However, it seems that if you try to spin up an RKE2 cluster on a host that already has a bridge interface set up (See here), provisioning gets stuck and you can't download a Kube Config from Rancher Server because Rancher thinks the cluster is offline. I initially reported this issue here.
In this blog post, I explain the problem in more detail and show how to connect directly to the cluster to install a working CNI so that Rancher can finish provisioning.
Problem Continued
In this cluster, I set up a single Ubuntu Server node that has a bridge interface configured exactly as I've done before (See here). I've configured the cluster with cni: multus,calico.
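For reference, this is roughly what that CNI selection amounts to in RKE2's own configuration. This is a minimal sketch assuming a standalone RKE2 install writing /etc/rancher/rke2/config.yaml by hand; a Rancher-provisioned node receives an equivalent file from the management plane instead.

# /etc/rancher/rke2/config.yaml (sketch for a standalone install)
# Deploy Multus as the primary meta-CNI with Calico behind it.
cni:
  - multus
  - calico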
ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master cni0 state UP group default qlen 1000
    link/ether 00:15:5d:02:cb:08 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:06:25:9b:31 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:15:5d:02:cb:08 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.241/24 brd 192.168.2.255 scope global dynamic cni0
    inet6 fe80::215:5dff:fe02:cb08/64 scope link
       valid_lft forever preferred_lft forever

ip route
default via 192.168.2.1 dev cni0 proto dhcp src 192.168.2.241 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.2.0/24 dev cni0 proto kernel scope link src 192.168.2.241
192.168.2.1 dev cni0 proto dhcp scope link src 192.168.2.241 metric 1024
There's a valid route outwards; however, Calico can't start because it reports:
sudo crictl --runtime-endpoint=unix:///run/k3s/containerd/containerd.sock ps -a
CONTAINER       IMAGE           CREATED         STATE    NAME          ATTEMPT  POD ID
ac2bab78f970e   c59896fc7ca44   3 minutes ago   Exited   calico-node   8        c9c4aa34f68a9

sudo crictl --runtime-endpoint=unix:///run/k3s/containerd/containerd.sock logs ac2bab78f970e
...
2022-04-10 18:12:09.146 [WARNING][10] startup/startup.go 710: Unable to auto-detect an IPv4 address: no valid IPv4 addresses found on the host interfaces
2022-04-10 18:12:09.146 [WARNING][10] startup/startup.go 477: Couldn't autodetect an IPv4 address. If auto-detecting, choose a different autodetection method. Otherwise provide an explicit address.
2022-04-10 18:12:09.146 [INFO][10] startup/startup.go 361: Clearing out-of-date IPv4 address from this node IP=""
2022-04-10 18:12:09.150 [WARNING][10] startup/utils.go 48: Terminating
Calico node failed to start
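The "no valid IPv4 addresses" messages make sense once you remember where the address lives: calico-node's default first-found autodetection skips interfaces whose names match its built-in exclusion patterns (docker*, veth*, and, as far as I can tell, cni* as well), and on this node the only global IPv4 address sits on the cni0 bridge. The sketch below shows one way to pin Calico's autodetection to the bridge once you can actually reach the API server, which is what the rest of this post is about. Treat it as a sketch: the rke2-calico HelmChartConfig name, the kube-system namespace, and the installation.calicoNetwork.nodeAddressAutodetectionV4.interface field are what I'd expect for an RKE2-bundled Calico driven by the Tigera operator, not the exact fix from the linked issue, so verify them against your versions.

# Sketch only: tell the RKE2-bundled Calico chart to autodetect its IPv4
# address from the cni0 bridge instead of using first-found.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-calico
  namespace: kube-system
spec:
  valuesContent: |-
    installation:
      calicoNetwork:
        nodeAddressAutodetectionV4:
          interface: cni0

Of course, nothing like this can be applied yet, which is exactly the problem.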
Rancher Server shows the following logs.
[INFO ] waiting for at least one bootstrap node
[INFO ] provisioning bootstrap node(s) custom-3bfcd9ce3995: waiting for agent to check in and apply initial plan
[INFO ] provisioning bootstrap node(s) custom-3bfcd9ce3995: waiting on probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] provisioning bootstrap node(s) custom-3bfcd9ce3995: waiting on probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] provisioning bootstrap node(s) custom-3bfcd9ce3995: waiting on probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] non-ready bootstrap machine(s) custom-3bfcd9ce3995: waiting for cluster agent to be available and join url to be available on bootstrap node
The cluster will never progress because Rancher needs to launch the cattle-cluster-agent, which in turn needs a working CNI to start correctly. However, we can't fix the CNI, because Rancher won't give us a Kube Config that would let us connect to the cluster and deploy a working CNI.
RKE2 – Get a valid credential
Since I have full access to the host running the RKE2 cluster, I should be able to gain access to it somehow. Each Kubernetes pod deployed to a host gets a special projected volume mounted inside the container that it can use to communicate with the Kubernetes API server. By default these pods don't have many privileges, but if we can find one whose service account has enough privileges to create the resources we need, we can get the cluster working.
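If you want a quick inventory of candidate tokens, those projected volumes all live under /var/lib/kubelet/pods. A minimal sketch (plain find, nothing RKE2-specific) that lists every mounted kube-api-access token on the node:

# List every projected service account token currently mounted on this node.
sudo find /var/lib/kubelet/pods -path '*kube-api-access*' -name token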
In this cluster, I enabled the Kubernetes API endpoint (authorized cluster endpoint) in Rancher, which deployed a container called kube-api-auth. Luckily, the service account this container runs as grants all the privileges we need.
sudo ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io c ls | grep kube-api-auth
57d148cadbdceb998ab7be8e38f72dec1fa0fe8c6f313dcab19e09ba9245eb1f    docker.io/rancher/kube-api-auth:v0.1.6    io.containerd.runc.v2
There may be two containers displayed. One of them is the pause container, which serves as a special init process for the pod. If you want to know why, here's a good blog post.
Inspect the container and look for the volume mount kube-api-access:
sudo ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io c info 57d148cadbdceb998ab7be8e38f72dec1fa0fe8c6f313dcab19e09ba9245eb1f | grep kube-api-access
    "source": "/var/lib/kubelet/pods/972647f6-e514-40c6-a0a0-6891898a2dec/volumes/kubernetes.io~projected/kube-api-access-vj98z"
sudo ls /var/lib/kubelet/pods/972647f6-e514-40c6-a0a0-6891898a2dec/volumes/kubernetes.io~projected/kube-api-access-vj98z
ca.crt  namespace  token
The JWT can be extracted from the 'token' file. With this, we're effectively going to impersonate this container and use the privileges that it has:
sudo cat /var/lib/kubelet/pods/972647f6-e514-40c6-a0a0-6891898a2dec/volumes/kubernetes.io~projected/kube-api-access-vj98z/token
eyJhbGciOi...My
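If you want to double-check which service account (and namespace) that token actually represents before using it, you can decode the JWT payload locally. A rough sketch; the payload is base64url-encoded and unpadded, so base64 may complain at the end, but it usually prints enough of the JSON to read the sub claim:

# Decode the JWT payload (the second dot-separated segment) to inspect the
# service account identity. tr maps base64url characters back to base64;
# missing padding may cause a harmless error, hence 2>/dev/null.
sudo cat /var/lib/kubelet/pods/972647f6-e514-40c6-a0a0-6891898a2dec/volumes/kubernetes.io~projected/kube-api-access-vj98z/token \
  | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null; echo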
kubectl also needs the cluster's CA certificate so it can validate the API server's TLS certificate:
sudo cat "/var/lib/kubelet/pods/22d4bf53-2f87-4d58-9272-9c4d0bad47f2/volumes/kubernetes.io~projected/kube-api-access-tl4qz/ca.crt" | base64 -w 0 ; echo LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJlVENDQVIrZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREF[...]RU5EIENFUlRJRklDQVRFLS0tLS0K
Edit ~/.kube/config and insert the following content, replacing the placeholders with your own values:
apiVersion: v1
kind: Config
clusters:
- name: "my-new-cluster"
  cluster:
    server: "https://{myip}:6443"
    certificate-authority-data: "{base64d ca.crt}"
users:
- name: "my-new-user"
  user:
    token: "{contents of /token}"
contexts:
- name: "new-cluster"
  context:
    user: "my-new-user"
    cluster: "my-new-cluster"
current-context: "new-cluster"
After that, you should be able to use kubectl to launch whatever resources you need.
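A couple of sanity checks I'd run at this point (a sketch; adjust the verbs and resource types to whatever you actually plan to create, for example the HelmChartConfig override sketched earlier):

# Confirm the API server is reachable and the borrowed token is accepted.
kubectl get nodes -o wide
kubectl get pods -A

# Spot-check that the token can create the resources we need.
kubectl auth can-i create helmchartconfigs.helm.cattle.io -n kube-system
kubectl auth can-i '*' '*'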