In my previous post, I outlined challenges that I’ve encountered with Rancher. Following the feedback on that post, I ended up having to rebuild one of my clusters, and I took the opportunity to try out RKE2 and K3s in my home lab. In this home lab, I use a custom CNI based on the official Bridge and DHCP IPAM CNIs (read more) so that my smart home software (Home Assistant) can communicate with other devices on the same Layer 2 domain.
However, it seems that if you try to spin up an RKE2 cluster on a host that has a bridge interface set up (see here), it gets stuck during provisioning, and you can’t download a kubeconfig from Rancher Server because Rancher thinks the cluster is offline. I initially reported this issue here.
In this blog post, I explain more about the problem and how to connect directly to the cluster to install a working CNI so that Rancher will start correctly.
Problem Continued
In this cluster, I set up a single Ubuntu Server node with a bridge interface configured exactly as I’ve done before (see here). I configured the cluster with cni:multus,calico. The node’s addresses and routes look like this:
```
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master cni0 state UP group default qlen 1000
    link/ether 00:15:5d:02:cb:08 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:06:25:9b:31 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:15:5d:02:cb:08 brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.241/24 brd 192.168.2.255 scope global dynamic cni0
    inet6 fe80::215:5dff:fe02:cb08/64 scope link
       valid_lft forever preferred_lft forever

$ ip route
default via 192.168.2.1 dev cni0 proto dhcp src 192.168.2.241 metric 1024
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.2.0/24 dev cni0 proto kernel scope link src 192.168.2.241
192.168.2.1 dev cni0 proto dhcp scope link src 192.168.2.241 metric 1024
```
There’s a valid route outwards. However, Calico can’t start, and its container logs report:
```
$ sudo crictl --runtime-endpoint=unix:///run/k3s/containerd/containerd.sock ps -a
CONTAINER      IMAGE          CREATED        STATE   NAME         ATTEMPT  POD ID
ac2bab78f970e  c59896fc7ca44  3 minutes ago  Exited  calico-node  8        c9c4aa34f68a9
$ sudo crictl --runtime-endpoint=unix:///run/k3s/containerd/containerd.sock logs ac2bab78f970e
...
2022-04-10 18:12:09.146 [WARNING][10] startup/startup.go 710: Unable to auto-detect an IPv4 address: no valid IPv4 addresses found on the host interfaces
2022-04-10 18:12:09.146 [WARNING][10] startup/startup.go 477: Couldn't autodetect an IPv4 address. If auto-detecting, choose a different autodetection method. Otherwise provide an explicit address.
2022-04-10 18:12:09.146 [INFO][10] startup/startup.go 361: Clearing out-of-date IPv4 address from this node IP=""
2022-04-10 18:12:09.150 [WARNING][10] startup/utils.go 48: Terminating
Calico node failed to start
```
Rancher Server shows the following logs:
```
[INFO ] waiting for at least one bootstrap node
[INFO ] provisioning bootstrap node(s) custom-3bfcd9ce3995: waiting for agent to check in and apply initial plan
[INFO ] provisioning bootstrap node(s) custom-3bfcd9ce3995: waiting on probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] provisioning bootstrap node(s) custom-3bfcd9ce3995: waiting on probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] provisioning bootstrap node(s) custom-3bfcd9ce3995: waiting on probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] non-ready bootstrap machine(s) custom-3bfcd9ce3995: waiting for cluster agent to be available and join url to be available on bootstrap node
```
The cluster will never progress because Rancher needs to launch the cattle-cluster-agent, and that agent needs a working CNI to launch correctly. However, we can’t fix the CNI because Rancher won’t give us a kubeconfig that allows us to connect to the cluster and deploy a working CNI.
RKE2 - Get a valid credential
Since I have full access to the host running the RKE2 cluster, I should be able to gain access to the cluster somehow. Each Kubernetes pod deployed to a host gets a special volume mounted inside the container that it can use to communicate with the Kubernetes apiserver. By default, these pods generally don’t have any privileges, but if we can find one that has enough privileges to create the resources we need, we can get the cluster working.
In this cluster, I enabled the Kubernetes API endpoint in Rancher. This deployed a container called kube-api-auth. Luckily, this container has all the privileges we need.
```
$ sudo ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io c ls | grep kube-api-auth
57d148cadbdceb998ab7be8e38f72dec1fa0fe8c6f313dcab19e09ba9245eb1f docker.io/rancher/kube-api-auth:v0.1.6 io.containerd.runc.v2
```
There may be two containers displayed. One of them is the pause container, which serves as a special init process. If you want to know why, here’s a good blog post.
Inspect the container and look for the volume mount kube-api-access:
```
$ sudo ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io c info 57d148cadbdceb998ab7be8e38f72dec1fa0fe8c6f313dcab19e09ba9245eb1f | grep kube-api-access
"source": "/var/lib/kubelet/pods/972647f6-e514-40c6-a0a0-6891898a2dec/volumes/kubernetes.io~projected/kube-api-access-vj98z"
```
```
$ sudo ls /var/lib/kubelet/pods/972647f6-e514-40c6-a0a0-6891898a2dec/volumes/kubernetes.io~projected/kube-api-access-vj98z
ca.crt namespace token
```
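The two lookups above can be wrapped in small filters so the container ID and the volume path don't have to be copied by hand. This is only a sketch: the function names `container_ids_for_image` and `api_access_dir` are made up for illustration, and it assumes the `ctr` output has the same shape as shown above (ID in the first column, image in the second, and a `"source"` entry for the volume).

```shell
# Hypothetical helpers; these names are not part of RKE2, Rancher, or containerd.

# Read `ctr ... c ls` output on stdin and print the IDs of containers
# whose image column contains the given string.
container_ids_for_image() {
  awk -v pat="$1" '$1 != "CONTAINER" && index($2, pat) > 0 { print $1 }'
}

# Read `ctr ... c info <id>` output on stdin and print the host path of
# the first volume whose source path mentions kube-api-access.
api_access_dir() {
  grep -o '"source": *"[^"]*kube-api-access[^"]*"' \
    | head -n 1 \
    | sed 's/^"source": *"//; s/"$//'
}

# Usage on the node (assumes the same socket and namespace as above):
#   CTR="sudo ctr --address /run/k3s/containerd/containerd.sock --namespace k8s.io"
#   id=$($CTR c ls | container_ids_for_image kube-api-auth | head -n 1)
#   dir=$($CTR c info "$id" | api_access_dir)
#   sudo ls "$dir"
```

Filtering on the image column rather than grepping the whole line keeps unrelated containers out of the result, but treat it as a convenience, not a guarantee that the first match is the right container.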
The JWT can be extracted from the ’token’ file. With this, we’re effectively going to impersonate this container and use the privileges that it has:
```
$ sudo cat /var/lib/kubelet/pods/972647f6-e514-40c6-a0a0-6891898a2dec/volumes/kubernetes.io~projected/kube-api-access-vj98z/token
eyJhbGciOi...My
```
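Because the file holds a JWT, its middle segment can be decoded to check which service account the token belongs to (the `sub` claim) and when it expires (the `exp` claim). Tokens in these projected volumes are normally time-limited and refreshed by the kubelet, so a copied token may stop working after a while. The sketch below assumes GNU `base64` (as on Ubuntu); `jwt_payload` is a hypothetical helper name.

```shell
# Hypothetical helper: print the decoded claims of a JWT read from stdin.
# It does not verify the signature; it only decodes the payload.
jwt_payload() {
  local token payload
  token=$(tr -d '[:space:]')
  # The payload is the second dot-separated segment, base64url-encoded.
  payload=$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the '=' padding; restore it so base64 accepts the input.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
  echo
}

# Usage on the node (path taken from the ctr output above):
#   sudo cat "$dir/token" | jwt_payload
```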
kubectl needs the cluster’s CA certificate, base64-encoded, to validate the API server’s TLS certificate:
```
$ sudo cat "/var/lib/kubelet/pods/972647f6-e514-40c6-a0a0-6891898a2dec/volumes/kubernetes.io~projected/kube-api-access-vj98z/ca.crt" | base64 -w 0 ; echo
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJlVENDQVIrZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREF[...]RU5EIENFUlRJRklDQVRFLS0tLS0K
```
Edit ~/.kube/config and insert the content:
```yaml
apiVersion: v1
kind: Config
clusters:
- name: "my-new-cluster"
  cluster:
    server: "https://{myip}:6443"
    certificate-authority-data: "{base64d ca.crt}"
users:
- name: "my-new-user"
  user:
    token: "{contents of /token}"
contexts:
- name: "new-cluster"
  context:
    user: "my-new-user"
    cluster: "my-new-cluster"
current-context: "new-cluster"
```
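If you'd rather not paste the values by hand, the same file can be generated from the volume directory. The following is a minimal sketch, assuming GNU `base64` and a shell that can read the kubelet directory (for example a root shell); `kubeconfig_from_sa_dir` is a hypothetical name, and the cluster, user, and context names are simply the ones used above.

```shell
# Hypothetical helper: print a kubeconfig built from a service account
# volume directory.
#   $1: directory containing `token` and `ca.crt`
#   $2: API server URL, e.g. https://{myip}:6443
kubeconfig_from_sa_dir() {
  local dir="$1" server="$2" ca token
  ca=$(base64 -w 0 < "$dir/ca.crt") || return 1
  token=$(cat "$dir/token") || return 1
  cat <<EOF
apiVersion: v1
kind: Config
clusters:
- name: "my-new-cluster"
  cluster:
    server: "$server"
    certificate-authority-data: "$ca"
users:
- name: "my-new-user"
  user:
    token: "$token"
contexts:
- name: "new-cluster"
  context:
    user: "my-new-user"
    cluster: "my-new-cluster"
current-context: "new-cluster"
EOF
}

# Usage (writes to a separate file instead of overwriting ~/.kube/config):
#   kubeconfig_from_sa_dir "$dir" "https://{myip}:6443" > rescue.kubeconfig
#   kubectl --kubeconfig rescue.kubeconfig get nodes
```

Writing to a separate file and passing it with `--kubeconfig` keeps the borrowed credential away from your regular config, which makes it easier to delete once Rancher is healthy again.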
After that, you should be able to use kubectl to launch whatever resources you need. Remember to get a new kubeconfig from Rancher afterwards so you’re not using system-level credentials.