Don't use nodeName in Kubernetes

In my Kubernetes cluster, I sometimes need to run a pod on a specific worker node. Maybe that node has a folder I need or a specific hardware characteristic. Historically, I've set the Pod's spec.nodeName, for example nodeName: srv5. However, when that node becomes unavailable, say because it has run out of disk space and reports DiskPressure, Kubernetes will continually try to spin up thousands of pods on it.
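For illustration, pinning a pod this way looks roughly like the manifest below. Only the node name srv5 comes from my setup; the pod name and image are placeholders.

```yaml
# Minimal sketch of a pod pinned with nodeName.
# The pod name and image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-example
spec:
  nodeName: srv5            # assigns the pod straight to the node srv5
  containers:
    - name: app
      image: nginx:stable
```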

Screenshot: CPU usage keeps growing until Prometheus falls over and can't scrape anymore.

As it turns out, if you set the Pod's spec.nodeName, Kubernetes assigns the pod directly to that node, bypassing kube-scheduler and its checks for resource availability. The Kubernetes docs allude to an issue where this could cause nodes to become oversubscribed.

I still want my pod to run on a specific node, but I want it to only run when there’s capacity.

Instead, it's better to use node affinity rules. They still target a specific node, but the pod goes through kube-scheduler, so capacity is considered. Thus, I want the ability to block usage of nodeName and use affinity rules instead.
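As a rough sketch, a pod targeted at srv5 through node affinity could look like the manifest below. The pod name and image are placeholders. It also assumes the node's kubernetes.io/hostname label matches its node name, which is common but not guaranteed in every environment.

```yaml
# Minimal sketch of a pod targeted at one node through node affinity.
# The pod name and image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: pinned-example
spec:
  affinity:
    nodeAffinity:
      # Hard requirement at scheduling time; kube-scheduler still checks capacity.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname   # assumed to equal the node name
                operator: In
                values:
                  - srv5
  containers:
    - name: app
      image: nginx:stable
```

If the node has no capacity, a pod like this stays Pending until the scheduler can place it.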

For this, I'm going to use Kyverno, a Kubernetes policy engine that can validate Kubernetes resources as they're created. It can even mutate them.
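Validation alone would be enough to simply reject pods that set nodeName. Here is a minimal sketch of that approach, modeled on the negation-anchor pattern from Kyverno's sample restrict-node-selection policy. The policy and rule names are made up, and where the failure action is configured differs between Kyverno versions, so treat that field as an assumption to check against the version you run.

```yaml
# Sketch only: rejects any Pod that sets spec.nodeName.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-pod-nodename            # hypothetical name
spec:
  # Assumption: newer Kyverno releases prefer failureAction under the validate rule.
  validationFailureAction: Enforce
  rules:
    - name: block-nodename            # hypothetical name
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Setting spec.nodeName is not allowed. Use node affinity instead."
        pattern:
          spec:
            X(nodeName): "null"       # negation anchor: the field must not be present
```

Rejecting the pod forces whoever created it to fix the manifest, whereas mutation fixes it transparently.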

I came up with this policy that automatically converts any pod created with a specific nodeName into a pod with an equivalent node affinity rule:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  annotations:
    policies.kyverno.io/description: >-
      Bypassing the kube-scheduler using nodeName can lead to endless pod 
      creation loops if the node becomes unschedulable (e.g., DiskPressure). 
      This policy intercepts Pods using nodeName, drops the field, and replaces 
      it with an equivalent nodeAffinity to allow the scheduler to handle it
      safely.
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/title: Replace nodeName with NodeAffinity
  name: disallow-pod-nodename
spec:
  rules:
    - match:
        any:
          - resources:
              kinds:
                - Pod
              operations:
                - CREATE
      mutate:
        patchStrategicMerge:
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: kubernetes.io/hostname
                          operator: In
                          values:
                            - '{{ request.object.spec.nodeName }}'
            nodeName: null
      name: mutate-nodename
      preconditions:
        all:
          - key: '{{ request.object.spec.nodeName || '''' }}'
            operator: NotEquals
            value: ''