In my Kubernetes, I sometimes try to run a pod on a specific worker node. Maybe one of them has a folder that I need or a specific hardware characteristic. Historically, I’ve used Pod spec.nodeName: srv5 However, when that node becomes unavailable, say because it’s run out of disk space and has DiskPressure on it, then Kubernetes will continually try to spin up thousands of pods on it.
A screenshot of CPU usage growing, then Prometheus falls over and can’t scrap anymore.

As it turns out, if you set the Pod spec.nodeName, then Kubernetes directly assigns the pod to the node bypassing kube-scheduler’s work to verify resource availability. The Kubernetes docs allude to an issue where this could cause nodes to become oversubscribed.
I still want my pod to run on a specific node, but I want it to only run when there’s capacity.
Instead, it’s better to use pod affinity rules to cause the pod to be scheduled, but still consider capacity. Thus, I want the ability to block usage of nodeName and instead use the affinity rules.
For this, I’m going to use Kyverno which is a Kubernetes policy engine that can validate Kubernetes resources as they’re created. It can even mutate the resources.
I came up with this policy that automatically converts any pod created with a specific nodeName into a pod affinity rule:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
| apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
annotations:
policies.kyverno.io/description: >-
Bypassing the kube-scheduler using nodeName can lead to endless pod
creation loops if the node becomes unschedulable (e.g., DiskPressure).
This policy intercepts Pods using nodeName, drops the field, and replaces
it with an equivalent nodeAffinity to allow the scheduler to handle it
safely.
policies.kyverno.io/subject: Pod
policies.kyverno.io/title: Replace nodeName with NodeAffinity
name: disallow-pod-nodename
rules:
- match:
any:
- resources:
kinds:
- Pod
operations:
- CREATE
mutate:
patchStrategicMerge:
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- '{{ request.object.spec.nodeName }}'
nodeName: null
name: mutate-nodename
preconditions:
all:
- key: '{{ request.object.spec.nodeName || '''' }}'
operator: NotEquals
value: ''
|
Comments
To give feedback, send an email to adam [at] this website url.
Donate
If you've found these posts helpful and would like to support this work directly, your contribution would be appreciated and enable me to dedicate more time to creating future posts. Thank you for joining me!
