I'm running a Kubernetes cluster in EKS, but for some reason the nodeSelector attribute on a deployment isn't always being followed.
Three workloads (one StatefulSet, two Deployments):
1 - Cassandra:
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  ...
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v13
        ...
      nodeSelector:
        layer: "backend"
2 - Kafka:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    service: kafka
  ...
spec:
  containers:
  - image: strimzi/kafka:0.11.3-kafka-2.1.0
    ...
  nodeSelector:
    layer: "backend"
  ...
3 - Zookeeper:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    service: zookeeper
  ...
spec:
  containers:
  - image: strimzi/kafka:0.11.3-kafka-2.1.0
    ...
  nodeSelector:
    layer: "backend"
  ...
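Since the manifests above are truncated, here is a minimal sketch of the layout I believe I'm using, with the elided parts filled in by placeholder values (names and labels below are illustrative only). The nodeSelector sits in the pod template's spec, as a sibling of containers:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: kafka               # placeholder name
spec:
  replicas: 1
  template:
    metadata:
      labels:
        service: kafka
    spec:
      containers:
      - name: kafka
        image: strimzi/kafka:0.11.3-kafka-2.1.0
      nodeSelector:
        layer: "backend"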
Note - all three have the nodeSelector "layer=backend" in the pod spec, alongside the container definitions. I only have two "backend" nodes, yet when I look at the pods I see:
% kubectl get all -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP             NODE                                    NOMINATED NODE   READINESS GATES
pod/cassandra-0                  1/1     Running   0          9m32s   10.1.150.39    ip-...-27.us-west-2.compute.internal    <none>           <none>
pod/cassandra-1                  1/1     Running   0          7m56s   10.1.100.7     ip-...-252.us-west-2.compute.internal   <none>           <none>
pod/cassandra-2                  1/1     Running   0          6m46s   10.1.150.254   ip-...-27.us-west-2.compute.internal    <none>           <none>
pod/kafka-56dcd8665d-hfvz4       1/1     Running   0          9m32s   10.1.100.247   ip-...-252.us-west-2.compute.internal   <none>           <none>
pod/zookeeper-7f74f96f56-xwjjt   1/1     Running   0          9m32s   10.1.100.128   ip-...-154.us-west-2.compute.internal   <none>           <none>
They are placed on three different nodes - 27, 252 and 154. Looking at the "layer" label on each of those:
> kubectl describe node ip-...-27.us-west-2.compute.internal | grep layer
layer=backend
> kubectl describe node ip-...-252.us-west-2.compute.internal | grep layer
layer=backend
> kubectl describe node ip-...-154.us-west-2.compute.internal | grep layer
layer=perf
The 154 node has the label "perf", not "backend", so per my understanding of nodeSelector the zookeeper pod should never have been placed there. I've deleted everything (including the nodes themselves) and retried a few times; sometimes it's kafka that lands on the wrong node, sometimes zookeeper, but reliably something gets scheduled where it shouldn't be.
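If it helps anyone answering, the selector that actually landed on a pod and the node labels can be cross-checked like this (pod name copied from the listing above):

# Show the nodeSelector recorded on the scheduled pod
kubectl get pod zookeeper-7f74f96f56-xwjjt -o jsonpath='{.spec.nodeSelector}'

# List all nodes with the value of the "layer" label as an extra column
kubectl get nodes -L layer

# List only the nodes that actually carry layer=backend
kubectl get nodes -l layer=backend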
As near as I can tell, the nodes I do want have plenty of capacity, and even if they didn't, I would expect the pod to be left unschedulable (Pending) rather than have the nodeSelector silently ignored.
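For what it's worth, this is where I'd expect a scheduling failure to show up if no node matched the selector (pod name copied from the listing above):

# Look at the Events section of a pod for scheduling warnings
kubectl describe pod kafka-56dcd8665d-hfvz4

# Or filter cluster events down to scheduling failures
kubectl get events --field-selector reason=FailedScheduling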
What am I missing? Is nodeSelector not 100% reliable? Is there another way I can force pods to only be placed on nodes with specific labels?
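One alternative I've seen suggested (but have not tried yet) is a hard node-affinity rule in the same place as the nodeSelector, i.e. in the pod template's spec. Sketch only, with the same placeholder names as above:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: layer
            operator: In
            values:
            - backend
  containers:
  - name: kafka
    image: strimzi/kafka:0.11.3-kafka-2.1.0

The other option I'm aware of is taints and tolerations, but my understanding is that a plain nodeSelector should already be enough to keep these pods on the "backend" nodes.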