
We are running a Kubernetes (1.9.4) cluster with 5 masters and 20 worker nodes. Among other pods, the cluster runs one StatefulSet with 3 replicas. Initially, the StatefulSet pods were distributed across 3 nodes. However, pod-2 on node-2 was evicted due to disk pressure on node-2.

When pod-2 was evicted, it was rescheduled to node-1. Pod-1 was already running there, and node-1 was already experiencing disk pressure. As far as we understand, the kube-scheduler should not schedule a (non-critical) pod to a node that is already under disk pressure. Is it the default behavior not to schedule pods to a node under disk pressure, or is this allowed? We ask because node-0 had no disk issues at the same time. So we expected the evicted pod from node-2 to land on node-0 instead of node-1, which was under disk pressure.

Another observation: when pod-2 on node-2 was evicted, the same pod was successfully scheduled to node-1, started there, and moved to the Running state. However, we still see the "Failed to admit pod" error on node-2 many times for the same evicted pod-2. Is this an issue with the kube-scheduler?


Answer


Yes, the scheduler should not assign a new pod to a node with a DiskPressure condition.
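To illustrate this, here is a minimal sketch of the filtering step. It assumes a hypothetical, simplified Node type, not the real Kubernetes API object. It mirrors in spirit the default scheduler's disk-pressure predicate (CheckNodeDiskPressure in 1.9-era schedulers), but it is not that code. The important detail is that the filter only sees the condition the kubelet last reported. A node that is actually short on disk can therefore still be picked if its status has not caught up yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical, simplified view of a node as the scheduler sees it."""
    name: str
    # Condition type -> status string, e.g. {"DiskPressure": "True"}.
    # These values reflect the last status the kubelet reported,
    # so they can lag behind the node's real state.
    conditions: Dict[str, str] = field(default_factory=dict)


def fits_disk_pressure(node: Node) -> bool:
    """Filter step: reject a node whose reported DiskPressure condition is True."""
    return node.conditions.get("DiskPressure") != "True"


def feasible_nodes(nodes: List[Node]) -> List[str]:
    """Return the names of the nodes that pass the disk-pressure filter."""
    return [n.name for n in nodes if fits_disk_pressure(n)]


if __name__ == "__main__":
    nodes = [
        Node("node-0", {"DiskPressure": "False"}),
        Node("node-1", {"DiskPressure": "True"}),
        Node("node-2", {"DiskPressure": "True"}),
    ]
    print(feasible_nodes(nodes))  # ['node-0']

    # If node-1's kubelet has not reported the pressure yet, the scheduler
    # still sees "False" and node-1 remains a candidate.
    stale = [
        Node("node-0", {"DiskPressure": "False"}),
        Node("node-1", {"DiskPressure": "False"}),
    ]
    print(feasible_nodes(stale))  # ['node-0', 'node-1']
```

In your case, it is worth checking whether node-1's DiskPressure condition was already True at the moment pod-2 was scheduled, or only became True afterwards.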

However, I think you can approach this problem from a few different angles.

  1. Look into the configuration of your scheduler:

    • ./kube-scheduler --write-config-to kube-config.yaml

and check whether it needs any adjustments. You can find information about additional kube-scheduler options in the kube-scheduler command-line reference.

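  2. You can also configure additional scheduler(s), depending on your needs. A tutorial for that can be found in the Kubernetes documentation ("Configure Multiple Schedulers").

  3. Check the logs:

    • kubectl logs: kube-scheduler logs (and kubectl get events for scheduling events)
    • journalctl -u kubelet: kubelet logs
    • /var/log/kube-scheduler.log (on the master)

  4. Look more closely at the kubelet's eviction thresholds (soft and hard). In particular, check the disk-related signals such as nodefs.available and imagefs.available, which drive the DiskPressure condition. Also check how much node memory capacity is set.

  5. Bear in mind that:

    • the kubelet may not observe resource pressure fast enough, or
    • the kubelet may evict more pods than needed because of a timing gap in stats collection.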
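The eviction-threshold point above can be made concrete with a simplified model of how a threshold is evaluated. This is a sketch, not the kubelet's code:

- The threshold strings only mimic the syntax of the kubelet's --eviction-hard / --eviction-soft flags.
- The function names are hypothetical.
- The quantity parser handles only plain byte counts and the Ki/Mi/Gi/Ti suffixes.
- The 100 GiB filesystem and the percentages used below are example values, not your cluster's settings.

A hard threshold triggers as soon as the signal drops below it. A soft threshold only triggers once the signal has stayed below it for its grace period (--eviction-soft-grace-period).

```python
import re
from typing import Tuple

# Binary suffixes of Kubernetes quantities (subset, for illustration only).
_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_threshold(spec: str) -> Tuple[str, str]:
    """Split a hypothetical 'signal<value' string into (signal, value)."""
    signal, value = spec.split("<", 1)
    return signal.strip(), value.strip()


def threshold_bytes(value: str, capacity: int) -> int:
    """Resolve a threshold such as '10%' or '100Mi' to a byte count."""
    if value.endswith("%"):
        return int(capacity * float(value[:-1]) / 100)
    match = re.fullmatch(r"(\d+)([KMGT]i)?", value)
    if match is None:
        raise ValueError(f"unsupported quantity: {value}")
    number, suffix = match.groups()
    # A missing suffix means plain bytes (multiplier 1).
    return int(number) * _SUFFIXES.get(suffix, 1)


def threshold_crossed(spec: str, available: int, capacity: int) -> bool:
    """Hard-threshold check: is the available amount below the threshold?"""
    _, value = parse_threshold(spec)
    return available < threshold_bytes(value, capacity)


def soft_eviction_due(spec: str, available: int, capacity: int,
                      crossed_for_s: float, grace_period_s: float) -> bool:
    """Soft-threshold check: crossed AND crossed for at least the grace period."""
    return threshold_crossed(spec, available, capacity) and crossed_for_s >= grace_period_s


if __name__ == "__main__":
    gib = 2**30
    capacity = 100 * gib  # example: 100 GiB node filesystem
    # 8 GiB free is below 10% of 100 GiB, so the hard threshold is crossed.
    print(threshold_crossed("nodefs.available<10%", 8 * gib, capacity))  # True
    # 12 GiB free is below 15%, but only for 30 s of a 90 s grace period.
    print(soft_eviction_due("nodefs.available<15%", 12 * gib, capacity, 30, 90))  # False
```

The soft/hard split explains why a node can sit below a soft disk threshold for a while without evicting anything. It also explains why the DiskPressure condition, and therefore the scheduler's view of the node, may change later than you expect.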

Please check out my suggestions and let me know if they helped.