I have a Kubernetes cluster set up with kubeadm init (mostly defaults). Everything works as intended, except that if one of my nodes goes offline while pods are running on it, those pods stay in the Running status indefinitely. From what I've read, they should transition to the Unknown or Failed status, and after --pod-eviction-timeout (default 5m) they should be rescheduled to another healthy node.
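For reference, here is how I understand the eviction-related controller-manager settings can be inspected on a kubeadm cluster (a sketch, assuming the default kubeadm static-pod manifest path; I have not customized any of these flags):
# run on the master node; no matches means the compiled-in defaults are in use
# (node-monitor-grace-period=40s, pod-eviction-timeout=5m0s)
grep -E 'pod-eviction-timeout|node-monitor-grace-period|node-monitor-period' \
  /etc/kubernetes/manifests/kube-controller-manager.yaml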
Here are my pods after 20+ minutes of node7 being offline (I've also left it for over two days once with no reschedule):
kubectl get pods -o wide
NAME                                 READY   STATUS    RESTARTS   AGE   IP         NODE    NOMINATED NODE   READINESS GATES
workshop-30000-77b95f456c-sxkp5      1/1     Running   0          20m   REDACTED   node7   <none>           <none>
workshop-operator-657b45b6b8-hrcxr   2/2     Running   0          23m   REDACTED   node7   <none>           <none>
kubectl get deployments -o wide
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS         IMAGES     SELECTOR
deployment.apps/workshop-30000      0/1     1            0           21m   workshop-ubuntu    REDACTED   client=30000
deployment.apps/workshop-operator   0/1     1            0           17h   ansible,operator   REDACTED   name=workshop-operator
You can see the pods are still flagged as Running, whereas their deployments show READY 0/1.
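Since eviction on 1.17 is driven by the NoExecute node taints plus each pod's tolerationSeconds (which the DefaultTolerationSeconds admission plugin should set to 300s by default), the tolerations on one of the stuck pods can be checked like this (a sketch, using the pod and namespace from the output above and below):
kubectl get pod workshop-30000-77b95f456c-sxkp5 -n workshop-operator \
  -o jsonpath='{.spec.tolerations}'
# expect tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable
# with tolerationSeconds: 300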
Here are my nodes:
kubectl get nodes -o wide
NAME                STATUS     ROLES    AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE       KERNEL-VERSION     CONTAINER-RUNTIME
kubernetes-master   Ready      master   34d    v1.17.3   REDACTED      <none>        Ubuntu 19.10   5.3.0-42-generic   docker://19.3.2
kubernetes-worker   NotReady   <none>   34d    v1.17.3   REDACTED      <none>        Ubuntu 19.10   5.3.0-29-generic   docker://19.3.2
node3               NotReady   worker   21d    v1.17.3   REDACTED      <none>        Ubuntu 19.10   5.3.0-40-generic   docker://19.3.2
node4               Ready      <none>   19d    v1.17.3   REDACTED      <none>        Ubuntu 19.10   5.3.0-40-generic   docker://19.3.2
node6               NotReady   <none>   5d7h   v1.17.4   REDACTED      <none>        Ubuntu 19.10   5.3.0-42-generic   docker://19.3.6
node7               NotReady   <none>   5d6h   v1.17.4   REDACTED      <none>        Ubuntu 19.10   5.3.0-42-generic   docker://19.3.6
What could the issue be? All my containers have readiness and liveness probes. I've tried searching through the docs and elsewhere, but haven't been able to find anything that solves this.
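One thing I believe is worth verifying is whether the node lifecycle controller has actually tainted the dead node, since (as I understand it) no taint means no eviction is ever triggered. A sketch, using node7 from above:
kubectl get node node7 -o jsonpath='{.spec.taints}'
# a dead node should carry node.kubernetes.io/unreachable with effect NoExecute
# (plus NoSchedule); if that taint is missing, the controller-manager never marked the node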
Currently, if a node goes down, the only way I can get the pods that were on it rescheduled to a live node is to manually delete them with --force and --grace-period=0 (see below), which defeats some of the main goals of Kubernetes: automation and self-healing.
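For completeness, the manual workaround I mean looks like this (using one of the stuck pods above):
kubectl delete pod workshop-30000-77b95f456c-sxkp5 -n workshop-operator \
  --force --grace-period=0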
According to the docs: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-lifetime
If a node dies or is disconnected from the rest of the cluster, Kubernetes applies a policy for setting the phase of all Pods on the lost node to Failed.
---------- Extra information ---------------
kubectl describe pods workshop-30000-77b95f456c-sxkp5
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned workshop-operator/workshop-30000-77b95f456c-sxkp5 to node7
  Normal   Pulling    37m                kubelet, node7     Pulling image "REDACTED"
  Normal   Pulled     37m                kubelet, node7     Successfully pulled image "REDACTED"
  Normal   Created    37m                kubelet, node7     Created container workshop-ubuntu
  Normal   Started    37m                kubelet, node7     Started container workshop-ubuntu
  Warning  Unhealthy  36m (x2 over 36m)  kubelet, node7     Liveness probe failed: Get http://REDACTED:8080/healthz: dial tcp REDACTED:8000: connect: connection refused
  Warning  Unhealthy  36m (x3 over 36m)  kubelet, node7     Readiness probe failed: Get http://REDACTED:8000/readyz: dial tcp REDACTED:8000: connect: connection refused
I believe those liveness and readiness probe failures were just due to a slow start. It seems liveness/readiness is no longer being checked after the node goes down (the last check was 37 minutes ago), which makes sense since the probes are run by the kubelet on the now-unreachable node.
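In case it helps with diagnosis, the node lifecycle and eviction decisions should show up in the controller-manager logs; on kubeadm it runs as a static pod, so something like this should work (a sketch, assuming the standard component=kube-controller-manager label kubeadm applies):
kubectl -n kube-system logs -l component=kube-controller-manager --tail=200 | grep -i node7
# look for messages about node7 becoming NotReady/unreachable and taints being added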
This is a self-hosted cluster with the following versions:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Thanks to all who help.
EDIT: It was either a bug or a potential misconfiguration from when the cluster was initially bootstrapped with kubeadm. A full reinstall of the Kubernetes cluster and an upgrade from 1.17.4 to 1.18 solved the problem, and pods are now rescheduled off dead nodes.