The correct answer: it depends.
Imagine you've got a 3-node cluster where you created a Deployment with 3 replicas and 3-5 standalone Pods.
Pods are created and scheduled to nodes.
Everything is up and running.
Let's assume that worker node `node1` has got 1 Deployment replica and 1 or more standalone Pods.
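To reproduce this scenario, a minimal setup could look like the following (a sketch; the image and the names are arbitrary examples):

```
# Create a Deployment with 3 replicas:
kubectl create deployment web --image=nginx
kubectl scale deployment web --replicas=3

# Create a few standalone Pods not managed by any controller:
kubectl run standalone-1 --image=nginx --restart=Never
kubectl run standalone-2 --image=nginx --restart=Never
```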
The general sequence of the node restart process is as follows:
- The node gets restarted, for ex. using `sudo reboot`
- After the restart, the node starts all OS processes in the order specified by `systemd` dependencies
- When `dockerd` is started, it does nothing on its own. At this point all previous containers are in the `Exited` state.
- When `kubelet` is started, it requests from the cluster `apiserver` the list of Pods whose node property equals its node name.
- After getting the reply from the `apiserver`, `kubelet` starts containers for all Pods described in the reply using the Docker CRI.
- When the `pause` container starts for each Pod from the list, it gets a new IP address configured by the CNI binary, which is deployed by the network add-on DaemonSet's Pod.
- After the `kube-proxy` Pod is started on the node, it updates the iptables rules to implement the desired configuration of Kubernetes Services, taking into account the new Pods' IP addresses.
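You can observe this sequence directly on the restarted node (a sketch, assuming the Docker-based runtime from the list above):

```
# On the rebooted worker node:
systemctl status docker kubelet   # both should be active after boot

# Containers from before the reboot are still listed in the Exited
# state until kubelet creates new ones:
sudo docker ps -a | grep Exited
```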
Now things become a bit more complicated.
Depending on the `apiserver`, `kube-controller-manager`, and `kubelet` configuration, they react to the node becoming unresponsive with some delay.
If the node restarts fast enough, `kube-controller-manager` doesn't evict the Pods, and they all remain scheduled on the same node, increasing their `RESTARTS` number after their new containers become `Ready`.
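A quick way to verify this after a fast reboot (a sketch; the pod name and values in the sample output are hypothetical):

```
# RESTARTS increases and the node stays the same, but the Pod IP changes:
kubectl get pods -o wide
# NAME    READY   STATUS    RESTARTS   AGE   IP           NODE
# web-1   1/1     Running   1          20m   10.244.1.7   node1
```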
Example 1.
The cluster is created using kubeadm with the Flannel network add-on on an Ubuntu 18.04 VM created in GCP:
- Kubernetes version: v1.18.8
- Docker version: 19.03.12
After the node is restarted, all Pods' containers are started on the node with new IP addresses. Pods keep their names and location.
If the node is stopped for a long time, the Pods on that node stay in the `Running` state, but connection attempts obviously time out.
If the node remains stopped, after approximately 5 minutes the Pods scheduled on it are evicted by `kube-controller-manager` and terminated. If the node starts before that eviction, all Pods remain on it.
In case of eviction, standalone Pods disappear forever; Deployments and similar controllers create the necessary number of Pods to replace the evicted ones, and `kube-scheduler` puts them on appropriate nodes. If a new Pod can't be scheduled on another node, for ex. due to a lack of required volumes, it remains in the `Pending` state until the scheduling requirements are satisfied.
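The ~5-minute delay matches the default tolerations that Kubernetes adds to Pods for the `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable` taints, with `tolerationSeconds: 300`. You can inspect them on any Pod (a sketch; the pod name is an example):

```
# Shows the default not-ready/unreachable tolerations with
# tolerationSeconds: 300, i.e. the ~5-minute eviction delay:
kubectl get pod web-1 -o jsonpath='{.spec.tolerations}'
```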
On a cluster created using an Ubuntu 18.04 Vagrant box and the VirtualBox hypervisor with a host-only adapter dedicated to Kubernetes networking, Pods on a stopped node remained in the `Running` state, but with `Readiness: false`, even after two hours, and were never evicted. After starting the node 2 hours later, all containers were restarted successfully.
This configuration's behavior is the same all the way from Kubernetes v1.7 to the latest v1.19.2.
Example 2.
The cluster is created in Google Cloud (GKE) with the default `kubenet` network add-on:
- Kubernetes version: 1.15.12-gke.20
- Node OS: Container-Optimized OS (cos)
After the node is restarted (it takes around 15-20 seconds), all Pods are started on the node with new IP addresses. Pods keep their names and location (same as in Example 1).
If the node is stopped, after a short period of time (T1 equals around 30-60 seconds) all Pods on the node change their status to `Terminating`. A couple of minutes later they disappear from the Pods list. Pods managed by a Deployment are rescheduled on other nodes with new names and IP addresses.
If the node pool is created with Ubuntu nodes, the `apiserver` terminates the Pods later: T1 equals around 2-3 minutes.
The examples show that the situation after a worker node gets restarted differs between clusters, so it's better to run the experiment on your specific cluster to check whether you get the expected results.
How to configure those timeouts: