0 votes

Assume that there are some Pods from Deployments/StatefulSets/DaemonSets, etc. running on a Kubernetes node.

Then I restart the node directly, and afterwards start Docker and kubelet with the same parameters.

What would happen to those pods?

  1. Are they recreated with metadata saved locally by kubelet? Or with info retrieved from the api-server? Or recovered by the OCI runtime as if nothing happened?
  2. Is it that only stateless Pods (with no local data) can be recovered normally? If any of them has a local PV/dir, would it be reconnected normally?
  3. What if I don't restart the node for a long time? Would the api-server have other nodes create those Pods? What is the default timeout value, and how can I configure it?

As far as I know:

 apiserver
    ^
    |(sync)
    V
  kubelet
    ^
    |(sync)
    V
-------------
| CRI plugin |(like api)
| containerd |(like api-server)
|    runc    |(low-level binary which manages container)
| c' runtime |(container runtime where containers run)
-------------

When kubelet receives a PodSpec from the kube-api-server, it calls the CRI like a remote service; the steps are roughly:

  1. create the PodSandbox (a.k.a. the 'pause' container, always 'stopped')
  2. create container(s)
  3. run container(s)

So my guess is that when the node and Docker are restarted, steps 1 and 2 are already done and the containers are in a 'stopped' state; then, when kubelet is restarted, it pulls the latest info from the kube-api-server, finds that the container(s) are not 'running', and calls the CRI to run the container(s), after which everything is back to normal.
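To check this guess I plan to inspect the node right after the reboot, before and after kubelet comes back. These are plain Docker commands; the k8s_ name prefixes are how dockershim names the containers it creates, so the details may differ on other runtimes:

# right after the reboot, before kubelet starts: everything should be Exited
docker ps -a --format '{{.Names}}\t{{.Status}}'

# the sandbox ('pause') containers kubelet created before the reboot
docker ps -a --filter name=k8s_POD

# after kubelet is started again: the same Pods' containers should be running
# (whether they were restarted or recreated is exactly what I want to confirm)
docker ps --filter name=k8s_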

Please help me confirm.

Thank you in advance~

Comment from Debargha Roy (3 upvotes): Never used Kubernetes, but from the amount of exposure I have with Docker, your node should start normally since, as you say, it's a restart - essentially a terminate followed by an initiate operation. The condition should be the same for both stateful and stateless Pods. The answer to your 3rd question should also be yes, because that's what Kubernetes is for.

3 Answers

3 votes

Good questions. A few things first: a Pod is not pinned to a certain node. The nodes are mostly seen as a "server farm" that Kubernetes can use to run its workload. E.g. you give Kubernetes a set of nodes, and you also give it a set of e.g. Deployments - the desired state of the applications that should run on your servers. Kubernetes is responsible for scheduling these Pods and for keeping them running when something in the cluster changes.

Standalone Pods are not managed by anything, so if such a Pod crashes it is not recovered. You typically want to deploy your stateless apps as Deployments, which then create ReplicaSets that manage a set of Pods - e.g. 4 Pods - instances of your app.

Your desired state - a Deployment with e.g. replicas: 4 - is saved in the etcd database within the Kubernetes control plane.
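For example, a minimal Deployment with that desired state could look like this (all names and the image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4              # desired state: keep 4 Pods of this app running
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:1.0  # placeholder image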

Then a set of controllers for Deployments and ReplicaSets is responsible for keeping 4 replicas of your app alive. E.g. if a node becomes unresponsive (or dies), new Pods will be created on other nodes, provided they are managed by a ReplicaSet controller.

The kubelet receives the PodSpecs that are scheduled to its node, and then keeps those Pods alive with regular health checks.
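Those health checks are the probes you declare in the Pod spec (the kubelet also restarts crashed containers according to the Pod's restartPolicy even without probes). A sketch, where the image, path and port are placeholders:

containers:
- name: my-app
  image: my-app:1.0      # placeholder image
  livenessProbe:         # kubelet restarts the container when this fails
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
  readinessProbe:        # kubelet marks the Pod not Ready when this fails
    httpGet:
      path: /ready
      port: 8080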

Is it that only stateless Pods (with no local data) can be recovered normally?

Pods should be seen as ephemeral - i.e. they can disappear - but they are recovered by the controller that manages them, unless deployed as standalone Pods. So don't store local data within the Pod.

There are also StatefulSet Pods; those are meant for stateful workloads - but distributed stateful workloads, typically e.g. 3 Pods that use Raft to replicate data. The etcd database is an example of a distributed database that uses Raft.
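A minimal StatefulSet sketch (names, image and sizes are placeholders) - the points that differ from a Deployment are the stable Pod identity via the headless Service and the per-Pod volume claims:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-db
spec:
  serviceName: my-db         # headless Service that gives the Pods stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: my-db
  template:
    metadata:
      labels:
        app: my-db
    spec:
      containers:
      - name: my-db
        image: my-db:1.0     # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/my-db
  volumeClaimTemplates:      # each Pod gets its own PersistentVolumeClaim
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi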

1 vote

The correct answer: it depends.

Imagine you've got a 3-node cluster, where you created a Deployment with 3 replicas and 3-5 standalone Pods. The Pods are created and scheduled to nodes.
Everything is up and running.

Let's assume that worker node node1 has 1 Deployment replica and 1 or more standalone Pods.

The general sequence of the node restart process is as follows:

  1. The node gets restarted, for ex. using sudo reboot
  2. After restart, the node starts all OS processes in the order specified by systemd dependencies
  3. When dockerd is started, it does nothing by itself. At this point all the previous containers are in the Exited state.
  4. When kubelet is started, it asks the cluster apiserver for the list of Pods whose node property equals its node name.
  5. After getting the reply from the apiserver, kubelet starts containers for all the Pods described in that reply, using the Docker CRI.
  6. When the pause container starts for each Pod in the list, it gets a new IP address configured by the CNI binary, which is deployed by the network addon DaemonSet's Pod.
  7. After the kube-proxy Pod is started on the node, it updates the iptables rules to implement the desired Kubernetes Services configuration, taking into account the new Pods' IP addresses.
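You can verify steps 5-7 from the control plane side, e.g. (node1 is the example node from above):

# Pods scheduled to node1, with their RESTARTS counts and (new) IP addresses
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=node1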

Now things become a bit more complicated.

Depending on the apiserver, kube-controller-manager and kubelet configuration, they react to the fact that the node is not responding with some delay.

If the node restarts fast enough, kube-controller-manager doesn't evict the Pods and they all remain scheduled on the same node, increasing their RESTARTS count once their new containers become Ready.
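You can observe that delay directly; on clusters with taint-based evictions enabled (version-dependent) the node controller also taints the unresponsive node (node1 is the example node from above):

# watch the node condition flip to NotReady after the grace period
kubectl get nodes -w

# the NoExecute taints that trigger eviction once the Pods' tolerationSeconds expire
kubectl describe node node1 | grep -A3 Taints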

Example 1.

The cluster is created using kubeadm with the Flannel network addon, on an Ubuntu 18.04 VM in GCP.
Kubernetes version is v1.18.8
Docker version is 19.03.12

After the node is restarted, all Pods' containers are started on the node with new IP addresses. Pods keep their names and location.

If the node is stopped for a long time, the Pods on that node stay in the Running state, but connection attempts obviously time out.

If the node remains stopped, after approximately 5 minutes the Pods scheduled on that node are evicted by kube-controller-manager and terminated. If I start the node before that eviction, all Pods remain on the node.

In case of eviction, standalone Pods disappear forever; Deployments and similar controllers create the necessary number of Pods to replace the evicted ones, and kube-scheduler puts them on appropriate nodes. If a new Pod can't be scheduled on another node, e.g. due to a lack of required volumes, it remains in the Pending state until the scheduling requirements are satisfied.

On a cluster created using an Ubuntu 18.04 Vagrant box and the VirtualBox hypervisor, with a host-only adapter dedicated to Kubernetes networking, Pods on a stopped node remain in the Running state (but with Readiness: false) even after two hours, and are never evicted. After starting the node 2 hours later, all containers were restarted successfully.
This configuration behaves the same all the way from Kubernetes v1.7 to the latest v1.19.2.

Example 2.

The cluster is created in Google Cloud (GKE) with the default kubenet network addon.
Kubernetes version is 1.15.12-gke.20. Node OS is Container-Optimized OS (cos).

After the node is restarted (it takes around 15-20 seconds), all Pods are started on the node with new IP addresses. Pods keep their names and location (same as in Example 1).

If the node is stopped, after a short period of time (T1, around 30-60 seconds) all Pods on the node change status to Terminating. A couple of minutes later they disappear from the Pod list. Pods managed by a Deployment are rescheduled on other nodes with new names and IP addresses.

If the node pool is created with Ubuntu nodes, the apiserver terminates the Pods later: T1 is around 2-3 minutes.


The examples show that the situation after a worker node is restarted differs between clusters, so it's better to run the experiment on your specific cluster to check whether you get the expected results.

How to configure those timeouts:
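A sketch of the main knobs (the defaults shown are the common ones, but they are version-dependent, so check your cluster's components before relying on them):

# kubelet: how often the node posts its status to the apiserver
--node-status-update-frequency=10s

# kube-controller-manager: how long the node may stay silent before it is marked NotReady
--node-monitor-grace-period=40s

# kube-controller-manager: how long after NotReady before Pods are evicted
# (on clusters with taint-based evictions this is effectively replaced by the tolerations below)
--pod-eviction-timeout=5m0s

# kube-apiserver: default tolerationSeconds injected into every Pod for the
# not-ready/unreachable node taints
--default-not-ready-toleration-seconds=300
--default-unreachable-toleration-seconds=300

Individual Pods can override the last two with their own tolerations, e.g. to be evicted 30 seconds after their node becomes unreachable:

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 30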

-1 votes

When the node is restarted and there are Pods scheduled on it that are managed by a Deployment or ReplicaSet, those controllers will take care of scheduling the desired number of replicas on another, healthy node. So if you have 2 replicas running on the restarted node, they will be terminated and scheduled on another node.

Before restarting a node you should use kubectl cordon to mark the node as unschedulable and give Kubernetes time to reschedule the Pods.
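A sketch of the usual sequence (node1 is the example node name used above); note that cordon only prevents new Pods from being scheduled there, while drain additionally evicts the Pods that are already running:

# stop new Pods from being scheduled onto the node
kubectl cordon node1

# evict the Pods already running there (DaemonSet Pods are skipped)
kubectl drain node1 --ignore-daemonsets

# ...reboot the node...

# make the node schedulable again afterwards
kubectl uncordon node1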

Standalone Pods (not managed by any controller) will not be rescheduled on another node; they will just be terminated.