Prevent Kubernetes rescheduling hundreds of pods when a node is failing

Question

I am running a Kubernetes cluster with 2 workers and approximately 100 deployments. Each deployment has 2 or 4 replicas, so I have approximately 300 pods per worker (yes, that's a lot of pods).

My problem: when a worker goes down, Kubernetes tries to redeploy every failed pod on the remaining worker. At the end of the operation I have:

- the remaining worker running 600 pods
- a very high load average on the master nodes, because they are rescheduling 300 pods
- an empty worker once the failed node comes back, because all the pods are now on the other worker

The only solution I have found is to create 2 deployments for every application (one per worker) to prevent the rescheduling of 300 pods.

Are there better solutions?

Radek 'Goblin' Pieczonka · Accepted Answer · 2018-08-09T10:47:54

Yes. For 2-pod deployments, one approach is to use Pod Anti-Affinity to declare that pods from a given deployment cannot coexist on the same node. This results in at most 1 pod of the deployment being started per node, with the rest staying in the Pending state until new nodes become available.
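As a rough sketch of what that could look like, here is a Deployment manifest with a required pod anti-affinity rule. The name `my-app`, the `app: my-app` label and the `nginx:1.25` image are placeholders for illustration, not taken from the question; `kubernetes.io/hostname` is the standard node label used to mean "one per node".

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app              # hypothetical label
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never schedule this pod onto a node that already
          # runs a pod carrying the same app label.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
        - name: my-app
          image: nginx:1.25    # placeholder image
```

With this rule, when one of the two workers fails, the replacement pod has nowhere valid to go and stays Pending instead of piling onto the surviving node; it is scheduled again once the failed node returns. If you would rather have the scheduler spread pods when it can but still allow co-location, `preferredDuringSchedulingIgnoredDuringExecution` is the soft variant, though it would not prevent the rescheduling described in the question.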
