We are running a 6-node Kubernetes cluster. Two of those nodes run RabbitMQ, Redis & Prometheus; we have used a node selector and cordoned those nodes so that no other pods get scheduled on them.
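Roughly, the pinning looks like the sketch below (the label key dedicated=infra and the manifest are placeholders, not our exact setup):

# nodes were labeled with: kubectl label nodes <node-name> dedicated=infra
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      nodeSelector:
        dedicated: infra        # schedule only onto the dedicated infra nodes
      containers:
      - name: rabbitmq
        image: rabbitmq:3-management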
The application pods run on the remaining 4 nodes; we have around 18-19 microservices. For GKE there is an open issue in the official Kubernetes repo regarding automatic scale-down: https://github.com/kubernetes/kubernetes/issues/69696#issuecomment-651741837. People there suggest setting a PodDisruptionBudget (PDB), and we tested that approach on Dev/Staging.
What we are looking for now is to pin pods to a particular node pool that does not scale, since we are running single replicas of some services.
As of now, we are thinking of applying node affinity to the services that run with a single replica and have no scaling requirement.
For the scalable services we won't specify any rule, so by default the Kubernetes scheduler will spread their pods across nodes; this way, if any node scales down, we don't face downtime for a single-replica service.
Affinity example:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: do-not-scale
          operator: In
          values:
          - "true"
We are planning to use the affinity type preferredDuringSchedulingIgnoredDuringExecution instead of requiredDuringSchedulingIgnoredDuringExecution.
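For comparison, a sketch of the required variant with the same do-not-scale label (label key taken from the example above):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: do-not-scale
          operator: In
          values:
          - "true"

With preferred the scheduler can still fall back to other nodes if the labeled pool has no capacity, while with required the pods stay Pending until a matching node is available.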
Note: during a node drain (scale-down of any node), Kubernetes does not create the new replica on another node first, because we are running single replicas with a rolling update strategy and minAvailable: 25%.
Why: if no PodDisruptionBudget is specified and we have a Deployment with one replica, the pod is terminated first and only then is a new pod scheduled on another node.
To make sure the application stays available during the node-draining process, we have to specify a PodDisruptionBudget and run more replicas. If we have 1 pod and minAvailable: 30%, the drain (scale-down) will be refused because evicting that pod would violate the budget.
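A rough sketch of the PDB we tested on Dev/Staging (the service name and label here are placeholders, not our actual manifests):

apiVersion: policy/v1             # policy/v1beta1 on clusters older than 1.21
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
spec:
  minAvailable: 30%               # with replicas: 1 this rounds up to 1, so eviction during drain is blocked
  selector:
    matchLabels:
      app: my-service             # must match the single-replica Deployment's pod labels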
Please point out any mistakes if you see something wrong, and suggest a better option.