The first thing I would recommend checking is that the resource requests in your PodSpec are enough to carry the load, and that there are enough resources on the system nodes to schedule all system pods.
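For example, a minimal sketch of resource requests and limits in a container spec (the names, image, and values are placeholders you would tune for your workload):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backend-pod              # hypothetical name
spec:
  containers:
  - name: backend
    image: example/backend:1.0   # placeholder image
    resources:
      requests:
        cpu: "250m"              # CPU needed for the expected load
        memory: "256Mi"          # memory needed for the expected load
      limits:
        cpu: "500m"
        memory: "512Mi"
```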
You can prevent system pods from being scheduled onto the frontend or backend autoscaled nodes using either the simpler nodeSelector or the more flexible Node Affinity.
You can find a good explanation and examples in the document “Assigning Pods to Nodes”.
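A minimal sketch of both approaches, assuming a hypothetical node-pool label on your nodes (adjust the key and values to whatever labels your pools actually carry); these are pod-spec fragments, not full manifests:

```yaml
# Option 1: simple nodeSelector - pin system pods to nodes labeled node-pool=system
spec:
  nodeSelector:
    node-pool: system
---
# Option 2: more flexible node affinity - keep the pod off the autoscaled pools
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-pool
            operator: NotIn
            values:
            - frontend
            - backend
```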
Taints and Tolerations are similar to Node Affinity, but approach the problem from the node's perspective: they allow a node to repel a set of pods. Check the document “Taints and Tolerations” if you choose this way.
When you create a node pool for autoscaling, you can add labels and taints to it, so they are applied to the nodes when the Cluster Autoscaler (CA) scales the pool up.
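Labels and taints defined on the node pool are propagated to every node the pool creates, so a freshly autoscaled node would look roughly like this (an illustrative excerpt of a Node object; the names and values are hypothetical):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: frontend-pool-node-1234    # illustrative node name
  labels:
    node-pool: frontend            # label propagated from the node pool
spec:
  taints:
  - key: dedicated
    value: frontend
    effect: NoSchedule             # taint propagated from the node pool
```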
In addition to restricting system pods from being scheduled on frontend/backend nodes, it would be a good idea to configure a PodDisruptionBudget and the autoscaler safe-to-evict option for pods that could prevent CA from removing a node during scale-down.
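For instance, a minimal PodDisruptionBudget sketch for a hypothetical backend Deployment whose pods carry the label app: backend:

```yaml
apiVersion: policy/v1beta1         # policy/v1 on newer clusters
kind: PodDisruptionBudget
metadata:
  name: backend-pdb
spec:
  minAvailable: 1                  # keep at least one replica available during evictions
  selector:
    matchLabels:
      app: backend                 # hypothetical label of the backend pods
```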
According to the Cluster Autoscaler FAQ, there are several types of pods that may prevent CA from scaling your cluster down:
- Pods with restrictive PodDisruptionBudget (PDB).
- Kube-system pods that:
  - are not run on the node by default,
  - don't have PDB or their PDB is too restrictive (since CA 0.6).
- Pods that are not backed by a controller object (so not created by a deployment, replica set, job, stateful set, etc.).
- Pods with local storage. *
- Pods that cannot be moved elsewhere due to various constraints (lack of resources, non-matching node selectors or affinity, matching anti-affinity, etc.).
*Unless the pod has the following annotation (supported in CA 1.0.3 or later):
"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"
Prior to version 0.6, Cluster Autoscaler did not touch nodes that were running important kube-system pods like DNS, Heapster, Dashboard, etc.
If these pods landed on different nodes, CA could not scale the cluster down, and the user could end up with a completely empty 3-node cluster.
In 0.6, an option was added to tell CA that some system pods can be moved around. If the user configures a PodDisruptionBudget for a kube-system pod, the default strategy of not touching the node running this pod is overridden by the PDB settings.
So, to enable migration of kube-system pods, set minAvailable to 0 (or <= N if there are N+1 pod replicas).
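For example, a PDB that tells CA it may move a kube-system pod (the name and label selector here are illustrative; check the actual labels with kubectl get pods -n kube-system --show-labels):

```yaml
apiVersion: policy/v1beta1         # policy/v1 on newer clusters
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
  namespace: kube-system
spec:
  minAvailable: 0                  # signals that it is safe to evict/move this pod
  selector:
    matchLabels:
      k8s-app: kube-dns            # illustrative label; verify on your cluster
```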
See also the FAQ entry “I have a couple of nodes with low utilization, but they are not scaled down. Why?”
The Cluster Autoscaler FAQ can also help you choose the correct version for your cluster.
To get a better understanding of what lies under the hood of the Cluster Autoscaler, check the official documentation.