1 vote

The GKE cluster is configured with cluster autoscaling and node auto-provisioning.

I have created a default node pool on which system-specific pods run. Whenever pods requesting GPUs are created, GKE automatically provisions a new GPU-enabled node pool, which is fine.
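For reference, the GPU workloads are requested roughly like the sketch below (the Pod name and image are placeholders, not my actual workload); the nvidia.com/gpu limit is what triggers node auto-provisioning:

    # Minimal sketch of a GPU-requesting Pod (name and image are placeholders).
    # The nvidia.com/gpu limit is what makes node auto-provisioning create a
    # GPU-enabled node pool.
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-test                                  # placeholder name
    spec:
      containers:
      - name: cuda
        image: nvidia/cuda:11.0.3-base-ubuntu20.04    # placeholder image
        command: ["sleep", "3600"]
        resources:
          limits:
            nvidia.com/gpu: 1                         # GPU request
    EOF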

But whenever I delete those pods, GKE does not scale the newly created node pool down to zero instances; one instance keeps running. If no GPUs are requested, the node pool is supposed to shrink to its minimum size, i.e. zero.

NOTE:

  • For the GPU drivers, a DaemonSet has been created in the 'kube-system' namespace; its Pods run on each GPU-enabled node.

I edited this DaemonSet and added the '"cluster-autoscaler.kubernetes.io/safe-to-evict": "true"' annotation to its Pods.
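Roughly what I did (a sketch; I'm assuming here the DaemonSet in question is the nvidia-gpu-device-plugin one listed in the update below, adjust the name if yours differs):

    # Add the safe-to-evict annotation to the Pod template of the GPU
    # DaemonSet in kube-system (DaemonSet name assumed, check it with
    # `kubectl -n kube-system get ds`).
    kubectl -n kube-system patch daemonset nvidia-gpu-device-plugin \
      -p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"}}}}}'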

Can someone help me scale the newly created node pool down to zero nodes?

UPDATE:

The Pods running on the new node are:

fluentd-gcp (From DaemonSet)

kube-proxy

nvidia-gpu-device-plugin (From DaemonSet)
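(I got this list with something like the following; the node name is a placeholder:)

    # List all Pods scheduled on the remaining GPU node
    # (node name is a placeholder, take the real one from `kubectl get nodes`).
    kubectl get pods --all-namespaces \
      --field-selector spec.nodeName=gke-my-cluster-gpu-pool-1234abcd-xyz1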

Shouldn't these Pods get evicted?

What is running on the remaining node? Review this question and answer: stackoverflow.com/questions/59217515/… – John Hanley
@JohnHanley Please check the update. I went through the provided link, but shouldn't the running Pods automatically get evicted from this node? If not, what would be good practice to evict them? – AVJ

1 Answer

2 votes

By default, GKE keeps an extra node's worth of resources available for quick Pod scheduling. This is default behavior, controlled by the cluster's autoscaling profile.

This behavior can be changed by setting the autoscaling profile to 'optimize-utilization'.

https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler
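For example, assuming a cluster named my-cluster in us-central1-a (both placeholders), something like this switches the profile (depending on your GKE version it may need the gcloud beta track):

    # Switch the cluster autoscaler to the optimize-utilization profile so
    # underutilized nodes are removed more aggressively
    # (cluster name and zone are placeholders).
    gcloud container clusters update my-cluster \
      --zone us-central1-a \
      --autoscaling-profile optimize-utilization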