4
votes

I have autoscaling enabled on my Google Kubernetes Engine cluster, and I can see that the usage on one of the nodes is much lower than on the others.


I have a total of 6 nodes and I expect at least this node to be terminated. I have gone through the following: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node

I have added this annotation to all my pods

cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
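
For reference, this is roughly where the annotation has to go: on the pod template (so it ends up on the pods themselves), with the value as a quoted string. The Deployment below is only a sketch reusing the mydocs name from the drain output further down; the replica count and image are placeholders.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mydocs
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mydocs
  template:
    metadata:
      labels:
        app: mydocs
      annotations:
        # the value must be the string "true", not a YAML boolean
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
      - name: mydocs
        image: nginx:1.17  # placeholder image for this sketch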

The cluster autoscaler scales up correctly, but it doesn't scale down as I expect it to.

I have the following logs

$ kubectl logs kube-dns-autoscaler-76fcd5f658-mf85c -n kube-system

autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.55.240.1:443: getsockopt: connection refused
E0628 20:34:36.187949       1 reflector.go:190] github.com/kubernetes-incubator/cluster-proportional-autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.55.240.1:443: getsockopt: connection refused
E0628 20:34:47.191061       1 reflector.go:190] github.com/kubernetes-incubator/cluster-proportional-autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: net/http: TLS handshake timeout
I0628 20:35:10.248636       1 autoscaler_server.go:133] ConfigMap not found: Get https://10.55.240.1:443/api/v1/namespaces/kube-system/configmaps/kube-dns-autoscaler: net/http: TLS handshake timeout, will create one with default params
E0628 20:35:17.356197       1 autoscaler_server.go:95] Error syncing configMap with apiserver: configmaps "kube-dns-autoscaler" already exists
E0628 20:35:18.191979       1 reflector.go:190] github.com/kubernetes-incubator/cluster-proportional-autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.55.240.1:443: i/o timeout

I am not sure whether the above are the relevant logs. What is the correct way to debug this issue?

My pods have local storage. I have been trying to debug this issue with:

kubectl drain gke-mynode-d57ded4e-k8tt

error: DaemonSet-managed pods (use --ignore-daemonsets to ignore): fluentd-gcp-v3.1.1-qzdzs, prometheus-to-sd-snqtn; pods with local storage (use --delete-local-data to override): mydocs-585879b4d5-g9flr, istio-ingressgateway-9b889644-v8bgq, mydocs-585879b4d5-7lmzk

I think it's safe to ignore the DaemonSet pods, as the CA should be able to evict them. However, I am not sure how to make the CA understand that mydocs is OK to evict and move to another node, even after adding the annotation.
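
For the manual test, the drain error itself names the flags to use; something along these lines should evict the pods (a sketch only; note that --delete-local-data discards the emptyDir contents of the evicted pods):

kubectl drain gke-mynode-d57ded4e-k8tt --ignore-daemonsets --delete-local-data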

EDIT

The min and max number of nodes have been set correctly, as can be seen on the GCP console.

Did you set the correct minimum number of nodes for this node pool? - FL3SH
@FL3SH I think so. I can see it on GCP console (see edit section in the question) - kosta
GKE's Cluster Autoscaler cannot be modified to allow eviction of pods with local storage. Per the autoscaler FAQ, you can set skip-nodes-with-local-storage only if you deploy the cluster-autoscaler yourself (i.e. you are not using managed Kubernetes). - Christoper Hans

2 Answers

3
votes

The kubectl logs output you posted comes from the kube-dns autoscaler, not the cluster autoscaler. It tells you about the number of kube-dns replicas in the cluster, not about nodes or scale-down decisions.

From the cluster autoscaler FAQ (and taking into account what you wrote in your question), the pod types that can prevent the CA from removing a node in your case are:

  • kube-system pods that are not run on the node by default
  • pods with local storage

Additionally, restrictive Pod Disruption Budgets can block scale-down; however, since none are mentioned in the question, I'll assume you haven't set any.

Although you have pods with local storage, you added the annotation to mark them safe to evict, so that leaves the kube-system pods that are not run on the node by default.

Since system pods in GKE are managed by the add-on manager's reconciliation loop, you can't add this annotation to them (it would be reverted), which might be what is preventing their eviction.

In this scenario, you may consider using a Pod Disruption Budget configured to allow the autoscaler to evict them.

This Pod Disruption Budget can include DNS and logging pods that aren't run by default in the nodes.
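
As an illustration, a minimal PodDisruptionBudget for kube-dns could look like the sketch below. The k8s-app: kube-dns label and the maxUnavailable value are assumptions; verify them against your cluster (kubectl get pods -n kube-system --show-labels) before applying anything.

apiVersion: policy/v1  # use policy/v1beta1 on clusters older than 1.21
kind: PodDisruptionBudget
metadata:
  name: kube-dns-pdb
  namespace: kube-system
spec:
  maxUnavailable: 1  # lets the autoscaler evict one kube-dns pod at a time
  selector:
    matchLabels:
      k8s-app: kube-dns  # assumed label on the GKE kube-dns pods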

Unfortunately, GKE is a managed offering, so there isn't much else from the autoscaler FAQ that you can apply. However, if you want to go further, you could also consider a pod bin-packing strategy using affinity and anti-affinity, taints and tolerations, and requests and limits to pack pods more tightly, which makes scale-down easier whenever possible.

Finally, on GKE you can use the cluster-autoscaler-status ConfigMap to check what decisions the autoscaler is making.
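
For example, something like this should show the autoscaler's view of each node and the reasons blocking scale-down (the ConfigMap normally lives in kube-system, and the exact fields depend on the autoscaler version):

kubectl describe configmap cluster-autoscaler-status -n kube-system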

0
votes

In the meantime GKE has added autoscaling profiles:

https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#autoscaling_profiles

optimize-utilization: Prioritize optimizing utilization over keeping spare resources in the cluster. When selected, the cluster autoscaler scales down the cluster more aggressively: it can remove more nodes, and remove nodes faster. This profile has been optimized for use with batch workloads that are not sensitive to start-up latency. We do not currently recommend using this profile with serving workloads.

optimize-utilization helped scale our staging system down to zero nodes once all Kubernetes resources had been removed. However, it still takes a few minutes until GKE reacts and starts to scale down the node pools.
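
For reference, the profile can be switched on an existing cluster with a command along these lines; the cluster name and zone below are placeholders, and depending on your gcloud version the flag may only be available on the beta track:

gcloud container clusters update my-cluster \
    --autoscaling-profile optimize-utilization \
    --zone us-central1-a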