I have autoscaling enabled on a Google Kubernetes Engine cluster, and for one of the nodes I can see that usage is much lower than on the others.
I have a total of 6 nodes and I expect at least this node to be terminated. I have gone through the following: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
I have added this annotation to all my pods:
cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
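For reference, here is a minimal sketch of where I put the annotation, assuming the pods are managed by a Deployment (the mydocs name and image are placeholders). The annotation goes on the pod template, not on the Deployment's own metadata, and the value has to be the quoted string "true", since annotation values are strings:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mydocs                 # placeholder name, for illustration only
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mydocs
  template:
    metadata:
      labels:
        app: mydocs
      annotations:
        # must be the string "true"; an unquoted YAML boolean is rejected
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
    spec:
      containers:
      - name: mydocs
        image: gcr.io/my-project/mydocs:latest   # placeholder image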
The cluster autoscaler scales up correctly; however, it doesn't scale down as I expect it to.
I have the following logs:
$ kubectl logs kube-dns-autoscaler-76fcd5f658-mf85c -n kube-system
autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.55.240.1:443: getsockopt: connection refused
E0628 20:34:36.187949 1 reflector.go:190] github.com/kubernetes-incubator/cluster-proportional-autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.55.240.1:443: getsockopt: connection refused
E0628 20:34:47.191061 1 reflector.go:190] github.com/kubernetes-incubator/cluster-proportional-autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: net/http: TLS handshake timeout
I0628 20:35:10.248636 1 autoscaler_server.go:133] ConfigMap not found: Get https://10.55.240.1:443/api/v1/namespaces/kube-system/configmaps/kube-dns-autoscaler: net/http: TLS handshake timeout, will create one with default params
E0628 20:35:17.356197 1 autoscaler_server.go:95] Error syncing configMap with apiserver: configmaps "kube-dns-autoscaler" already exists
E0628 20:35:18.191979 1 reflector.go:190] github.com/kubernetes-incubator/cluster-proportional-autoscaler/pkg/autoscaler/k8sclient/k8sclient.go:90: Failed to list *v1.Node: Get https://10.55.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.55.240.1:443: i/o timeout
I am not sure whether the above are the relevant logs. What is the correct way to debug this issue?
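Judging by the paths in those logs (cluster-proportional-autoscaler), kube-dns-autoscaler only scales the kube-dns replica count, so I suspect these are not the node-level cluster autoscaler's logs at all; on GKE that component runs on the managed master, so its logs aren't reachable with kubectl. What I have been checking instead, assuming the CA publishes its status ConfigMap in kube-system as the FAQ describes:

# status the CA publishes about scale-down candidates and blockers
$ kubectl describe configmap cluster-autoscaler-status -n kube-system
# events sometimes record why a node was not removed
$ kubectl get events -n kube-system
$ kubectl describe node gke-mynode-d57ded4e-k8tt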
My pods have local storage. I have been trying to debug this issue using:
kubectl drain gke-mynode-d57ded4e-k8tt
error: DaemonSet-managed pods (use --ignore-daemonsets to ignore): fluentd-gcp-v3.1.1-qzdzs, prometheus-to-sd-snqtn; pods with local storage (use --delete-local-data to override): mydocs-585879b4d5-g9flr, istio-ingressgateway-9b889644-v8bgq, mydocs-585879b4d5-7lmzk
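Rerunning the drain with the overrides the error message itself suggests should let the manual eviction go through, though as far as I understand a manual drain only tests evictability and doesn't change what the CA decides on its own (note that --delete-local-data discards the pods' emptyDir contents):

$ kubectl drain gke-mynode-d57ded4e-k8tt --ignore-daemonsets --delete-local-data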
I think it's safe to ignore the DaemonSets, as the CA should be able to evict them. However, even after adding the annotation, I am not sure how to make the CA understand that mydocs is OK to be evicted and moved to another node.
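To rule out the basics, I have also been verifying that the running pods actually carry the annotation (editing the Deployment template only affects newly created pods) and looking at which volumes make the CA classify them as having local storage (pod name copied from the drain output above):

$ kubectl get pod mydocs-585879b4d5-g9flr -o jsonpath='{.metadata.annotations}'
$ kubectl get pod mydocs-585879b4d5-g9flr -o jsonpath='{.spec.volumes}'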
EDIT
The min and max node counts have been set correctly, as seen on the GCP console.
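For completeness, the same settings can be confirmed from the CLI; the cluster name and zone below are placeholders:

$ gcloud container clusters describe my-cluster --zone us-central1-a | grep -A 3 autoscaling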


You can use --skip-nodes-with-local-storage if you set up cluster-autoscaler yourself (and are not using managed Kubernetes). - Christoper Hans
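If I understand the comment correctly, that flag only applies when you run the cluster-autoscaler Deployment yourself, which isn't possible on GKE, where the CA is managed. A rough sketch of how it would be passed in that case; the image tag and the rest of the manifest are illustrative only:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - name: cluster-autoscaler
        image: k8s.gcr.io/cluster-autoscaler:v1.2.2   # illustrative tag
        command:
        - ./cluster-autoscaler
        - --cloud-provider=gce
        # allow scale-down of nodes whose pods use emptyDir (local storage)
        - --skip-nodes-with-local-storage=false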