I have a kops cluster with a maximum of 75 nodes, running the cluster autoscaler. It uses kubenet networking. Autoscaling has stopped working: scale-down is no longer happening.
The cluster is running at maximum capacity (75 nodes) even with almost no load. I'm not sure where to start troubleshooting the problem.
I see the following errors in the cluster autoscaler pod logs:
I0222 01:45:14.327164 1 static_autoscaler.go:97] Starting main loop
W0222 01:45:14.770818 1 static_autoscaler.go:150] Cluster is not ready for autoscaling
I0222 01:45:15.043126 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:17.121507 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:19.126665 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:21.327581 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:23.331802 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0222 01:45:24.775124 1 static_autoscaler.go:97] Starting main loop
W0222 01:45:25.085442 1 static_autoscaler.go:150] Cluster is not ready for autoscaling
Autoscaling was working fine before this.
Update: I also see the following errors when running kops validate cluster:
VALIDATION ERRORS
KIND NAME MESSAGE
Node ip-172-20-32-173.ec2.internal node "ip-172-20-32-173.ec2.internal" is not ready
...
I0221 22:16:02.688911 2403 node_conditions.go:60] node "ip-172-20-51-238.ec2.internal" not ready: &NodeCondition{Type:NetworkUnavailable,Status:True,LastHeartbeatTime:2019-02-21 22:15:56 -0500 EST,LastTransitionTime:2019-02-21 22:15:56 -0500 EST,Reason:NoRouteCreated,Message:RouteController failed to create a route,}
cannot be removed
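To get a sense of how many nodes are affected, the node conditions can be filtered from the output of kubectl get nodes -o json. The following is only a minimal sketch in Python, not part of the original setup: the function name nodes_with_condition and the file name nodes.json are made up for illustration, and it assumes the standard Node JSON layout (items[].status.conditions[] with type and status fields).

```python
import json


def nodes_with_condition(node_list, cond_type, status="True"):
    """Return names of nodes whose condition `cond_type` has the given status.

    `node_list` is the parsed output of `kubectl get nodes -o json`.
    (Hypothetical helper, named here for illustration only.)
    """
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        for cond in conditions:
            if cond.get("type") == cond_type and cond.get("status") == status:
                names.append(node["metadata"]["name"])
                break  # one match per node is enough
    return names


def load_node_list(path):
    """Load a node list previously saved with `kubectl get nodes -o json > path`."""
    with open(path) as fh:
        return json.load(fh)
```

For example, after running kubectl get nodes -o json > nodes.json, calling nodes_with_condition(load_node_list("nodes.json"), "NetworkUnavailable") would list the nodes reporting the same condition as in the validation output above, and nodes_with_condition(..., "Ready", "False") would list the nodes that are not ready.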
How are allocated resources looking on the nodes? – Crou