I have a regional cluster set up in Google Kubernetes Engine (GKE). The node pool has a single VM in each zone (3 total). I have a deployment with a minimum of 3 replicas, controlled by an HPA. The node pool is configured to autoscale (cluster autoscaler, aka CA). The problem scenario:
I update the deployment image. Kubernetes automatically creates new pods, and the CA identifies that a new node is needed, so I now have 4 nodes. The old pods are removed once all new pods have started, which means the total CPU request is exactly the same as it was a minute before. But after the 10-minute maximum downscale time, I still have 4 nodes.
The resource requests for the four nodes are now:
CPU Requests  CPU Limits  Memory Requests  Memory Limits
------------  ----------  ---------------  -------------
358m (38%)    138m (14%)  516896Ki (19%)   609056Ki (22%)
--
CPU Requests  CPU Limits  Memory Requests  Memory Limits
------------  ----------  ---------------  -------------
800m (85%)    0 (0%)      200Mi (7%)       300Mi (11%)
--
CPU Requests  CPU Limits  Memory Requests  Memory Limits
------------  ----------  ---------------  -------------
510m (54%)    100m (10%)  410Mi (15%)      770Mi (29%)
--
CPU Requests  CPU Limits  Memory Requests  Memory Limits
------------  ----------  ---------------  -------------
823m (87%)    158m (16%)  484Mi (18%)      894Mi (33%)
The 38% node is running:
Namespace    Name                                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits
---------    ----                                                          ------------  ----------  ---------------  -------------
kube-system  event-exporter-v0.1.9-5c8fb98cdb-8v48h                        0 (0%)        0 (0%)      0 (0%)           0 (0%)
kube-system  fluentd-gcp-v2.0.17-q29t2                                     100m (10%)    0 (0%)      200Mi (7%)       300Mi (11%)
kube-system  heapster-v1.5.2-585f569d7f-886xx                              138m (14%)    138m (14%)  301856Ki (11%)   301856Ki (11%)
kube-system  kube-dns-autoscaler-69c5cbdcdd-rk7sd                          20m (2%)      0 (0%)      10Mi (0%)        0 (0%)
kube-system  kube-proxy-gke-production-cluster-default-pool-0fd62aac-7kls  100m (10%)    0 (0%)      0 (0%)           0 (0%)
I suspect this node won't be scaled down because of heapster or kube-dns-autoscaler. But the 85% node contains:
Namespace    Name                                                          CPU Requests  CPU Limits  Memory Requests  Memory Limits
---------    ----                                                          ------------  ----------  ---------------  -------------
kube-system  fluentd-gcp-v2.0.17-s25bk                                     100m (10%)    0 (0%)      200Mi (7%)       300Mi (11%)
kube-system  kube-proxy-gke-production-cluster-default-pool-7ffeacff-mh6p  100m (10%)    0 (0%)      0 (0%)           0 (0%)
my-deploy    my-deploy-54fc6b67cf-7nklb                                    300m (31%)    0 (0%)      0 (0%)           0 (0%)
my-deploy    my-deploy-54fc6b67cf-zl7mr                                    300m (31%)    0 (0%)      0 (0%)           0 (0%)
The fluentd and kube-proxy pods are present on every node, so I assume they would not need to be rescheduled if the node were removed. That means the pods of my deployment could be relocated to the other nodes, since each one only requests 300m (31%, because only 94% of the node's CPU is allocatable).
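As a rough sanity check of that arithmetic, here is a minimal Python sketch. It is only an illustration of the CPU-request bookkeeping, not how the CA actually makes its decision. It assumes every node has 940m allocatable CPU (94% of one vCPU, which matches the percentages above), ignores memory and any scheduling constraints, and the function name can_relocate is hypothetical.

```python
# Sketch only: first-fit placement on CPU requests, nothing else.
# ASSUMPTION: each node has 940m allocatable CPU (94% of 1000m).
ALLOCATABLE_M = 940


def can_relocate(pod_requests_m, other_nodes_requested_m,
                 allocatable_m=ALLOCATABLE_M):
    """Return the target node index for each pod, or None if one does not fit.

    pod_requests_m: CPU requests (millicores) of the pods to move.
    other_nodes_requested_m: CPU already requested on each remaining node.
    """
    free = [allocatable_m - used for used in other_nodes_requested_m]
    placement = []
    # Place the largest pods first; take the first node with enough room.
    for pod in sorted(pod_requests_m, reverse=True):
        for i, room in enumerate(free):
            if pod <= room:
                free[i] -= pod
                placement.append(i)
                break
        else:
            return None  # no remaining node can hold this pod
    return placement


# The two my-deploy pods (300m each) from the 85% node,
# against the 38%, 54% and 87% nodes (358m, 510m, 823m requested).
print(can_relocate([300, 300], [358, 510, 823]))  # [0, 1]
```

By this count one pod fits on the 38% node and the other on the 54% node, so CPU requests alone do not explain why the fourth node stays.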
So I figured I would check the logs. But if I run kubectl get pods --all-namespaces
there is no pod visible on GKE for the CA. And if I use the command kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
it only tells me whether it is about to scale, not why or why not.
Another option is to look at /var/log/cluster-autoscaler.log
on the master node. I SSH'd into all 4 nodes and only found a gcp-cluster-autoscaler.log.pos
file that says: /var/log/cluster-autoscaler.log 0000000000000000 0000000000000000
meaning the log file should be right there but is empty.
The last option, according to the FAQ, is to check the events for the pods, but as far as I can tell they are empty.
Does anyone know why it won't scale down, or at least where to find the logs?