7 votes

I am running a cluster on GKE and it sometimes gets into a hanging state. Right now I am working with just two nodes and have allowed the cluster to autoscale. One of the nodes has a NotReady status and simply stays there. Because of that, half of my pods are Pending due to insufficient CPU.

How I got there

I deployed a Deployment whose pods have quite high CPU usage from the moment they start. When I scaled it to 2 replicas, I noticed CPU usage was at 1.0. The moment I scaled the Deployment to 3 replicas, I expected the third pod to stay in the Pending state until the cluster added another node and then be scheduled there.
What happened instead is that the node switched to NotReady status and all the pods that were on it are now Pending. However, the node does not restart or anything - it is simply not used by Kubernetes. GKE then thinks there are enough resources, since the VM shows 0 CPU usage, and won't scale up to 3 nodes. I also cannot SSH into the instance from the console - it is stuck in a loading loop.

I can manually delete the instance and then things start working again - but I don't think that's the idea behind a fully managed service.

One thing I noticed - not sure if it's related: in the GCE console, when I look at the VM instances, the Ready node is in use by both the instance group and the load balancer (which is the Service around an nginx entry point), but the NotReady node is only in use by the instance group, not the load balancer.

Furthermore, in kubectl get events, there was a line:

Warning   CreatingLoadBalancerFailed   {service-controller }          Error creating load balancer (will retry): Failed to create load balancer for service default/proxy-service: failed to ensure static IP 104.199.xx.xx: error creating gce static IP address: googleapi: Error 400: Invalid value for field 'resource.address': '104.199.xx.xx'. Specified IP address is already reserved., invalid

I specified loadBalancerIP: 104.199.xx.xx in the definition of the proxy-service to make sure that on each restart the service gets the same (reserved) static IP.
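
For reference, the Service definition looks roughly like this (the selector label and ports are simplified here; the name proxy-service, the LoadBalancer type, and the loadBalancerIP field are the relevant parts):

apiVersion: v1
kind: Service
metadata:
  name: proxy-service
spec:
  type: LoadBalancer
  # An already-reserved regional static IP in the cluster's region;
  # the address is elided here exactly as in the event message above.
  loadBalancerIP: 104.199.xx.xx
  selector:
    app: nginx          # label on the nginx entry-point pods (simplified)
  ports:
  - port: 80
    targetPort: 80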

Any ideas on how to prevent this from happening? If a node gets stuck in the NotReady state, it should at least restart - but ideally it wouldn't get into such a state to begin with.

Thanks.


2 Answers

8 votes

The first thing I would do is define resource requests and limits for those pods.

Requests tell the cluster how much memory and CPU you expect the pod to use. This helps the scheduler find the best node to run those pods on.

Limits are crucial here: they prevent your pods from damaging the stability of the nodes. It's better to have a pod killed by the OOM killer than to have a pod bring a node down through resource starvation.

For example, in this case you are requesting 200m of CPU (20% of a core) for your pod; if it tries to use more than the 300m (30%) limit it is throttled, and if it exceeds its memory limit it is OOM-killed and restarted.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    resources:
      limits:
        cpu: 300m          # CPU usage above this is throttled
        memory: 200Mi      # memory usage above this gets the container OOM-killed
      requests:
        cpu: 200m          # what the scheduler reserves on the node
        memory: 100Mi
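
Assuming the manifest above is saved as, say, nginx.yaml (the file name is just an example), you can apply it and check that the scheduler accounts for the requests:

kubectl apply -f nginx.yaml
# shows requested vs. allocatable CPU/memory per node
kubectl describe nodes | grep -A 5 "Allocated resources"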

You can read more here: http://kubernetes.io/docs/admin/limitrange/

0 votes

I can speak for AWS: a node goes into the NotReady state when it runs out of memory or, possibly, CPU. You can create a custom memory metric that collects memory usage from all the worker nodes in the cluster and pushes it to CloudWatch; you can follow this documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html. A CPU metric already exists, so there is no need to create one. Once the memory metric exists for your cluster, create an alarm that fires when it goes above a certain threshold. Then go to the Auto Scaling group in the AWS console and add a scaling policy for it, selecting the alarm you created and specifying how many instances to add. A sketch of those last two steps follows below.
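
A rough sketch of the last two steps with the AWS CLI (the group name, alarm name, and threshold are placeholders; System/Linux and MemoryUtilization are the namespace and metric the monitoring scripts from the linked documentation publish, assuming they are run with aggregation by Auto Scaling group - adjust the dimensions to whatever your metric actually has):

# 1. Scaling policy that adds one instance to the worker Auto Scaling group
#    (put-scaling-policy prints the policy ARN needed in the next step)
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-worker-asg \
  --policy-name scale-up-on-memory \
  --scaling-adjustment 1 \
  --adjustment-type ChangeInCapacity

# 2. Alarm on the custom memory metric published by the monitoring scripts
aws cloudwatch put-metric-alarm \
  --alarm-name workers-high-memory \
  --namespace System/Linux \
  --metric-name MemoryUtilization \
  --dimensions Name=AutoScalingGroupName,Value=my-worker-asg \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions <policy-arn-from-step-1>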