
I'm having trouble with my Kubernetes ingress on GKE. I'm simulating termination of a preemptible instance by manually deleting it (through the GCP dashboard). I am running a regional GKE cluster (one VM in each availability zone in us-west1).

A few seconds after selecting delete on only one of the VMs, I start receiving 502 errors through the load balancer. Stackdriver logs for the load balancer list the error as failed_to_connect_to_backend.

Monitoring the health of the backend service shows the backend being terminated go from HEALTHY to UNHEALTHY and then disappear, while the other two backends remain HEALTHY.
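For reference, this is roughly how I'm watching the backend health (the backend service name below is just a placeholder for whatever name GKE generated for the ingress):

    # List the backend services GKE created for the ingress
    gcloud compute backend-services list

    # Poll the health of the instance-group backends behind one of them;
    # replace k8s-be-30080--example with the actual generated name
    gcloud compute backend-services get-health k8s-be-30080--example --global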

After a few seconds, requests begin to succeed again.

I'm quite confused about why the load balancer is unable to direct traffic to the healthy nodes while one goes down - or maybe this is a Kubernetes issue? Could the load balancer be properly routing traffic to a healthy instance, but the Kubernetes NodePort service on that instance is proxying the request back to the unhealthy instance for some reason?
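I don't know whether the Service's externalTrafficPolicy matters here, but this is how I'm checking it (the service name is a placeholder for mine):

    # With the default policy (Cluster), kube-proxy on the node that receives the
    # NodePort traffic may forward it to a pod running on a different node
    kubectl get service my-service -o jsonpath='{.spec.externalTrafficPolicy}{"\n"}'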


1 Answer


Well, I would say that if you kill a node from the GCP Console, you are kind of killing it from the outside in. It will take time until the kubelet realizes this event, so kube-proxy also won't update the service endpoints and the iptables rules immediately.

Until that happens, the ingress controller will keep sending packets to the services specified by the ingress rule, and the services will keep sending them to pods that no longer exist.
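If you want to see this for yourself, one rough way is to look at the iptables rules kube-proxy programs on a node that is still alive - the dead pod's IP stays there as a backend until the endpoints are updated (the chain names are hashed, so grep first and then inspect the matching chains):

    # On a surviving node, dump kube-proxy's NAT rules; the KUBE-SVC chain for the
    # service still points at a KUBE-SEP chain holding the dead pod's IP
    sudo iptables-save -t nat | grep KUBE-SVC
    sudo iptables-save -t nat | grep KUBE-SEP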

This is just speculation; I might be wrong. But according to the GCP documentation, if you are using preemptible VMs, your app should be fault tolerant.

[EXTRA]

So, let's consider two general scenarios. In the first one we will run a kubectl delete pod command, while in the second one we will kill a node abruptly.

  1. With kubectl delete pod ... you are telling the api-server that you want to kill a pod. The api-server will summon the kubelet to kill the pod and will re-create it on another node (if applicable). kube-proxy will update the iptables rules so the services forward requests to the right pod.
  2. If you kill the node, it is the kubelet that first realizes that something is wrong, so it reports this to the api-server. The api-server will re-schedule the pods onto a different node (always). The rest is the same (see the sketch after this list).
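A minimal sketch of how the two paths look from the outside, assuming a Deployment labelled app=my-app exposed by a Service named my-service (both names are illustrative):

    # Scenario 1: graceful deletion through the api-server.
    # The pod is removed from the service endpoints as it is terminated, so
    # traffic only goes to pods that can still answer.
    kubectl delete pod -l app=my-app
    kubectl get endpoints my-service        # the deleted pod's IP is already gone

    # Scenario 2: abrupt node loss (e.g. deleting the VM in the GCP Console).
    # The control plane only notices once the node stops reporting, so for a
    # while the endpoints still list pods that no longer exist.
    kubectl get nodes -w                    # the node eventually goes NotReady
    kubectl get endpoints my-service -w     # the stale endpoint lingers until then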

My point is that there is a difference between the api-server knowing from the beginning that no packets can be sent to a pod, and being notified only once the kubelet realizes that the node is unhealthy.

How to solve this? You can't. And actually this should be logical. You want to have the same performance with preemptible machines, which cost about 5 times less than a normal VM? If this were possible, everyone would be using them.

Finally, again, Google advises using preemptible VMs only if your application is fault tolerant.