I'm having trouble with my Kubernetes ingress on GKE. I'm simulating the termination of a preemptible instance by manually deleting it through the GCP console. I'm running a regional GKE cluster with one VM in each availability zone of us-west1.
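For reference, the cluster was created with roughly the following command (the cluster name is a placeholder and the flags approximate my setup):

```sh
# Regional cluster: one preemptible node per zone in us-west1
# ("my-cluster" is a placeholder name)
gcloud container clusters create my-cluster \
  --region us-west1 \
  --num-nodes 1 \
  --preemptible
```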
A few seconds after deleting just one of the VMs, I start receiving 502 errors through the load balancer. The Stackdriver logs for the load balancer report the error as `failed_to_connect_to_backend`.
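I pulled those entries with something along these lines (the filter is approximate):

```sh
# Query Stackdriver for recent 502s from the HTTP(S) load balancer
gcloud logging read \
  'resource.type="http_load_balancer" AND httpRequest.status=502' \
  --limit 10 \
  --format json
```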
Monitoring the health of the backend service shows the terminated backend go from `HEALTHY` to `UNHEALTHY` and then disappear, while the other two backends remain `HEALTHY`.
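I'm watching backend health with something like this (the backend service name is a placeholder; the real one comes from `gcloud compute backend-services list`):

```sh
# Poll the health of each instance group behind the backend service
# ("k8s-be-30000--abc123" is a placeholder name)
gcloud compute backend-services get-health k8s-be-30000--abc123 --global
```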
After a few seconds, requests begin to succeed again.
I'm quite confused as to why the load balancer is unable to direct traffic to the healthy nodes while one goes down - or maybe this is a Kubernetes issue? Could the load balancer be correctly routing traffic to a healthy instance, while the Kubernetes `NodePort` service on that instance proxies the request back to the unhealthy instance for some reason?
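In case it's relevant: I haven't set `externalTrafficPolicy` on the service, so as I understand it, it uses the default `Cluster` policy, under which any node that receives traffic can proxy it to a pod on another node. This is how I'm checking it (service name is a placeholder):

```sh
# Show the traffic policy on the Service backing the ingress
# ("my-service" is a placeholder; empty output means the default, "Cluster")
kubectl get service my-service \
  -o jsonpath='{.spec.externalTrafficPolicy}'
```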