
I am setting up a new service on Google Cloud with a "Network Load Balancer". I am only using one backend server during setup. After a period of inactivity the first request to the load balancer address doesn't appear to be delivered to the back end server.

  • The load balancer reports the backend server healthy.
  • I see health check requests arriving every 5 seconds in the backend server's access log.
  • If I make a request directly to the backend server's health check URL I get a successful response.
  • If I make a request to the health check URL (or any other URL) through the load balancer, the browser hangs indefinitely and it does not appear that any request is ever made to the backend server.
  • If I then refresh the browser the page loads fine and further requests continue to load fine until some period of inactivity (hours?) has passed.

This doesn't seem like it would be a problem in production, but any unexplained issue like this makes me uncomfortable using the GCE load balancer for a live site.

This question is half help request and half bug report. Is this a known issue, or am I doing something wrong?

Thanks for any help.

UPDATE

Looking back at the server logs, I see that the hung request does show up in the access log, but only once I refresh the browser, and it is logged with a response time of 0 seconds.

Here are the two requests. The first is the hung request. The second is the refresh.

173.51.253.242 - devuser [09/May/2015:19:15:27 +0000] "GET /api/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36" "-"
173.51.253.242 - devuser [09/May/2015:19:15:27 +0000] "GET /api/ HTTP/1.1" 200 5 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36" "-"

Notice that both requests arrive in the same second (when I refresh), and that the failed request has response code 499 and response time 0. A 499 means the client closed the connection, so I take this to mean that the load balancer did not initiate the request until I hit refresh, at which point it made both connections to the backend server.

If you only have one server what are you balancing? - kpie
I will add additional servers once the system is going live. This system is currently in development. - cmorris
What does the code for the two health checks look like? - kpie
The health checks just return status 200, but this issue occurs with any URL. I'm just using the health check to test as it is the simplest test. - cmorris
Are you maybe hitting the 10min timeout for idle connections? - mensi

1 Answer


Updating this with an actual answer as mensi's comment above seems to have gotten to the root of the problem.

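Enabling TCP keepalive on the load-balanced hosts solved the problem. With keepalive probes sent more often than the 10-minute idle timeout, the connection is no longer dropped during quiet periods.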
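
For anyone wondering what "adding keepalive" can look like in practice, here is a minimal Python sketch that enables it on a single socket. This is not necessarily how I applied it: the same effect can be had system-wide through kernel settings, and the function name and the 60-second / 5-probe values below are assumptions chosen only to stay well under the 10-minute idle timeout.

```python
import socket

# Assumed values, not taken from the thread: probe after 60 s of idle time,
# then every 60 s, and give up after 5 unanswered probes.
KEEPALIVE_IDLE_SECS = 60
KEEPALIVE_INTERVAL_SECS = 60
KEEPALIVE_PROBE_COUNT = 5


def enable_keepalive(sock: socket.socket) -> None:
    """Turn on TCP keepalive probes for a TCP socket (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific (all three exist on Linux),
    # so each one is set only if this platform's socket module exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", KEEPALIVE_IDLE_SECS),
        ("TCP_KEEPINTVL", KEEPALIVE_INTERVAL_SECS),
        ("TCP_KEEPCNT", KEEPALIVE_PROBE_COUNT),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


# Usage: apply to a socket before connecting or accepting traffic on it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

The idle time is the value that matters here: it has to be shorter than the idle timeout so that a probe goes out before the connection is forgotten.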