10
votes

I am trying to set up a load balancer on GCE for 2 Tomcat servers running individually on 2 VMs (vm-1 and vm-2). Both listen on port 80, and the network firewall rules allow traffic on port 80 from any source (0.0.0.0/0). I created an instance group containing both VMs, called vm-group, and configured a named port called http pointing to port 80.

I also created a health check on port 80, pointing to /<app_name>/<health_url>, which returns an HTTP 200 if the app is healthy.
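For reference, the equivalent gcloud command would be roughly the following (a sketch; the check name is a placeholder and the request path keeps the placeholders above):

# Legacy HTTP health check on port 80 hitting the app's health endpoint.
gcloud compute http-health-checks create my-health-check \
    --port=80 \
    --request-path=/<app_name>/<health_url>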

Then I set up an HTTP load balancer using the instructions in this video. Once it was set up, I found that the load balancer reports 0/2 instances as healthy, which means the health checks are failing.

When I manually hit the health check URLs, they return an HTTP 200, so my app is healthy.
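For example, from my workstation (the external IP is a placeholder):

curl -I http://<vm-external-ip>/<app_name>/<health_url>

returns HTTP/1.1 200 OK.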

Now, I am not sure why the load balancer reports the VMs as unhealthy and is unable to route requests. How can I debug this further?

Edit: I verified that the google-address-manager is running as mentioned in this question.

5
Could you run gcloud compute http-health-checks describe healthcheck-name and paste the results at the end of your question? – Grzenio
Just want to emphasize a point: the response code must be exactly 200 for HTTP health checks. In our case the service responded with 202 to health check probes and was marked as failed. However, switching the health check type to TCP made it pass. – Tom Lime

5 Answers

9
votes

Have you allowed Google's health checker ranges in your firewall rules: 130.211.0.0/22 and 35.191.0.0/16?
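If not, a rule along these lines should work (a sketch; the rule name, network, and port are assumptions to adapt):

# Allow Google's health check probes to reach port 80 on instances in the network.
gcloud compute firewall-rules create allow-gcp-health-checks \
    --network=default \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80 \
    --source-ranges=130.211.0.0/22,35.191.0.0/16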

4
votes

I have the same problem. Additionally, when using the gcloud tool:

gcloud compute backend-services get-health mybackendservice

I get

- healthState: UNHEALTHY
    instance: https://www.googleapis.com/compute/v1/projects/myproject/.../instances/mycluster-4gim
    port: 8000

The problem is that the health check defined for the backend service uses HTTP (not HTTPS) and port 80, yet the reported port is 8000. I cannot find an explanation for that discrepancy.

2
votes

It turns out that the port number is taken from the instance group settings (the Port name mapping section). But the health check still does not work. Comparing the outcome of the check with a target pool containing the same instances as the instance group:

gcloud compute target-pools get-health targetpoolname

---
healthStatus:
- healthState: HEALTHY
  instance: https://www.googleapis.com/compute/v1/projects/inst
  ipAddress: an.ip.addr.es
kind: compute#targetPoolInstanceHealth
---

...
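The named port mapping on the instance group can be inspected and corrected with gcloud, along these lines (a sketch; the group name and zone are placeholders based on the question):

# Show the current port name mapping on the instance group.
gcloud compute instance-groups get-named-ports vm-group --zone=us-central1-a

# Point the named port http at 80 so the backend service probes the right port.
gcloud compute instance-groups set-named-ports vm-group \
    --zone=us-central1-a \
    --named-ports=http:80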

2
votes

If your application/service is responding properly and the health check itself is correct, there are a few other things that should be checked, verified, and fixed when load balancer health checks fail on GCP.

  • Make sure the GCP LB health checker subnets, i.e. 130.211.0.0/22 and 35.191.0.0/16, are allowed in the firewall rules attached to the concerned backend instances.
  • Make sure the service defined as a backend on the LB is not bound only to the instance's own IP address, as in that case it will not be able to answer queries for the load balancer's external address. Let your service respond on any address by binding it to 0.0.0.0 (see the sketch after this list).
  • Make sure the google-address-manager service is up and running on the concerned backend instances (also checked in the sketch below). The address manager's job is to configure the network settings for the instance, including load-balanced IP addresses, and to add the appropriate routes to the instance routing table for communication with the load balancer.
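A quick way to verify the last two points on a backend VM (a sketch; the port and service name are assumptions, and newer images ship google-guest-agent instead of google-address-manager):

# Confirm the service listens on 0.0.0.0 (or *), not just the VM's own IP.
sudo ss -tlnp | grep ':80'

# Confirm the address manager is running (legacy images).
sudo systemctl status google-address-manager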
1
votes

Make sure firewall rules explicitly allow the port(s) Tomcat listens on, both on the VM node itself and in the GCP firewall.

  1. Explicitly open the VM node's ports via iptables, or via firewall-cmd on RedHat/CentOS-based distros (see the sketch after this list).

  2. To explicitly open GCP firewall ports, create ingress firewall rules with the ports specified, and pay attention to the rule assignment; in GCP this is called the "target": https://cloud.google.com/vpc/docs/firewalls#rule_assignment. In my case, I forgot to assign the created firewall rules, i.e. to associate the target tags with the VM nodes; firewall rules are not in effect until they are assigned (see the sketch below). After associating the tags with the firewall rules, voilà! Everything works. The tags are set under Network tags on the VM instance detail page.
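A sketch of both steps (the rule name, tag, and zone are placeholder assumptions):

# On the VM: open port 80 locally with firewalld (RedHat/CentOS)...
sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --reload
# ...or with iptables:
sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT

# In GCP: an ingress rule scoped to a target tag...
gcloud compute firewall-rules create allow-tomcat \
    --direction=INGRESS --action=ALLOW --rules=tcp:80 \
    --source-ranges=0.0.0.0/0 \
    --target-tags=tomcat-node

# ...which only takes effect on VMs carrying that tag:
gcloud compute instances add-tags vm-1 --tags=tomcat-node --zone=us-central1-a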