2 votes

I'm fairly new to K8s but not so new that I haven't got a couple of running stacks and even a production site :)

I've noticed that in a new deployment, the ingress below:

Type: Ingress
Load balancer: External HTTP(S) LB

is reporting "All backend services are in UNHEALTHY state", which is odd since the service is working and traffic has been served from it for a week.
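
For anyone else debugging the same thing: on GKE the ingress controller also records this health state in an annotation on the ingress object, so it can be read without the console. The ingress name here is a placeholder for whatever yours is called:

kubectl describe ingress my-ingress
# look for the backends annotation, e.g.:
#   ingress.kubernetes.io/backends: {"k8s-be-32460--etcetc":"UNHEALTHY"}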

Now, on closer inspection, the backend service k8s-be-32460--etcetc is what's unhappy. So, using the GUI, I click on that...

Then I see the frontend with a funnel for Asia, Europe, and America, which seems to be funneling all traffic to Europe. Presumably this is normal for the distributed external load balancer service (as per the docs), since my cluster resides in Europe. Cool. Except...

k8s-ig--etcetc   europe-west1-b   1 of 3 instances healthy

1 out of 3 instances, you say? Eh? This is about as far as I've got so far. Can anyone shed any light?

Edit:

OK, so one of the nodes reporting as unhealthy was in fact a node from the default node pool. I have now scaled that pool back to 0 nodes, since as far as I'm aware the preference is to manage node pools explicitly. That leaves just 2 nodes, 1 of which is unhealthy according to the ingress, despite both being in the same zone.
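
(For anyone following along, scaling the default pool down to zero is done with something like the following; the cluster name is a placeholder.)

gcloud container clusters resize my-cluster --node-pool default-pool --num-nodes 0 --zone europe-west1-b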

Digging even further, the GUI somehow reports that only one of the instances in the instance group is healthy, yet these instances are auto-created by GCP; I don't manage them.
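
The same per-instance health can be pulled from the CLI too, which at least rules out the GUI being stale; the backend service name is the one shown in the console:

gcloud compute backend-services get-health k8s-be-32460--etcetc --global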

Any ideas?

Edit 2:

I followed this right the way through, SSHing to each of the VMs in the instance group and executing the health check on each node. One does indeed fail.

Just a simple curl localhost:32460: on one node it routes and on the other it doesn't, though there is something listening on 32460, as shown here:

tcp6       0      0 :::32460                :::*       LISTEN      - 

The health check is HTTP, path /, on port 32460.
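
To confirm exactly what the load balancer is probing (protocol, path, port), the health check itself can be inspected from the CLI; the name below is whatever the console shows for this backend:

gcloud compute health-checks list
gcloud compute health-checks describe k8s-be-32460--etcetc
# on older setups the check may live under "gcloud compute http-health-checks" instead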

Any ideas why a single node would have stopped working? As I say, I'm not savvy with how this underlying VM has been configured.

I'm now wondering whether it's just some sort of straightforward routing issue, but it's extremely convoluted at this point.
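
In case it helps anyone else compare the two nodes: with externalTrafficPolicy: Local, kube-proxy only forwards the NodePort to pods on the same node, so dumping the NAT rules for the port on each VM (assuming kube-proxy is running in its default iptables mode) shows whether a local endpoint is actually wired up:

# run on each VM over SSH, then diff the output between the healthy and failing node
sudo iptables-save -t nat | grep 32460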

1
Can you provide the YAML for the service? Did you set externalTrafficPolicy: Local? – Patrick W
@PatrickW Yes, I set externalTrafficPolicy: Local; it's needed to expose the actual client IPs to the app. (See the Service sketch just below these comments for the general shape.) – David
@PatrickW Do you want the web service or the ingress YAML? – David
I also verified that each web pod is running on a different node (i.e. one on each node), as it did occur to me that there is nothing necessarily stopping it from scheduling both pods on a single node. However, it is correctly running 1 pod on each node. – David
Please share the deployment/service/ingress YAML of the failing backend app, as the troubleshooting steps you've provided show that the other one is working fine. – Will R.O.F.
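
For context, a Service of the general shape being discussed in the comments would look roughly like this; the names and ports are illustrative, not the actual manifest from the question:

apiVersion: v1
kind: Service
metadata:
  name: web                        # hypothetical name
spec:
  type: NodePort
  externalTrafficPolicy: Local     # preserves client IPs, but only nodes running a pod answer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 32460              # the port the GCE health check probes

With Local, the load balancer's health check against the NodePort fails on any node that has no local pod for the Service, which is the classic cause of "N of M instances healthy".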

1 Answer

0 votes

This worked for me:

In my case, I was exposing an API that didn't have a default route; that is to say, if I browsed to my IP, it returned a 404 (Not Found) error. So, as a test, I put a "default" route in my Startup.cs, like this:

app.UseEndpoints(endpoints =>
{
    // Map "/" so a request to the root path gets a 200 instead of a 404
    endpoints.MapGet("/", async context =>
    {
        await context.Response.WriteAsync("Hola mundillo");
    });
    endpoints.MapControllers();
});

Then the status went from Unhealthy to OK. Maybe this isn't a definitive solution, but it might help someone find the error.
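
If editing the app isn't desirable, an alternative sketch (not what I did above; the names and port are illustrative): the GKE ingress controller can derive the load balancer's health check from the container's readinessProbe, so pointing the probe at a path that returns 200 should achieve the same result:

# fragment of the Deployment's pod spec
containers:
  - name: web
    image: gcr.io/my-project/web:latest   # hypothetical image
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /          # must return HTTP 200, hence the default route above
        port: 8080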