1
votes

I have a GKE cluster with 4 nodes in an instance group. I deployed an Ingress and several pods (1 replica of each, so each pod runs on only 1 node). On the Google Console (Ingress details page) all backend services remain Unhealthy, although the health checks on the running pods are OK and my application is running. To my understanding, it says unhealthy because only 1 of the 4 nodes runs an instance of a given pod (the backend service details page says "1 of 4 instances healthy"). Am I correct, and should I worry and try to fix this? It's a bit strange to accept an Unhealthy status when the application is running...

Edit: After further investigation (scaling down to 2 nodes and enabling the health check logs), I can see that the backend service status seems to be the status of the last executed health check. So if the last check ran against the node that hosts the pod, the status is healthy; otherwise it is unhealthy.

GKE version: 1.16.13-gke.1
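
For reference, the per-instance health that the console aggregates can also be inspected from the command line. A minimal sketch, assuming the backend service name is taken from the ingress.kubernetes.io/backends annotation shown below:

# List the backend services the Ingress controller created
gcloud compute backend-services list

# Show per-instance health for one of them
# (k8s-be-30301--503461913abc33d7 is taken from the annotation below)
gcloud compute backend-services get-health k8s-be-30301--503461913abc33d7 --global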

My ingress definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: mcrt-dc729887-5c67-4388-9327-e4f76baf9eaf
    ingress.kubernetes.io/backends: '{"k8s-be-30301--503461913abc33d7":"UNHEALTHY","k8s-be-31206--503461913abc33d7":"HEALTHY","k8s-be-31253--503461913abc33d7":"HEALTHY","k8s-be-31267--503461913abc33d7":"HEALTHY","k8s-be-31432--503461913abc33d7":"UNHEALTHY","k8s-be-32238--503461913abc33d7":"HEALTHY","k8s-be-32577--503461913abc33d7":"UNHEALTHY","k8s-be-32601--503461913abc33d7":"UNHEALTHY"}'
    ingress.kubernetes.io/https-forwarding-rule: k8s2-fs-sfdowd2x-city-foobar-cloud-8cfrc00p
    ingress.kubernetes.io/https-target-proxy: k8s2-ts-sfdowd2x-city-foobar-cloud-8cfrc00p
    ingress.kubernetes.io/ssl-cert: mcrt-dc729887-5c67-4388-9327-e4f76baf9eaf
    ingress.kubernetes.io/url-map: k8s2-um-sfdowd2x-city-foobar-cloud-8cfrc00p
    kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: city
    networking.gke.io/managed-certificates: foobar-cloud
  creationTimestamp: "2020-08-06T08:25:18Z"
  finalizers:
  - networking.gke.io/ingress-finalizer-V2
  generation: 1
  labels:
    app.kubernetes.io/instance: foobar-cloud
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: foobar-cloud
    helm.sh/chart: foobar-cloud-0.4.58
  name: foobar-cloud
  namespace: city
  resourceVersion: "37878"
  selfLink: /apis/extensions/v1beta1/namespaces/city/ingresses/foobar-cloud
  uid: 751f78cf-2344-46e3-b87e-04d6d903acd5
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: foobar-cloud-server
          servicePort: 9999
        path: /foobar/server
      - backend:
          serviceName: foobar-cloud-server
          servicePort: 9999
        path: /foobar/server/*
status:
  loadBalancer:
    ingress:
    - ip: xx.xx.xx.xx
3
Could you share your Ingress definition? – mario
I've edited my question with the Ingress definition and further investigation. – Alain B.

3 Answers

2
votes

I had a very similar issue. I don't need to share my setup as it's almost identical to the OP's, and I'm also using the GKE Ingress Controller. I had manually added externalTrafficPolicy: Local to the Service behind the Ingress backend, and when I changed externalTrafficPolicy from Local to Cluster (as per dany L's answer below), the Ingress backend service immediately reported healthy.

I then removed the externalTrafficPolicy lines from the Service entirely and am now running the GKE Ingress Controller with container-native load balancing, with all backend services reporting healthy. A sketch of the resulting Service follows below.
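
A minimal sketch of what that Service can look like, reusing the service name and port from the question; the selector is an assumption you would adjust to your own labels. The cloud.google.com/neg annotation enables container-native load balancing, and externalTrafficPolicy is simply left at its Cluster default:

apiVersion: v1
kind: Service
metadata:
  name: foobar-cloud-server
  namespace: city
  annotations:
    cloud.google.com/neg: '{"ingress": true}'   # container-native load balancing (NEGs)
spec:
  type: NodePort                                # NodePort kept to match the original setup
  selector:
    app.kubernetes.io/name: foobar-cloud        # assumed selector, adjust to your labels
  ports:
  - port: 9999
    targetPort: 9999
  # externalTrafficPolicy omitted, so the default "Cluster" applies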

0
votes

I finally found out the cause of this.
My services did not specify any value for externalTrafficPolicy, so the default value of Cluster applied.
However, I have a NetworkPolicy defined whose goal was to prevent traffic from other namespaces, as described here. I had added the IPs of the load balancer probes as stated in this doc, but was missing the rule allowing connections from the other node IPs in the cluster (see the sketch below).
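
A minimal sketch of such a policy. 130.211.0.0/22 and 35.191.0.0/16 are Google's documented health check probe ranges; the node CIDR (10.128.0.0/20 here) is an assumption you would replace with your cluster's actual node range:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-probes-and-nodes
  namespace: city
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  # traffic from pods in the same namespace
  - from:
    - podSelector: {}
  # Google Cloud load balancer health check probes
  - from:
    - ipBlock:
        cidr: 130.211.0.0/22
    - ipBlock:
        cidr: 35.191.0.0/16
  # node IPs of the cluster (assumed CIDR, replace with your node range)
  - from:
    - ipBlock:
        cidr: 10.128.0.0/20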

-1
votes

Please check the yaml file for your Service. If it shows externalTrafficPolicy: Local, then this is expected behavior.

Local means traffic is only routed to pods running on the node that received it; requests arriving at other nodes are dropped, so their health checks fail. So if your deployment has only 1 replica, only the node running that replica will report as healthy.

You can easily test that theory: scale up to 2 replicas and observe the behavior. I foresee 1 healthy instance if the 2nd replica lands on the same node as the first, and 2 of 4 healthy if it lands on a different node. Let me know.
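
A quick way to run that test, assuming the Deployment is named foobar-cloud-server like the Service in the question:

# scale the deployment to 2 replicas (deployment name is assumed)
kubectl -n city scale deployment foobar-cloud-server --replicas=2

# check which nodes the replicas landed on
kubectl -n city get pods -o wide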