
What I want to accomplish

I'm trying to connect an external HTTPS (layer 7) load balancer to an NGINX Ingress exposed as zonal Network Endpoint Groups (NEGs). My GKE cluster contains a couple of web application Deployments that I've exposed as ClusterIP Services.

I know that the NGINX Ingress controller can be exposed directly as a TCP load balancer, but that is not what I want. Instead, in my architecture I want to load balance the HTTPS requests with an external HTTPS load balancer. This external load balancer should provide SSL/TLS termination and forward plain HTTP requests to my Ingress resource.

The ideal architecture would look like this:

HTTPS requests --> external HTTPS load balancer --> HTTP request --> NGINX Ingress zonal NEG --> appropriate web application

I'd like to add the zonal NEGs from the NGINX Ingress as the backends for the HTTPS load balancer. This is where things fall apart.

What I've done

NGINX Ingress config

I'm using the default NGINX Ingress config from the official kubernetes/ingress-nginx project, specifically this YAML file: https://github.com/kubernetes/ingress-nginx/blob/master/deploy/static/provider/cloud/deploy.yaml. Note that I've changed the NGINX controller Service section as follows:

  • Added NEG annotation

  • Changed the Service type from LoadBalancer to ClusterIP.

# Source: ingress-nginx/templates/controller-service.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    # added NEG annotation
    cloud.google.com/neg: '{"exposed_ports": {"80":{"name": "NGINX_NEG"}}}'
  labels:
    helm.sh/chart: ingress-nginx-3.30.0
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: 0.46.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
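  # changed from LoadBalancer so that GKE creates no TCP LB of its own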
  type: ClusterIP
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
---
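
After applying this manifest, the NEG controller should create one NEG per zone and write a status annotation back onto the Service. A quick sanity check (a sketch, using only the names from the manifest above):

# the NEG controller writes back a status annotation once the NEGs exist
kubectl get service ingress-nginx-controller --namespace=ingress-nginx \
    -o jsonpath='{.metadata.annotations.cloud\.google\.com/neg-status}'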

NGINX Ingress routing

I've tested the path-based routing rules from the NGINX Ingress to my web applications independently. They work when the NGINX Ingress is exposed through a TCP load balancer. I've set up my application Deployment and Service configs in the usual way.
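
For reference, this kind of routing test can also be done without any load balancer by port-forwarding to the controller Service; a sketch, where the host and path are hypothetical placeholders for real Ingress rules:

# forward local port 8080 to the controller Service's HTTP port
kubectl port-forward --namespace=ingress-nginx service/ingress-nginx-controller 8080:80

# in another shell, exercise an Ingress rule directly against the controller
curl -H "Host: myapp.example.com" http://localhost:8080/app1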

External HTTPS Load Balancer

I created an external HTTPS load balancer with the zonal NEGs as its backends; the rough shape of that configuration is sketched below.
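
For concreteness, here is roughly how such a load balancer can be assembled with gcloud. This is a minimal sketch, not my exact settings; the resource names (nginx-neg-hc, nginx-neg-backend, etc.) and the certificate MY_CERT are hypothetical:

# health check against the NGINX controller endpoints on port 80
gcloud compute health-checks create http nginx-neg-hc \
    --port=80 --request-path=/healthz

# global backend service that uses the health check
gcloud compute backend-services create nginx-neg-backend \
    --protocol=HTTP --health-checks=nginx-neg-hc --global

# attach each zonal NEG as a backend (repeat for every zone)
gcloud compute backend-services add-backend nginx-neg-backend \
    --network-endpoint-group=NGINX_NEG \
    --network-endpoint-group-zone=us-central1-a \
    --balancing-mode=RATE --max-rate-per-endpoint=100 --global

# URL map, HTTPS proxy (TLS termination), and forwarding rule
gcloud compute url-maps create nginx-url-map --default-service=nginx-neg-backend
gcloud compute target-https-proxies create nginx-https-proxy \
    --url-map=nginx-url-map --ssl-certificates=MY_CERT
gcloud compute forwarding-rules create nginx-https-rule \
    --target-https-proxy=nginx-https-proxy --ports=443 --global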

What's not working

Soon after the external load balancer is set up, I can see that GCP creates a new endpoint under one of the zonal NEGs, but it shows as "Unhealthy". Requests to the external HTTPS load balancer return a 502 error.

  • I'm not sure where to start debugging this configuration in GCP logging. I've enabled logging for the health check, but nothing shows up in the logs.

  • I configured the health check on the /healthz path of the NGINX Ingress controller, but that didn't seem to work either (see the health-inspection sketch below).
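
When debugging this kind of setup, the backend service itself can report per-endpoint health, which is often more informative than the console; a sketch, assuming the hypothetical names from the load balancer sketch above:

# ask the load balancer how it sees each endpoint
gcloud compute backend-services get-health nginx-neg-backend --global

# list the endpoints GKE has programmed into one of the zonal NEGs
gcloud compute network-endpoint-groups list-network-endpoints NGINX_NEG \
    --zone=us-central1-a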

Any tips on how to get this to work will be much appreciated. Thanks!

Edit 1: As requested, I ran kubectl get svcneg -o yaml --namespace=<namespace>; here's the output:

apiVersion: networking.gke.io/v1beta1
kind: ServiceNetworkEndpointGroup
metadata:
  creationTimestamp: "2021-05-07T19:04:01Z"
  finalizers:
  - networking.gke.io/neg-finalizer
  generation: 418
  labels:
    networking.gke.io/managed-by: neg-controller
    networking.gke.io/service-name: ingress-nginx-controller
    networking.gke.io/service-port: "80"
  name: NGINX_NEG
  namespace: ingress-nginx
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: false
    controller: true
    kind: Service
    name: ingress-nginx-controller
    uid: <unique ID>
  resourceVersion: "2922506"
  selfLink: /apis/networking.gke.io/v1beta1/namespaces/ingress-nginx/servicenetworkendpointgroups/NGINX_NEG
  uid: <unique ID>
spec: {}
status:
  conditions:
  - lastTransitionTime: "2021-05-07T19:04:08Z"
    message: ""
    reason: NegInitializationSuccessful
    status: "True"
    type: Initialized
  - lastTransitionTime: "2021-05-07T19:04:10Z"
    message: ""
    reason: NegSyncSuccessful
    status: "True"
    type: Synced
  lastSyncTime: "2021-05-10T15:02:06Z"
  networkEndpointGroups:
  - id: <id1>
    networkEndpointType: GCE_VM_IP_PORT
    selfLink: https://www.googleapis.com/compute/v1/projects/<project>/zones/us-central1-a/networkEndpointGroups/NGINX_NEG
  - id: <id2>
    networkEndpointType: GCE_VM_IP_PORT
    selfLink: https://www.googleapis.com/compute/v1/projects/<project>/zones/us-central1-b/networkEndpointGroups/NGINX_NEG
  - id: <id3>
    networkEndpointType: GCE_VM_IP_PORT
    selfLink: https://www.googleapis.com/compute/v1/projects/<project>/zones/us-central1-f/networkEndpointGroups/NGINX_NEG
Which version of GKE are you using? If you are using 1.18.6-gke.6400 or later, can you post the output of kubectl get svcneg NGINX_NEG -o yaml? – Gari Singh
Thanks for your response! I added the kubectl get svcneg... output to the original description. – zerodark

1 Answer


As I understand it, your issue is: when the external load balancer is set up, GCP creates a new endpoint under one of the zonal NEGs, the endpoint shows as "Unhealthy", and requests to the external HTTPS load balancer return a 502 error.

Essentially, the Service annotation cloud.google.com/neg: '{"ingress": true}' enables container-native load balancing. After you create the Ingress, an HTTP(S) load balancer is created in the project, and NEGs are created in each zone in which the cluster runs. The endpoints in the NEG and the endpoints of the Service are kept in sync. See link [1].

New endpoints generally become reachable after attaching them to the load balancer, provided that they respond to health checks. You might encounter 502 errors or rejected connections if traffic cannot reach the endpoints.

One of your endpoints in a zonal NEG is showing as unhealthy, so please confirm the status of the other endpoints and how many endpoints are spread across the zones in the backend. If all backends are unhealthy, then your firewall, Ingress, or Service is probably misconfigured.

You can run the following command to check whether your endpoints are healthy (see link [2]):

gcloud compute network-endpoint-groups list-network-endpoints NAME --zone=ZONE

To troubleshoot traffic that is not reaching the endpoints, verify that the health check firewall rules allow incoming TCP traffic to your endpoints from the 130.211.0.0/22 and 35.191.0.0/16 ranges, although, as you mentioned, you have already configured this rule correctly. See link [3] for the health check configuration.
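
For completeness, such a rule can be created with gcloud; a sketch, with hypothetical rule and network names:

# allow Google health check probes to reach the NEG endpoints on port 80
gcloud compute firewall-rules create allow-lb-health-checks \
    --network=default \
    --direction=INGRESS --allow=tcp:80 \
    --source-ranges=130.211.0.0/22,35.191.0.0/16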

Run curl against your load balancer IP to check for responses:

curl [LB IP]

[1] https://cloud.google.com/kubernetes-engine/docs/concepts/ingress-xlb

[2] https://cloud.google.com/load-balancing/docs/negs/zonal-neg-concepts#troubleshooting

[3] https://cloud.google.com/kubernetes-engine/docs/concepts/ingress#health_checks