2
votes

I can't get a health check to succeed for a .NET app running in an IIS container when using container-native load balancing (CNLB).

I have a network endpoint group (NEG) created by an Ingress resource in a VPC-native GKE cluster.

When I circumvent CNLB by either exposing the NodePort or making a service of type LoadBalancer, the site resolves without issue.

All the pod conditions from a describe look good (screenshot: pod readiness).

The network endpoints show up when running describe endpoints (screenshot: ready addresses).

This is the health check that is generated by the load balancer (screenshot: GCP Health Check).

When I hit these endpoints from other containers or VMs in the same VPC, /health.htm responds with a 200. Here is the response from a container in the same namespace; I have also reproduced this from a Linux VM that is outside the cluster but in the same VPC (screenshot: endpoint responds).

In spite of all that, the health check reports the pods in my NEG as unhealthy (screenshot: Unhealthy Endpoints).

The Stackdriver logs confirm the requests are timing out, but I'm not sure why the endpoints respond to other instances and not to the LB (screenshot: Stackdriver Health Check Log).

I also confirmed that GKE created what looks like the correct firewall rule, which should allow traffic from the LB to the pods (screenshot: firewall).

Here is the YAML I'm working with:

Deployment:

apiVersion: apps/v1                                                  
kind: Deployment                                                     
metadata:                                                            
  labels:                                                            
    app: subdomain.domain.tld                                       
  name: subdomain-domain-tld                                       
  namespace: subdomain-domain-tld
spec:                                                                
  replicas: 3                                                        
  selector:                                                          
    matchLabels:                                                     
      app: subdomain.domain.tld                                     
  template:                                                          
    metadata:                                                        
      labels:                                                        
        app: subdomain.domain.tld
    spec:                                                            
      containers:                                                    
      - image: gcr.io/ourrepo/ourimage
        name: subdomain-domain-tld
        ports:                                                       
        - containerPort: 80                                          
        readinessProbe:                                              
          httpGet:                                                   
            path: /health.htm                                        
            port: 80                                                 
          initialDelaySeconds: 60                                    
          periodSeconds: 60                                          
          timeoutSeconds: 10                                         
        volumeMounts:                                                
        - mountPath: C:\some-secrets                                      
          name: some-secrets
      nodeSelector:                                                  
        kubernetes.io/os: windows                                    
      volumes:                                                       
      - name: some-secrets                                    
        secret:                                                      
          secretName: some-secrets

Service:

apiVersion: v1                                                       
kind: Service                                                        
metadata:                                                            
  labels:                                                            
    app: subdomain.domain.tld                                     
  name: subdomain-domain-tld-service
  namespace: subdomain-domain-tld
spec:                                                                
  ports:                                                             
  - port: 80                                                         
    targetPort: 80                                                   
  selector:                                                          
    app: subdomain.domain.tld                                       
  type: NodePort                 

The Ingress is extremely basic, as we have no real need for multiple routes on this site; however, I suspect that whatever issue we're having is here.

apiVersion: extensions/v1beta1                                       
kind: Ingress                                                        
metadata:                                                            
  annotations:                                                       
    kubernetes.io/ingress.class: gce
  labels:                                                            
    app: subdomain.domain.tld                                       
  name: subdomain-domain-tld-ingress
  namespace: subdomain-domain-tld
spec:                                                                
  backend:                                                           
    serviceName: subdomain-domain-tld-service
    servicePort: 80

One last, somewhat relevant detail: I tried the steps in this documentation and they worked, but it's not identical to my situation, as it's not using Windows containers or readiness probes: https://cloud.google.com/kubernetes-engine/docs/how-to/container-native-load-balancing#using-pod-readiness-feedback

Any suggestions would be greatly appreciated. I've spent two days stuck on this and I'm sure it's obvious but I just can't see the problem.

2
Is it possible to switch to a Linux container? If so, we can give you a solution. - Abdennour TOUMI
Are you allowing ingress/egress everywhere? All firewalls and Kubernetes network policies? Also allowing both on the cluster and to/from the load-balancer? - Rico
Unfortunately I can't switch to a Linux container, as the app we're running is ASP.NET rather than .NET Core and we're unable to port it to .NET Core @AbdennourTOUMI - 210rain
@Rico Yes, the cluster it's on is used purely for looking into the feasibility of running our ASP.NET sites in GKE, so I haven't configured any network policies. I've allowed all traffic on all ports to any instance in my VPC from 35.191.0.0/16 and 130.211.0.0/22, which are the IP ranges Google load balancers send traffic from per the documentation on this page: cloud.google.com/load-balancing/docs/health-checks. I can also confirm there are no other firewall rules that would take priority and deny the traffic. - 210rain
Must be some firewall rule somewhere. You can always check with GKE support. - Rico

2 Answers

1
votes

Apparently it's not documented, but this functionality doesn't work with Windows containers at the time of writing. I got in touch with a GCP engineer, who provided the following:

After further investigation, I have found that Windows containers using LoadBalancer service works but, Windows containers using Ingress with NEGS is a limitation so, I have opened an internal case for updating the public documentation [1].

Since, Ingress + NEG will not work (per the limitation), I suggest you to use any option you mentioned either exposing the NodePort or making a service of type LoadBalancer.
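For reference, the LoadBalancer workaround mentioned above could be sketched as the manifest below. It reuses the selector, namespace, and ports from the question's Service; the name subdomain-domain-tld-lb is a hypothetical one chosen for illustration, and this is a minimal sketch rather than a tested configuration.

```yaml
# Minimal sketch: expose the Windows pods through a Service of type
# LoadBalancer instead of Ingress + NEG.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-lb        # hypothetical name, not from the question
  namespace: subdomain-domain-tld
spec:
  type: LoadBalancer                   # replaces NodePort + Ingress from the question
  selector:
    app: subdomain.domain.tld          # same selector as the question's Service
  ports:
  - port: 80
    targetPort: 80
```

With this approach the Ingress resource is not needed for this site.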

0
votes

When you create an Ingress, the generated health check probes default to the same serving port and path as the app; in this case, port 80 and path /.

It seems your app reports its health on port 80, but at the /health.htm path.

You will need to add a custom health check via the BackendConfig CRD. Have a look at this link [1]. The same page explains how to associate the BackendConfig with the Ingress.
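A custom health check of this kind might look like the sketch below. It assumes a GKE version that supports the healthCheck field in BackendConfig under apiVersion cloud.google.com/v1 (older clusters may need a beta API version instead); the BackendConfig name is hypothetical, and the interval and timeout values are illustrative. The association is made through an annotation on the Service that the Ingress points to. Whether this helps in the Windows container case is not confirmed here.

```yaml
# Sketch only: BackendConfig with a custom HTTP health check.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: subdomain-domain-tld-backendconfig   # hypothetical name
  namespace: subdomain-domain-tld
spec:
  healthCheck:
    type: HTTP
    requestPath: /health.htm   # path the app answers on, per the question
    port: 80                   # with NEGs this is the container port
    checkIntervalSec: 15       # illustrative value
    timeoutSec: 5              # illustrative value
---
# The question's Service, with the annotation that links it to the BackendConfig.
apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/backend-config: '{"default": "subdomain-domain-tld-backendconfig"}'
  labels:
    app: subdomain.domain.tld
  name: subdomain-domain-tld-service
  namespace: subdomain-domain-tld
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: subdomain.domain.tld
  type: NodePort
```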

What version of GKE are you on? It seems like an old version, judging from the Ingress API version you use.

[1] https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#direct_health