Folks, I have a bunch of services running on AWS ECS. My Kubernetes cluster is on AWS too, using EKS. I use nginx-ingress to expose my cluster to my ECS services.

One of my Node.js containers fails to initiate a WebSocket connection request to my backend pod. The Node.js container's log just says it failed to establish the WebSocket connection.

From the log of my backend pod, it looks like the request never made it to the backend.

Then I took a look at my nginx-ingress pod's log, and I do see a bunch of 404 errors like these:

...
{
  "time": "2019-03-28T19:39:19+00:00",
  "request_body": "-",
  "remote_addr": "",
  "x-forward-for": "1.2.3.4(public ip), 127.0.0.1",
  "request_id": "ea1f269ce703a69126d22bea28b75b89",
  "remote_user": "-",
  "bytes_sent": 308,
  "request_time": 0,
  "status": 404,
  "vhost": "abc.net",
  "request_query": "-",
  "request_length": 1084,
  "duration": 0,
  "request": "GET /wsconnect HTTP/1.1",
  "http_referrer": "-",
  "http_user_agent": "Jetty/9.4.12.v20180830",
  "header-X-Destination": "-",
  "header-Host": "abc.net",
  "header-Connection": "upgrade",
  "proxy_upstream_name": "-",
  "upstream_addr": "-",
  "service_port": "",
  "service_name": ""
}
2019/03/28 19:39:19 [info] 82#82: *13483 client 192.168.233.71 closed keepalive connection
2019/03/28 19:39:23 [info] 79#79: *13585 client closed connection while waiting for request, client: 192.168.105.223, server: 0.0.0.0:80
2019/03/28 19:39:25 [info] 84#84: *13634 client closed connection while waiting for request, client: 192.168.174.208, server: 0.0.0.0:80
2019/03/28 19:39:25 [info] 78#78: *13638 client closed connection while waiting for request, client: 192.168.233.71, server: 0.0.0.0:80
2019/03/28 19:39:33 [info] 80#80: *13832 client closed connection while waiting for request, client: 192.168.105.223, server: 0.0.0.0:80
2019/03/28 19:39:35 [info] 83#83: *13881 client closed connection while waiting for request, client: 192.168.174.208, server: 0.0.0.0:80
2019/03/28 19:39:35 [info] 83#83: *13882 client closed connection while waiting for request, client: 192.168.233.71, server: 0.0.0.0:80
2019/03/28 19:39:36 [info] 84#84: *12413 client 127.0.0.1 closed keepalive connection
...

My question is: how can I dig further to see what exactly caused this WebSocket connection request to fail? I tried setting the error log level to debug, but that produced way too much garbage.
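
One thing I can do is replay the failing upgrade request by hand through the ELB and see how the controller answers it. A rough sketch (the Host and path come from the log above; <elb-hostname> stands in for the hostname of the LoadBalancer Service, and -k skips certificate verification since the cert won't match the raw ELB name):

# Replay the WebSocket upgrade request manually. curl can't speak the protocol after
# the handshake; the point is only to see which status the controller returns.
curl -vk --max-time 5 \
  -H "Host: abc.net" \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: $(openssl rand -base64 16)" \
  https://<elb-hostname>/wsconnect

In my log above, both proxy_upstream_name and upstream_addr are "-", so it looks like nginx never proxied the request to any backend at all.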

The security group is fine. One of my container services CAN communicate with my backend pods in the K8s cluster. That service is HTTP-based, though.

My Ingress is set up following this guide, and I deployed the ingress controller as is: https://kubernetes.github.io/ingress-nginx/deploy/#aws
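
The controller itself appears to be up and serving (it logs the requests above); this is roughly how I checked it and the LoadBalancer Service (a sketch; adjust the namespace to wherever the controller pods actually run):

# Controller pods (the guide's manifests label them app.kubernetes.io/name=ingress-nginx)
kubectl get pods --all-namespaces -l app.kubernetes.io/name=ingress-nginx
# The LoadBalancer Service from the manifest below; EXTERNAL-IP should show the ELB hostname
kubectl -n default get svc ingress-nginx -o wide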

The Service, ConfigMap, and Ingress are as follows:

kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
  namespace: default
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  annotations:
    # Enable PROXY protocol
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600" # recommended for websocket
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "cert-arn"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"

spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: http

---

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: default
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  enable-access-log-for-default-backend: "true"
  error-log-level: "info"
  allow-backend-server-header: "true"
  use-proxy-protocol: "true"
  log-format-upstream: '{"time": "$time_iso8601", "request_body": "$request_body", "remote_addr": "$proxy_protocol_addr","x-forward-for": "$proxy_add_x_forwarded_for", "request_id": "$req_id", "remote_user":"$remote_user", "bytes_sent": $bytes_sent, "request_time": $request_time, "status":$status, "vhost": "$host", "request_query": "$args", "request_length": $request_length, "duration": $request_time, "request" : "$request", "http_referrer": "$http_referer", "http_user_agent":"$http_user_agent", "header-X-Destination": "$http_X_Destination", "header-Host" : "$http_Host", "header-Connection": "$http_Connection","proxy_upstream_name":"$proxy_upstream_name", "upstream_addr":"$upstream_addr", "service_port" : "$service_port", "service_name":"$service_name" }'

---

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-{{UUID}}
  namespace: {{NAMESPACE}}
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/enable-access-log: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  labels:
    company-id: {{UUID}}
    company-name: {{ABC}}
spec:
  rules:
  - host: "{{UUID}}.k8s.dev.abc.net"
    http:
      paths:
      - path: /
        backend:
          serviceName: {{UUID}}
          servicePort: 443
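
For completeness, this is how I can list every Ingress host rule the controller knows about, to compare against the vhost field ("abc.net") in the 404 entries (a sketch):

# List all Ingress host rules across namespaces and compare them with the "vhost"
# logged by nginx-ingress for the failing requests.
kubectl get ingress --all-namespaces \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTS:.spec.rules[*].host'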

1 Answer

It turned out to be a problem in my application. The issue here is that the Host header (vhost) ended up being set to one of my ECS services' FQDNs, which is not recognized by my K8s cluster.

To resolve this issue, I ended up modifying my ECS service's application code to rewrite the X-Forwarded-Host header to "k8s-backend-url.com:443", and then nginx-ingress let the request through.
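
In other words, nginx-ingress routes a request to a backend Service only when the Host header matches an Ingress host rule; everything else falls through and, as the log shows (proxy_upstream_name and upstream_addr both "-"), never reached my backend at all. Roughly (a sketch; the hostnames and <elb-hostname> are placeholders for my real ones):

# Host that matches no Ingress rule: ends up as a 404 like the log entries above
curl -sk -o /dev/null -w '%{http_code}\n' \
  -H "Host: some-ecs-service.internal" https://<elb-hostname>/wsconnect

# Host that matches the Ingress rule: the request is proxied on to the backend Service
# (whatever status comes back, it is no longer the bare 404 with no upstream)
curl -sk -o /dev/null -w '%{http_code}\n' \
  -H "Host: k8s-backend-url.com" https://<elb-hostname>/wsconnect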