
I have an EKS cluster running Kubernetes 1.14. I deployed the NGINX ingress controller on the cluster from the upstream deploy manifests.

Here are the steps that I followed -

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/mandatory.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/aws/service-l4.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/static/provider/aws/patch-configmap-l4.yaml
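
After applying these manifests I verified that the controller pods and the LoadBalancer service came up (ingress-nginx is the namespace created by mandatory.yaml):

kubectl get pods -n ingress-nginx

kubectl get svc -n ingress-nginx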

But I keep getting these errors intermittently in the ingress controller logs.

2019/10/15 15:21:25 [error] 40#40: *243746 upstream timed out (110: Connection timed out) while connecting to upstream, client: 63.xxx.xx.xx, server: x.y.com, request: "HEAD / HTTP/1.1", upstream: "http://172.20.166.58:80/", host: "x.y.com"

And sometimes these -

{"log":"2019/10/15 02:58:40 [error] 119#119: *2985 connect() failed (113: No route to host) while connecting to upstream, client: xx.1xx.81.1xx, server: a.b.com , request: \"OPTIONS /api/v1/xxxx/xxxx/xxx HTTP/2.0\", upstream: \"http://172.20.195.137:9050/api/xxx/xxx/xxxx/xxx\ ", host: \"a.b.com \", referrer: \"https://x.y.com/app/connections\"\n","stream":"stderr","time":"2019-10-15T02:58:40.565930449Z "}

I am using the native Amazon VPC CNI plugin for Kubernetes for networking -

amazon-k8s-cni:v1.5.4
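
To confirm which CNI version is actually running, I checked the image on the aws-node DaemonSet in kube-system:

kubectl describe daemonset aws-node -n kube-system | grep Image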

I noticed that a couple of the 5 replicas of the NGINX ingress controller pod were not able to talk to the backend application. To check connectivity between the ingress controller pods and the backend applications, I exec'd into one of those controller pods and curled the backend service, and the request timed out; but when I exec into a pod of another backend service and curl the same backend service, it returns a 200 status code. I temporarily worked around this by deleting the replicas that could not reach the backend and letting them be recreated. That fixes the issue for a while, but after a few hours the same errors start showing up again.
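
For reference, the connectivity check was roughly this (the controller pod name, backend service name, namespace, and port are placeholders for my actual ones):

kubectl exec -it -n ingress-nginx nginx-ingress-controller-xxxxx -- curl -v --max-time 10 http://my-backend-svc.my-namespace.svc.cluster.local:80/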


1 Answer

amazon-k8s-cni:v1.5.4

has known issues with DNS and pod-to-pod communication. It's recommended to revert to

amazon-k8s-cni:v1.5.3

v1.5.4 Release Notes

I had the same issues you're seeing, and going back to v1.5.3 seemed to resolve it for me. I believe they recently reverted newly launched EKS clusters back to v1.5.3 anyway.
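
If it helps, reverting is just a matter of patching the image on the aws-node DaemonSet, something along these lines (the ECR account and region below are what my cluster used; adjust them for your region):

kubectl set image daemonset/aws-node -n kube-system aws-node=602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.5.3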