
I am trying to expose an MLflow model in a GKE cluster through ingress-nginx and a Google Cloud load balancer.

The service in front of the respective deployment is configured as follows:

apiVersion: v1
kind: Service
metadata:
  name: model-inference-service
  labels:
    app: inference
spec:
  ports:
  - port: 5555
    targetPort: 5555
  selector:
    app: inference
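
As a quick sanity check, the endpoints behind this service can be listed to confirm that the selector actually matches the inference pod:

# Should show the pod IP on port 5555 if the selector matches a running, ready pod.
kubectl get endpoints model-inference-service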

When forwarding this service to localhost using kubectl port-forward service/model-inference-service 5555:5555, I can successfully query the model by sending a test image to the API endpoint with a small Python test script.

The URL the request is sent to is http://127.0.0.1:5555/invocations. This works as intended, so I assume that the deployment running the pod that exposes the model, and the corresponding ClusterIP service model-inference-service, are configured correctly.
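
The script itself is not reproduced here; a rough curl equivalent of the kind of request it sends could look like the following (the payload file and Content-Type are placeholders, since the exact input format depends on how the model was built):

# Hypothetical equivalent of the test request against the port-forwarded service;
# test_payload.json is a placeholder for the actual model input.
curl -X POST http://127.0.0.1:5555/invocations \
     -H "Content-Type: application/json" \
     -d @test_payload.json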

Next, I installed ingress-nginx into the cluster by running

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install my-release ingress-nginx/ingress-nginx
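
With the chart's default values, the release should also create a LoadBalancer service for the controller; its external IP is the address used further below:

# The service name follows the Helm release name with default chart values.
# The EXTERNAL-IP column should show the address of the Google Cloud load balancer.
kubectl get service my-release-ingress-nginx-controller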

The ingress is configured as follows (I suspect the error must be here?):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
#    nginx.ingress.kubernetes.io/rewrite-target: /invocations
  name: inference-ingress
  namespace: default
  labels:
    app: inference
spec:
  rules:
    - http:
        paths:
          - path: /invocations
            backend:
              serviceName: model-inference-service
              servicePort: 5555
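
One way to see how the controller resolved this resource (backend service, endpoints, and recent events) is:

# Shows the resolved backend and any events emitted by the ingress controller.
kubectl describe ingress inference-ingress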

The ingress controller pod is running successfully:

my-release-ingress-nginx-controller-6758cc8f45-fwtw7   1/1     Running   0          3h33m

In the GCP console I can see that the load balancer was created successfully as well, and I can obtain its IP.

When I use the same test script as before to make a request to the REST API endpoint (previously against the service forwarded to localhost), but now with the IP of the load balancer, I get a 502 Bad Gateway error:

The URL is now: http://34.90.4.0:80/invocations

Traceback (most recent call last):
  File "test_inference.py", line 80, in <module>
    run()
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "//anaconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "test_inference.py", line 76, in run
    print(score_model(data_path, host, port).text)
  File "test_inference.py", line 54, in score_model
    status_code=response.status_code, text=response.text
Exception: Status Code 502. <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.19.1</center>
</body>
</html>

When accessing the same URL in a browser, it says:

502 Bad Gateway
nginx/1.19.1

The logs of the ingress controller state:

2020/08/26 16:06:45 [warn] 86#86: *42282 a client request body is buffered to a temporary file /tmp/client-body/0000000009, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
2020/08/26 16:06:45 [error] 86#86: *42282 connect() failed (111: Connection refused) while connecting to upstream, client: 10.10.0.30, server: _, request: "POST /invocations HTTP/1.1", upstream: "http://10.52.3.7:5555/invocations", host: "34.90.4.0"
10.10.0.30 - - [26/Aug/2020:16:06:45 +0000] "POST /invocations HTTP/1.1" 502 157 "-" "python-requests/2.24.0" 86151 0.738 [default-model-inference-service-5555] [] 10.52.3.7:5555, 10.52.3.7:5555, 10.52.3.7:5555 0, 0, 0 0.000, 0.001, 0.000 502, 502, 502 0d86e360427c0a81c287da4ff5e907bc

To test whether the ingress and the load balancer work in principle, I replaced the Docker image running the real REST API I want to expose with a simple Docker image that returns "hello world" on port 5050 and path /. I changed the port and the path (from /invocations to /) in the service and ingress manifests shown above, and could successfully see "hello world" when accessing the IP of the load balancer in the browser.

Does anyone see what I might have done wrong? Thank you very much!

Best regards,

F

Can you share logs from the invocation service pod? Is it getting any requests from the ingress? - Faheem
The logs obtained using kubectl logs <pod id> unfortunately don't contain any information about requests made to the server, not even for a successful request over the service port-forwarded from the cluster to localhost. MLflow running in the pod uses nginx and gunicorn. I attached to the pod and found the nginx.conf, but the access_log file /var/log/nginx/access.log does not exist on this pod. Unfortunately I don't know where to find logs that could tell me whether the pod is getting any requests from the ingress, and the MLflow documentation doesn't give the answer. What could I try? - LaTeXian
Use the mendhak/http-https-echo image and check what path and data are received by the server (you can add it to your question). Also try without the rewrite-target annotation. - Matt

1 Answer


The configuration you have shared looks fine. There must be something in your cluster environment that is causing this behavior. Check whether pod-to-pod communication is working: launch a test pod on the same node as the nginx ingress controller and do a curl from that pod to the target service, and see whether you run into any DNS or network issues. Also try changing the Host header when calling the service, to see if it is sensitive to that.
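
A minimal sketch of such a check, assuming the curlimages/curl image can be pulled (pinning the pod to the controller's node would additionally need a nodeSelector or an override, omitted here for brevity):

# Hypothetical one-off debug pod; the pod name and image are placeholders.
# Even an HTTP error response would prove connectivity, unlike "connection refused".
kubectl run debug-curl --rm -it --image=curlimages/curl --restart=Never --command -- \
  curl -v http://model-inference-service.default.svc.cluster.local:5555/invocations

# The pod IP reported as "upstream" in the controller logs (10.52.3.7:5555)
# can be targeted the same way to rule out a Service/DNS problem.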