A Flask API (served with gunicorn) is used as the inference API for a deep learning model. The inference step itself is very CPU-intensive (not using a GPU yet).
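For reference, the container currently starts gunicorn roughly like this (a sketch; the image name, port, and the `app:app` module path are placeholders for my actual setup):

```yaml
# Sketch of the pod's container spec, assuming the Flask app object lives at app:app.
containers:
  - name: inference-api              # hypothetical name
    image: my-inference-api:1.0      # hypothetical image
    command: ["gunicorn", "--workers", "1", "--bind", "0.0.0.0:8000", "app:app"]
    ports:
      - containerPort: 8000
```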
What is the best practice for deploying it to a Kubernetes cluster, considering these aspects:
- Should I create many pods, each handling requests with a single gunicorn worker, or fewer pods, each running multiple gunicorn workers? (node memory footprint; see the Deployment sketch after this list)
- Since Google lets you expose a deployment as a Service with an external load balancer, do I still need an nginx web server in front of my Flask/gunicorn stack? (Service sketch below)
- Is creating multiple identical pods on the same node more memory-intensive than handling all these requests with multithreading in a single pod? (single-pod sketch below)
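To make the first question concrete, this is the "many pods, one worker each" variant I have in mind (names, image, replica count, and resource numbers are placeholders, not measured values):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-api
spec:
  replicas: 4                        # scale out with pods instead of gunicorn workers
  selector:
    matchLabels:
      app: inference-api
  template:
    metadata:
      labels:
        app: inference-api
    spec:
      containers:
        - name: inference-api
          image: my-inference-api:1.0   # hypothetical image
          command: ["gunicorn", "--workers", "1", "--bind", "0.0.0.0:8000", "app:app"]
          resources:
            requests:
              cpu: "1"                  # one CPU-bound worker per pod
              memory: "2Gi"             # the model weights get loaded once per pod
            limits:
              cpu: "1"
              memory: "2Gi"
```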
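For the second question, the external load balancer I mean is the one Google provisions when the deployment is exposed as a Service of type LoadBalancer, along these lines:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: inference-api
spec:
  type: LoadBalancer      # GKE provisions an external load balancer for this
  selector:
    app: inference-api
  ports:
    - port: 80            # externally exposed port
      targetPort: 8000    # gunicorn's bind port inside the pod
```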
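And for the third question, the single-pod multithreading alternative would look roughly like this (one pod, gunicorn threads instead of extra replicas; the thread count is a placeholder):

```yaml
# Container command for the single-pod variant: one process, several threads.
# As far as I understand, the model would be loaded once and shared by all
# threads (lower memory), but Python's GIL may limit how much of this
# CPU-bound inference actually runs in parallel.
command: ["gunicorn", "--workers", "1", "--threads", "8", "--bind", "0.0.0.0:8000", "app:app"]
```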