I'am not developer and it seems not simple task, but for your considerations please follow bests practices for Better performance by optimizing Gunicorn config.
In addition in kubernetes there are different mechanisms in order to scale your deployment like HPA due to CPU utilization and (How is Python scaling with Gunicorn and Kubernetes?)
You can use also Resource requests and limits of Pod and Container.
As per Gunicorn documentation
DO NOT scale the number of workers to the number of clients you expect to have. Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.
Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend (2 x $num_cores) + 1 as the number of workers to start off with. While not overly scientific, the formula is based on the assumption that for a given core, one worker will be reading or writing from the socket while the other worker is processing a request.
#
update:
Depending on your approach you can choose different solution (deployment, daemonset) all above statements you can achieve in kubernetes by handling according Assigning CPU Resources to Containers and Pods
- Using deployment with resources (limits,requests) give you possibility to resize your app into multiple pods on a single node based on your hardware limits but depending on your "app load" it can not be good enough solution.
CPU requests and limits are associated with Containers, but it is useful to think of a Pod as having a CPU request and limit. The CPU request for a Pod is the sum of the CPU requests for all the Containers in the Pod. Likewise, the CPU limit for a Pod is the sum of the CPU limits for all the Containers in the Pod.
Note:
The CPU resource is measured in CPU units. One CPU, in Kubernetes, is equivalent to:
f.e. 1 GCP Core.
- As mentioned in the post the second approach (scaling your app into multiple nodes) it's also good choice. In this case you can cosnider using f.e. Statefulset or deployment in addition on GKE using "cluster austoscaler" you can achieve more extendable solution when you try to create new pods that don't have enough capacity to run inside the cluster. In this case cluster autoscaler automatically add additional resources.
On the other hand you can consider using different other solutions like Cerebral it gives you the possibility to create user-defined policies in order to increasing or decreasing the size of pools of nodes inside your cluster.
GKE's cluster autoscaler automatically resizes clusters based on the demands of the workloads you want to run. With autoscaling enabled, GKE automatically adds a new node to your cluster if you've created new Pods that don't have enough capacity to run; conversely, if a node in your cluster is underutilized and its Pods can be run on other nodes, GKE can delete the node.
Please keep in mind that the question is very general and there is no one good answer for this topic. You should consider all prons and cons based on your requirements, load, activity, capacity, costs ...
Hope this help.