Optimal gunicorn-worker configuration (number and class) for Python REST APIs

Question

Let's say I have two conceptually different REST APIs developed in Python with a framework like Flask or Falcon, which I need to deploy through Gunicorn on a server with 4GB of RAM and 2 vCPUs.

API #1: CPU-bound

The requests to this API involve little to no IO but are rather CPU-bound. Nonetheless, the operations are very quick and require little memory, e.g., simple mathematical operations.

API #2: IO-bound

The requests to this API involve a series of HTTP requests, e.g., to another API or fetching pages through GET requests. Thus, the majority of the 'work' involves waiting for other requests to resolve.

My question is: What would the optimal Gunicorn worker configuration (number of workers and worker class) be to get the best performance (preferably in terms of concurrency and requests per second) out of these APIs when deployed on the aforementioned server?

Reflexively I'd opt for a number of gevent-class workers but I've been scouring docs to verify said decision to no avail.

Any input would be appreciated :)

lesingerouge · Accepted Answer · 2016-05-01T21:56:13

Basically you need two different things: parallelism and async.

With the default sync worker class, Gunicorn handles requests by having each worker process one request at a time. As such, there is no "buffer" in front of the application to handle overflow, and there is no solution to a possible "thundering herd" problem (see here).

You will need to run 2 different Gunicorn instances, each running one of the APIs.

Ideally, you should have a ballpark estimate of the expected load for each API, because in your case parallelism is very limited (2 vCPUs are not much, really) and, as such, the CPU will be a bottleneck for every worker.

Given the Gunicorn documentation's recommendation of (2 * number of cores) + 1 workers, I would start from the values below, with the caveat that they might overload the server:

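# for API 1 (CPU-bound)
workers = 4
worker_class = "sync"
threads = 2  # with threads > 1, Gunicorn uses the gthread worker instead of sync

# for API 2 (IO-bound)
workers = 10
worker_class = "gevent"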
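For reference, the documentation's rule of thumb mentioned above can be worked out in a few lines. This is only a sketch: the helper names `suggested_workers` and `threaded_capacity` are made up for illustration, and the capacity figure assumes each worker thread serves one request at a time.

```python
import multiprocessing


def suggested_workers(cores=None):
    """Rule of thumb from the Gunicorn docs: (2 * number of cores) + 1."""
    if cores is None:
        # Fall back to the number of CPUs the machine reports.
        cores = multiprocessing.cpu_count()
    return 2 * cores + 1


def threaded_capacity(workers, threads=1):
    """Rough upper bound on requests in flight for sync/threaded workers.

    Hypothetical helper: assumes one request per worker thread at a time.
    """
    return workers * threads


# The 2 vCPU server from the question:
print(suggested_workers(2))     # 5
# The API 1 settings above (4 workers, 2 threads each):
print(threaded_capacity(4, 2))  # 8
```

Note that the rule gives 5 workers for 2 vCPUs, so the settings above deliberately depart from it (slightly under for API 1, well over for API 2), which is part of why they need to be validated under load.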

You will have to tweak these values based on your server load, IO traffic and memory availability. You should test the response under load with a script designed to mock a flurry of simultaneous requests to both APIs (you can use grequests for that).
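A minimal sketch of such a load script follows. It swaps grequests for the standard library (`concurrent.futures` plus `urllib`) so that it runs without extra dependencies. The names `timed_get` and `run_load` and the URL in the usage comment are placeholders, not part of any library.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url, timeout=10.0):
    """Issue one GET and return (status_code_or_None, seconds_elapsed)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code   # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        status = None       # connection refused, timeout, DNS failure, ...
    return status, time.perf_counter() - start


def run_load(url, total=100, concurrency=20, fetch=timed_get):
    """Send `total` requests with at most `concurrency` in flight at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * total))
    wall = time.perf_counter() - start
    latencies = sorted(latency for _, latency in results)
    return {
        "requests": total,
        "ok": sum(1 for status, _ in results if status == 200),
        "requests_per_second": total / wall,
        "median_latency": latencies[len(latencies) // 2],
    }


# Example (hypothetical address for one of the two Gunicorn instances):
# print(run_load("http://127.0.0.1:8001/", total=500, concurrency=50))
```

Running it against each instance while varying `workers`, `threads` and `worker_class` gives a rough requests-per-second and latency comparison. A dedicated load-testing tool will give more reliable numbers.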
