Let's say I have two conceptually different REST APIs written in Python with a framework such as Flask or Falcon, which I need to deploy with Gunicorn on a server with 4 GB of RAM and 2 vCPUs.
API #1: CPU-bound
Requests to this API involve little to no I/O and are CPU-bound instead. The operations are nonetheless very quick and require little memory, e.g., simple mathematical operations.
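To make this concrete, a handler in this API might look roughly like the sketch below. The function name and the `a`/`b` query parameters are hypothetical, and it is written as a bare WSGI callable (which Gunicorn can serve directly) rather than in Flask or Falcon, only to keep it self-contained.

```python
import json
from urllib.parse import parse_qs


def cpu_bound_app(environ, start_response):
    """Bare WSGI callable: reads ?a=..&b=.. and returns their sum and product."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        a = float(query["a"][0])
        b = float(query["b"][0])
    except (KeyError, ValueError):
        # Missing or non-numeric parameters.
        body = json.dumps({"error": "expected numeric query parameters a and b"}).encode()
        start_response("400 Bad Request", [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    # Pure computation: no disk, database, or network access.
    body = json.dumps({"sum": a + b, "product": a * b}).encode()
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The point is only that each request finishes almost immediately and never waits on anything external.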
API #2: IO-bound
The requests to this API involve a series of HTTP requests, e.g., to another API or fetching pages through GET requests. Thus, the majority of the 'work' involves waiting for other requests to resolve.
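As an illustration, a handler in this API might look roughly like the following sketch. The upstream URLs are placeholders, the function names are hypothetical, and it uses only the standard library and a bare WSGI callable instead of Flask or Falcon so that it stands alone.

```python
import json
from urllib.request import urlopen

# Hypothetical upstream endpoints; placeholders, not real services.
UPSTREAM_URLS = [
    "https://example.com/a",
    "https://example.com/b",
]


def fetch_length(url, timeout=5.0):
    """Fetch one URL and return the size of its body in bytes (blocks on the network)."""
    with urlopen(url, timeout=timeout) as response:
        return len(response.read())


def io_bound_app(environ, start_response, fetch=fetch_length):
    """Bare WSGI callable that makes a series of outbound GET requests."""
    # Each call blocks until the remote server answers, so almost all of the
    # request time is spent waiting rather than computing.
    sizes = {url: fetch(url) for url in UPSTREAM_URLS}

    body = json.dumps(sizes).encode()
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The `fetch` parameter exists only so the network call can be swapped out when experimenting; Gunicorn itself calls the app with just `environ` and `start_response`.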
My question is: what Gunicorn worker configuration (number of workers and worker class) would give the best performance, preferably in terms of concurrency and requests per second, for each of these APIs on the server described above?
My instinct is to use several gevent workers, but I have searched the docs and found nothing that confirms this choice.
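For reference, the gevent setup I have in mind could be written as a Gunicorn config file along these lines. This is only a sketch of that guess, not a verified optimum: the worker count uses the (2 x cores) + 1 starting point suggested in the Gunicorn docs, and the bind address is an assumption.

```python
# gunicorn.conf.py (hypothetical starting point, not a verified optimum)

# The server in question has 2 vCPUs.
cores = 2

# Gunicorn's documentation suggests (2 x cores) + 1 workers as a starting point.
workers = 2 * cores + 1

# Cooperative (greenlet-based) workers; requires the gevent package to be installed.
worker_class = "gevent"

# Maximum simultaneous clients per worker; only used by async worker classes such as gevent.
worker_connections = 1000

# Assumed local bind address.
bind = "127.0.0.1:8000"
```

It would be loaded with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` stands in for the actual module and WSGI callable.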
Any input would be appreciated :)