23 votes

My Flask application will receive a request, do some processing, and then make a request to a slow external endpoint that takes 5 seconds to respond. It looks like running Gunicorn with Gevent will allow it to handle many of these slow requests at the same time. How can I modify the example below so that the view is non-blocking?

import requests

@app.route('/do', methods = ['POST'])
def do():
    result = requests.get('slow api')
    return result.content

gunicorn server:app -k gevent -w 4
What do you expect would happen here? You can't return anything to the client if you haven't received it yet. – Wayne Werner
I was expecting to make it async, so that while one request is waiting for the super slow API, the CPU can be used to handle other incoming requests that may be going to other routes. (I assume this application will receive lots of other incoming requests.) – JLTChiu
That doesn't mean what you think it means. And Gunicorn should be handling this for you; you could test it just by adding a time.sleep(30) in there, I think. It's called the reactor pattern: Gunicorn allows the client to connect, then passes the request off to a worker. When the worker finishes, Gunicorn returns the worker's data to the client and puts the worker back in the pool. I'm not sure if it spins up a new worker when all the existing ones are busy, though. – Wayne Werner
I am still learning this, but I expect running Gunicorn should be something like gunicorn server:app -k gevent -w 4, though I am really not sure. – JLTChiu
@WayneWerner, do you mean that with the current code posted above, while a specific request is waiting for the slow API to respond, the CPU will be used to process other incoming requests to the application server? – JLTChiu

3 Answers

14 votes

If you're deploying your Flask application with gunicorn, it is already non-blocking. If a client is waiting on a response from one of your views, another client can make a request to the same view without a problem. There will be multiple workers to process multiple requests concurrently. No need to change your code for this to work. This also goes for pretty much every Flask deployment option.
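You can convince yourself of this without gunicorn at all. Here is a minimal sketch in which four threads stand in for four workers, each "handling" a request whose 0.2-second sleep is a stand-in for the 5-second external API; because the workers run concurrently, the total wall time is roughly one request's latency, not four:

```python
import threading
import time

SLOW_CALL = 0.2  # stand-in for the 5-second external API

def handle_request(results, i):
    time.sleep(SLOW_CALL)  # simulates requests.get('slow api') blocking
    results[i] = "done"

results = [None] * 4
start = time.monotonic()
workers = [threading.Thread(target=handle_request, args=(results, i))
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
elapsed = time.monotonic() - start

print(results)  # ['done', 'done', 'done', 'done']
print(elapsed)  # roughly SLOW_CALL, far less than 4 * SLOW_CALL
```

Gunicorn uses separate processes rather than threads, but the effect on slow views is the same: while one worker blocks on the slow call, the others keep serving requests.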

6 votes

First, a bit of background. A blocking socket is the default kind of socket: once you start reading, your app or thread does not regain control until data is actually read or you are disconnected. This is how python-requests operates by default. There is a spin-off called grequests that provides non-blocking reads.

The major mechanical difference is that send, recv, connect and accept can return without having done anything. You have (of course) a number of choices. You can check return code and error codes and generally drive yourself crazy. If you don’t believe me, try it sometime

Source: https://docs.python.org/2/howto/sockets.html

It also goes on to say:

There’s no question that the fastest sockets code uses non-blocking sockets and select to multiplex them. You can put together something that will saturate a LAN connection without putting any strain on the CPU. The trouble is that an app written this way can’t do much of anything else - it needs to be ready to shuffle bytes around at all times.

Assuming that your app is actually supposed to do something more than that, threading is the optimal solution
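The select-based multiplexing the HOWTO describes can be sketched with the stdlib selectors module. This is a toy example over a socketpair, not production code: both ends are made non-blocking, and the selector tells us when a read is guaranteed not to block:

```python
import selectors
import socket

# A connected pair of sockets; one end writes, the other reads.
left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

sel = selectors.DefaultSelector()
sel.register(right, selectors.EVENT_READ)

left.sendall(b"hello")  # queue data on one end

received = b""
for key, events in sel.select(timeout=1):  # wait until the other end is readable
    received = key.fileobj.recv(1024)      # guaranteed not to block here

sel.unregister(right)
left.close()
right.close()
print(received)  # b'hello'
```

This is exactly the "shuffle bytes around at all times" style the HOWTO warns about: efficient, but your code becomes an event loop rather than straight-line request handling.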

But do you want to add a whole lot of complexity to your view by having it spawn its own threads, particularly when gunicorn has async workers?

The asynchronous workers available are based on Greenlets (via Eventlet and Gevent). Greenlets are an implementation of cooperative multi-threading for Python. In general, an application should be able to make use of these worker classes with no changes.

and

Some examples of behavior requiring asynchronous workers: applications making long blocking calls (i.e., external web services)
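Per those docs, switching to the gevent worker class requires no code changes, only deployment configuration. A sketch of a gunicorn config file (the values here are illustrative assumptions, not recommendations; tune them to your hardware):

```python
# gunicorn.conf.py -- illustrative values only
worker_class = "gevent"    # cooperative greenlet workers instead of sync
workers = 4                # number of worker processes
worker_connections = 1000  # max concurrent greenlets per worker
```

Run it with `gunicorn -c gunicorn.conf.py server:app`, which is equivalent to the `-k gevent -w 4` command line in the question.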

So to cut a long story short: don't change anything, just let it be. If you make any changes at all, let them be to introduce caching. Consider using CacheControl, an extension recommended by the python-requests developers.
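To illustrate what caching buys you here (CacheControl itself wires into a requests session and honors HTTP Cache-Control headers; the helper below is a hypothetical stand-in using a plain TTL dict):

```python
import time

_cache = {}
TTL = 60.0  # seconds to reuse a response; hypothetical choice

def cached_fetch(url, fetch, now=time.monotonic):
    """fetch(url) is the slow call; its result is reused for TTL seconds."""
    entry = _cache.get(url)
    if entry is not None and now() - entry[0] < TTL:
        return entry[1]  # cache hit: skip the slow upstream call
    value = fetch(url)
    _cache[url] = (now(), value)
    return value

calls = []
def slow_fetch(url):
    calls.append(url)  # record each real upstream call
    return "payload"

print(cached_fetch("slow api", slow_fetch))  # "payload" -- real fetch
print(cached_fetch("slow api", slow_fetch))  # "payload" -- served from cache
print(len(calls))  # 1
```

If many requests hit the same slow endpoint, a cache like this turns a 5-second view into a near-instant one for everything after the first call.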

1 vote

You can use grequests. It allows other greenlets to run while the request is in flight. It is compatible with the requests library and returns requests.Response objects. The usage is as follows:

import grequests

@app.route('/do', methods = ['POST'])
def do():
    results = grequests.map([grequests.get('slow api')])
    return results[0].content

Edit: I added a test and saw that the response time didn't improve with grequests, because gunicorn's gevent worker already performs monkey-patching when it is initialized: https://github.com/benoitc/gunicorn/blob/master/gunicorn/workers/ggevent.py#L65