Scenario
I'm trying to deploy a Python application on Google Cloud Run (with Flask and gunicorn) that converts e-mail files to images. The Cloud Run service is triggered by POST requests from a Pub/Sub topic. I want each Cloud Run instance to handle multiple requests by running multiple gunicorn sync workers, but I cannot seem to achieve this. I've been browsing SO and Google for the last few hours and I just cannot figure it out. I feel like I'm missing something extremely simple.
This is what my pipeline should look like:
- New e-mail is placed in the Storage bucket.
- Storage sends a notification to Pub/Sub topic.
- Pub/Sub performs a POST request to HTTPS endpoint of Cloud Run instance.
- Cloud Run processes e-mail to images and saves results in another Storage bucket.
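Step 2 of the pipeline is typically wired up with a Storage notification config along these lines (the bucket and topic names here are hypothetical, not from my setup):

```shell
# Hypothetical names; -f json sends the object metadata as the message payload,
# -e OBJECT_FINALIZE fires only when a new object is created in the bucket
gsutil notification create -t email-topic -f json -e OBJECT_FINALIZE gs://incoming-emails
```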
Setup
I have configured the Cloud Run service with --concurrency=3 and --max-instances=10. Pub/Sub uses an --ack-deadline of 10 minutes (600 seconds). This is how my Cloud Run instances are started:
CMD exec gunicorn --bind :$PORT main:app --workers 3 --timeout 0
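For completeness, the flags from the Setup section correspond to a deploy command roughly like the following (service, image, and region names are placeholders, not my actual values):

```shell
# Placeholder service/image/region; only --concurrency and --max-instances
# are the actual values from the Setup section
gcloud run deploy email-preprocessor \
  --image gcr.io/my-project/email-preprocessor \
  --region europe-west1 \
  --concurrency=3 \
  --max-instances=10
```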
My (simplified) main.py:
import os
import json
import base64
import traceback

from flask import Flask, request
from flask_cors import CORS
from flask_sslify import SSLify

from src.utils.data_utils import images_from_email

app = Flask(__name__)
CORS(app, supports_credentials=True)
sslify = SSLify(app)


@app.route("/", methods=['GET', 'POST'])
def preprocess_emails():
    envelope = request.get_json()
    # Pub/Sub push requests wrap the payload in a "message" field
    pubsub_message = envelope["message"]
    data = json.loads(base64.b64decode(pubsub_message["data"]).decode())
    try:
        # function that processes email referenced in pubsub message to images
        fn, num_files, img_bucket, processed_eml_bucket = images_from_email(data)
        # here I do some logging
        return "", 204
    except Exception as e:
        traceback.print_exception(type(e), e, e.__traceback__)
        return "", 500


if __name__ == "__main__":
    app.run(ssl_context="adhoc", host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
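The handler above assumes the standard Pub/Sub push envelope. A minimal stdlib-only sketch of that decode step, with made-up bucket/object names (the real payload is the Storage notification):

```python
import base64
import json

# Hypothetical Storage notification payload; real notifications carry the
# full object metadata, but "bucket" and "name" are the fields that matter here
storage_event = {"bucket": "incoming-emails", "name": "mail-001.eml"}

# Pub/Sub push wraps the message in an envelope:
# {"message": {"data": "<base64>", ...}, "subscription": "..."}
envelope = {
    "message": {
        "data": base64.b64encode(json.dumps(storage_event).encode()).decode(),
        "messageId": "1234567890",
    },
    "subscription": "projects/demo-project/subscriptions/email-sub",
}

# Same decode step as in preprocess_emails() above
pubsub_message = envelope["message"]
data = json.loads(base64.b64decode(pubsub_message["data"]).decode())
print(data["name"])  # → mail-001.eml
```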
Problem
With the above setup, I would expect my Cloud Run service to handle 30 requests at a time: each instance should be able to handle 3 requests (one per gunicorn worker), and a maximum of 10 instances can spawn. What actually happens is the following:
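That expected capacity is just the product of the flags from the Setup section:

```python
# Expected peak capacity under the flags from the Setup section
concurrency = 3      # --concurrency: concurrent requests per instance
max_instances = 10   # --max-instances: upper bound on instances
print(concurrency * max_instances)  # → 30
```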
As soon as I throw 15 new emails into my storage bucket, 15 POST requests are made to my Cloud Run endpoint via Pub/Sub. But instead of spawning 5 Cloud Run instances each handling 3 requests, Cloud Run instantly tries to spawn an instance (with 3 workers) for every single request, where it seems each of the 3 workers is processing the same request. This eventually results in HTTP 429 errors: "The request was aborted because there was no available instance". I'm also noticing in the logs that some of the email files are being processed by multiple Cloud Run instances at the same time. What am I doing wrong? Do I somehow need to enable multiprocessing in my Python code, or is this gunicorn/Cloud Run/Pub/Sub related?