Scenario
I'm trying to deploy a Python application on Google Cloud Run (with Flask and gunicorn) that converts e-mail files to images. The Cloud Run service is triggered by POST requests from a Pub/Sub topic. I want each Cloud Run instance to handle multiple requests by running multiple gunicorn sync workers, but I cannot seem to achieve this. I've been browsing SO and Google for the last few hours and I just cannot figure it out. I feel like I'm missing something extremely simple.
This is what my pipeline should look like:
- New e-mail is placed in the Storage bucket.
- Storage sends a notification to Pub/Sub topic.
- Pub/Sub performs a POST request to HTTPS endpoint of Cloud Run instance.
- Cloud Run processes e-mail to images and saves results in another Storage bucket.
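Step 2 of the pipeline is typically wired up with a Storage notification config along these lines (the bucket and topic names here are hypothetical, not from my setup):

```shell
# Hypothetical names; -f json sends the object metadata as the message payload,
# -e OBJECT_FINALIZE fires only when a new object is created in the bucket
gsutil notification create -t email-topic -f json -e OBJECT_FINALIZE gs://incoming-emails
```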
Setup
I have configured the Cloud Run service with --concurrency=3 and --max-instances=10. Pub/Sub uses an --ack-deadline of 10 minutes (600 seconds). This is how my Cloud Run instances are started:
CMD exec gunicorn --bind :$PORT main:app --workers 3 --timeout 0
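For completeness, the flags from the Setup section correspond to a deploy command roughly like the following (service, image, and region names are placeholders, not my actual values):

```shell
# Placeholder service/image/region; only --concurrency and --max-instances
# are the actual values from the Setup section
gcloud run deploy email-preprocessor \
  --image gcr.io/my-project/email-preprocessor \
  --region europe-west1 \
  --concurrency=3 \
  --max-instances=10
```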
My (simplified) main.py:
import os
import json
import base64
import traceback

from flask import Flask, request
from flask_cors import CORS
from flask_sslify import SSLify

from src.utils.data_utils import images_from_email

app = Flask(__name__)
CORS(app, supports_credentials=True)
sslify = SSLify(app)


@app.route("/", methods=['GET', 'POST'])
def preprocess_emails():
    envelope = request.get_json()
    # Pub/Sub push requests wrap the payload in a "message" field
    pubsub_message = envelope["message"]
    data = json.loads(base64.b64decode(pubsub_message["data"]).decode())
    try:
        # function that processes email referenced in pubsub message to images
        fn, num_files, img_bucket, processed_eml_bucket = images_from_email(data)
        # here I do some logging
        return "", 204
    except Exception as e:
        traceback.print_exception(type(e), e, e.__traceback__)
        return "", 500


if __name__ == "__main__":
    app.run(ssl_context="adhoc", host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
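The handler above assumes the standard Pub/Sub push envelope. A minimal stdlib-only sketch of that decode step, with made-up bucket/object names (the real payload is the Storage notification):

```python
import base64
import json

# Hypothetical Storage notification payload; real notifications carry the
# full object metadata, but "bucket" and "name" are the fields that matter here
storage_event = {"bucket": "incoming-emails", "name": "mail-001.eml"}

# Pub/Sub push wraps the message in an envelope:
# {"message": {"data": "<base64>", ...}, "subscription": "..."}
envelope = {
    "message": {
        "data": base64.b64encode(json.dumps(storage_event).encode()).decode(),
        "messageId": "1234567890",
    },
    "subscription": "projects/demo-project/subscriptions/email-sub",
}

# Same decode step as in preprocess_emails() above
pubsub_message = envelope["message"]
data = json.loads(base64.b64decode(pubsub_message["data"]).decode())
print(data["name"])  # → mail-001.eml
```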
Problem
With the above setup, I would expect my Cloud Run service to handle 30 requests at a time: each instance should be able to handle 3 requests (one per gunicorn worker), and a maximum of 10 instances can spawn. What actually happens is the following:
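That expected capacity is just the product of the flags from the Setup section:

```python
# Expected peak capacity under the flags from the Setup section
concurrency = 3      # --concurrency: concurrent requests per instance
max_instances = 10   # --max-instances: upper bound on instances
print(concurrency * max_instances)  # → 30
```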
As soon as I throw 15 new emails into my storage bucket, 15 POST requests are made to my Cloud Run endpoint via Pub/Sub. But instead of spawning 5 Cloud Run instances each handling 3 requests, Cloud Run instantly tries to spawn an instance (with 3 workers) for every single request, where it seems each of the 3 workers is processing the same request. This eventually results in HTTP 429 errors: "The request was aborted because there was no available instance". I'm also noticing in the logs that some of the email files are being processed by multiple Cloud Run instances at the same time. What am I doing wrong? Do I somehow need to enable multiprocessing in my Python code, or is this gunicorn/Cloud Run/Pub/Sub related?