3
votes

Context:

My Spring Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: it receives a constant stream of Pub/Sub messages via push and makes anywhere from 0 to 5 writes to an associated Cloud SQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response time varies between 50 ms and 60 s, probably due to some resource contention.
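For context, the whole service boils down to a single Pub/Sub push endpoint, roughly like the sketch below (class and endpoint names are illustrative, not my real code; the real handler performs the payload-dependent writes):

 // Rough sketch of the push endpoint; names are placeholders.
 import java.util.Map;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.PostMapping;
 import org.springframework.web.bind.annotation.RequestBody;
 import org.springframework.web.bind.annotation.RestController;

 @RestController
 public class PushController {

     @PostMapping("/pubsub/push")
     public ResponseEntity<Void> receive(@RequestBody Map<String, Object> envelope) {
         // Pub/Sub push wraps the message as {"message": {"data": <base64>, ...},
         // "subscription": "..."}; decode "data" and perform the 0-5 Cloud SQL
         // writes depending on the payload.
         // A 2xx response acks the message; a non-2xx makes Pub/Sub redeliver it.
         return ResponseEntity.ok().build();
     }
 }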

To increase throughput and reduce resource contention, I'm looking to experiment with the connection pool size per app instance, as well as the concurrency and max-instances parameters of my Cloud Run service.
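The pool I plan to tune looks roughly like the following sketch (HikariCP, Spring Boot's default pool, with the Cloud SQL JDBC socket factory; the JDBC URL, instance connection name, and pool sizes are placeholders, not my real values):

 // Sketch of the connection pool under test; all values are placeholders.
 import com.zaxxer.hikari.HikariConfig;
 import com.zaxxer.hikari.HikariDataSource;

 public class PoolConfig {

     static HikariDataSource dataSource() {
         HikariConfig config = new HikariConfig();
         config.setJdbcUrl("jdbc:postgresql:///mydb"); // placeholder database
         config.addDataSourceProperty("socketFactory",
                 "com.google.cloud.sql.postgres.SocketFactory");
         config.addDataSourceProperty("cloudSqlInstance",
                 "my-project:my-region:my-instance"); // placeholder instance
         // The knob under test: connections per app instance. The total across
         // all instances (max-instances x pool size) must stay below the
         // Cloud SQL connection limit.
         config.setMaximumPoolSize(5);
         config.setMinimumIdle(5); // fixed-size pool avoids churn under bursts
         return new HikariDataSource(config);
     }
 }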

I understand that, due to Spring Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.

Problem:

I'm experiencing problems when deploying the Spring Boot app to Cloud Run with max-instances set to a value greater than 1:

  • Instances start, handle a single request successfully, and then produce no more logs.
  • This happens a few times per minute, leading me to believe that instances are started (cold start), handle a single request, die, and are then started again. They are not being reused as described in the official docs on concurrency, and as happens when I set max-instances to 1.
  • Instead, I expect 3 container instances to be started, each of which then serves requests according to the concurrency setting.

Billable container time at max-instances=3: [billable container time graph]

  • As shown in the graph, the number of instances fluctuates wildly once the new revision with max-instances=3 is deployed.
  • The graphs for CPU and memory usage look similar.
  • There are no error logs. As before with max-instances=1, there are warnings indicating that not enough instances are available to handle requests (HTTP 429).
  • The connection limit of the Cloud SQL instance has not been exceeded.
  • Requests are handled at less than 10/s.

Finally, this is the command used to deploy:

 gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated

What could cause this behavior?

2
You are making very big incorrect assumptions about Cloud Run. Cloud Run is an HTTP request/response system with code as a container. Everything is a single request with a single response. Design your system that way. Also, when you talk about latency, show how you measure it; just opening a TCP/IP connection over a long distance can take 100 ms. Do not use threading, as that will not improve anything. Make an HTTP request to Cloud Run, receive an HTTP response, nothing else. Do not rely on state, do not assume the same instance will run again on the next request, and do not attempt to multitask. – John Hanley
If those requirements do not meet your goals, then use a different compute service. – John Hanley
Hi John, according to the Cloud Run docs, container instances are reused to serve incoming requests and can handle multiple requests concurrently. The latency numbers I provided are the ones measured by the Cloud Run dashboard itself. My application is stateless. Connections are made from Google Pub/Sub to the Cloud Run instance, which are in the same network, so no long distances here. I am not using threading; I had a typo in the question where I wrote "threadpool" when I really meant connection pool. – Bastian Stein
A connection pool will not help you either. No background threads, connections, etc. The container lifetime exists only between request and response. – John Hanley
Connection pooling is a recommended practice from the Cloud Run docs. While using multiple threads is inefficient when there's only a single CPU core available, how exactly does this prevent the container instance from being reused for multiple requests? It works at max-instances=1 after all, with identical code and configuration. – Bastian Stein

2 Answers

0
votes

Some months ago, in the private alpha, I performed tests and observed the same behavior. After discussing it with the Google team, I understood that instances are over-provisioned "just in case": an instance crashes, an instance is preempted, the traffic suddenly increases, and so on.

The trade-off is that you will see more cold starts than your max-instances value would suggest. Worse, you will be charged for these over-provisioned cold starts; in practice this is not an issue, because Cloud Run has a generous free tier that covers this kind of glitch.

Going deeper into the logs (you can do this by creating a sink of the Cloud Run logs into BigQuery and then querying them), I saw that even if there are more instances up than your max-instances value, only max-instances of them are active at the same time. To put it concretely: with your parameters, if you have 5 instances up at the same time, only 3 serve traffic at any given point in time.
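If it helps, here is a minimal sketch of creating such a sink with the Cloud Logging Java client (the sink name, project, and dataset are placeholders; I am assuming the google-cloud-logging library, so verify the calls against your client version):

 // Sketch: export Cloud Run logs to a BigQuery dataset via a logging sink.
 import com.google.cloud.logging.Logging;
 import com.google.cloud.logging.LoggingOptions;
 import com.google.cloud.logging.SinkInfo;
 import com.google.cloud.logging.SinkInfo.Destination.DatasetDestination;

 public class CreateLogSink {

     public static void main(String[] args) throws Exception {
         try (Logging logging = LoggingOptions.getDefaultInstance().getService()) {
             SinkInfo sink = SinkInfo
                     .newBuilder("cloud-run-logs", // placeholder sink name
                             DatasetDestination.of("my-project", "run_logs")) // placeholder dataset
                     .setFilter("resource.type=\"cloud_run_revision\"") // only Cloud Run logs
                     .build();
             logging.create(sink);
         }
     }
 }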

This part is not documented, because it is constantly evolving to find the best balance between over-provisioning and a lack of resources (and 429 errors).

@Steren @AhmetB can you confirm or correct me?

0
votes

When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and tries to scale to that amount. If a sudden burst of requests occurs, Cloud Run will instantiate a larger number of instances in response, in order to adapt to a possibly higher number of incoming requests than it is currently serving, taking into account how long existing instances will take to finish the requests they are loading. Per the documentation, the number of container instances can exceed the max-instances value during such spikes.

You mentioned that with max-instances set to 1 it was running fine, but you also mentioned that it was in fact producing 429s at that setting. Seeing 429s together with the instance count spiking could indicate that the traffic is not being handled smoothly.

It is also worth noting that, because of the cold-start time you mention, while an instance is serving its first request(s) the number of concurrent requests is hard-set to 1 by design. Only once the instance is fully ready is the concurrency setting you have chosen applied.

Was there a specific reason you chose 3 for both max-instances and concurrency? Also, how was concurrency set when you had max-instances set to 1? Perhaps you could try raising the concurrency (up to 80) and/or max-instances (up to 1000) and see whether that removes the 429s.