3
votes

Context:

My Spring Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: it receives a constant stream of Pub/Sub messages via push and makes anywhere from 0 to 5 writes to an associated Cloud SQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response time varies between 50 ms and 60 s, probably due to some resource contention.
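For context, the whole service boils down to a single Pub/Sub push endpoint, roughly like the sketch below (class and endpoint names are illustrative, not my real code; the real handler performs the payload-dependent writes):

 // Rough sketch of the push endpoint; names are placeholders.
 import java.util.Map;
 import org.springframework.http.ResponseEntity;
 import org.springframework.web.bind.annotation.PostMapping;
 import org.springframework.web.bind.annotation.RequestBody;
 import org.springframework.web.bind.annotation.RestController;

 @RestController
 public class PushController {

     @PostMapping("/pubsub/push")
     public ResponseEntity<Void> receive(@RequestBody Map<String, Object> envelope) {
         // Pub/Sub push wraps the message as {"message": {"data": <base64>, ...},
         // "subscription": "..."}; decode "data" and perform the 0-5 Cloud SQL
         // writes depending on the payload.
         // A 2xx response acks the message; a non-2xx makes Pub/Sub redeliver it.
         return ResponseEntity.ok().build();
     }
 }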

To increase throughput and reduce resource contention, I'm looking to experiment with the connection pool size per app instance, as well as the concurrency and max-instances parameters of my Cloud Run service.
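The pool I plan to tune looks roughly like the following sketch (HikariCP, Spring Boot's default pool, with the Cloud SQL JDBC socket factory; the JDBC URL, instance connection name, and pool sizes are placeholders, not my real values):

 // Sketch of the connection pool under test; all values are placeholders.
 import com.zaxxer.hikari.HikariConfig;
 import com.zaxxer.hikari.HikariDataSource;

 public class PoolConfig {

     static HikariDataSource dataSource() {
         HikariConfig config = new HikariConfig();
         config.setJdbcUrl("jdbc:postgresql:///mydb"); // placeholder database
         config.addDataSourceProperty("socketFactory",
                 "com.google.cloud.sql.postgres.SocketFactory");
         config.addDataSourceProperty("cloudSqlInstance",
                 "my-project:my-region:my-instance"); // placeholder instance
         // The knob under test: connections per app instance. The total across
         // all instances (max-instances x pool size) must stay below the
         // Cloud SQL connection limit.
         config.setMaximumPoolSize(5);
         config.setMinimumIdle(5); // fixed-size pool avoids churn under bursts
         return new HikariDataSource(config);
     }
 }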

I understand that, due to Spring Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.

Problem:

I'm experiencing problems when deploying the Spring Boot app to Cloud Run with max-instances set to a value greater than 1:

  • Instances start, handle a single request successfully, and then produce no more logs.
  • This happens a few times per minute, leading me to believe that instances are started (cold start), handle a single request, die, and are then started again. They are not being reused as described in the official docs on concurrency, and as happens when I set max-instances to 1.
  • Instead, I expect 3 container instances to be started, each of which then serves requests according to the concurrency setting.

Billable container time at max-instances=3: [billable container time graph]

  • As shown in the graph, the number of instances fluctuates wildly once the new revision with max-instances=3 is deployed.
  • The graphs for CPU and memory usage look similar.
  • There are no error logs. As before with max-instances=1, there are warnings indicating that not enough instances are available to handle requests (HTTP 429).
  • The connection limit of the Cloud SQL instance has not been exceeded.
  • Requests are handled at less than 10/s.

Finally, this is the command used to deploy:

 gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated

What could cause this behavior?

2
You are making very big incorrect assumptions about Cloud Run. Cloud Run is an HTTP request/response system with code as a container. Everything is a single request with a single response. Design your system that way. Also, when you talk about latency, show how you measure it; just opening a TCP/IP connection over a long distance can take 100 ms. Do not use threading, as that will not improve anything. Make an HTTP request to Cloud Run, receive an HTTP response, nothing else. Do not rely on state, do not assume the same instance will run again on the next request, and do not attempt to multitask. – John Hanley
If those requirements do not meet your goals, then use a different compute service. – John Hanley
Hi John, according to the Cloud Run docs, container instances are reused to serve incoming requests and can handle multiple requests concurrently. The latency numbers I provided are the ones measured by the Cloud Run dashboard itself. My application is stateless. Connections are made from Google Pub/Sub to the Cloud Run instance, which are in the same network, so no long distances here. I am not using threading; I had a typo in the question where I wrote "threadpool" when I really meant connection pool. – Bastian Stein
A connection pool will not help you either. No background threads, connections, etc. The container lifetime exists only between request and response. – John Hanley
Connection pooling is a recommended practice from the Cloud Run docs. While using multiple threads is inefficient when there's only a single CPU core available, how exactly does this prevent the container instance from being reused for multiple requests? It works at max-instances=1 after all, with identical code and configuration. – Bastian Stein

2 Answers

0
votes

Some months ago, in the private alpha, I performed tests and observed the same behavior. After discussing it with the Google team, I understood that instances are over-provisioned "just in case": an instance crashes, an instance is preempted, the traffic suddenly increases, and so on.

The trade-off is that you will see more cold starts than your max-instances value would suggest. Worse, you will be charged for these over-provisioned cold starts; in practice this is not an issue, because Cloud Run has a generous free tier that covers this kind of glitch.

Going deeper into the logs (you can do this by creating a sink of the Cloud Run logs into BigQuery and then querying them), I saw that even if there are more instances up than your max-instances value, only max-instances of them are active at the same time. To put it concretely: with your parameters, if you have 5 instances up at the same time, only 3 serve traffic at any given point in time.
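If it helps, here is a minimal sketch of creating such a sink with the Cloud Logging Java client (the sink name, project, and dataset are placeholders; I am assuming the google-cloud-logging library, so verify the calls against your client version):

 // Sketch: export Cloud Run logs to a BigQuery dataset via a logging sink.
 import com.google.cloud.logging.Logging;
 import com.google.cloud.logging.LoggingOptions;
 import com.google.cloud.logging.SinkInfo;
 import com.google.cloud.logging.SinkInfo.Destination.DatasetDestination;

 public class CreateLogSink {

     public static void main(String[] args) throws Exception {
         try (Logging logging = LoggingOptions.getDefaultInstance().getService()) {
             SinkInfo sink = SinkInfo
                     .newBuilder("cloud-run-logs", // placeholder sink name
                             DatasetDestination.of("my-project", "run_logs")) // placeholder dataset
                     .setFilter("resource.type=\"cloud_run_revision\"") // only Cloud Run logs
                     .build();
             logging.create(sink);
         }
     }
 }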

This part is not documented, because it is constantly evolving to find the best balance between over-provisioning and a lack of resources (and 429 errors).

@Steren @AhmetB can you confirm or correct me?

0
votes

When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and tries to scale to that amount. If a sudden burst of requests occurs, Cloud Run will instantiate a larger number of instances in response, in order to adapt to a possibly higher number of incoming requests than it is currently serving, taking into account how long existing instances will take to finish the requests they are loading. Per the documentation, the number of container instances can exceed the max-instances value during such spikes.

You mentioned that with max-instances set to 1 it was running fine, but you also mentioned that it was in fact producing 429s at that setting. Seeing 429s together with the instance count spiking could indicate that the traffic is not being handled smoothly.

It is also worth noting that, because of the cold-start time you mention, while an instance is serving its first request(s) the number of concurrent requests is hard-set to 1 by design. Only once the instance is fully ready is the concurrency setting you have chosen applied.

Was there a specific reason you chose 3 for both max-instances and concurrency? Also, how was concurrency set when you had max-instances set to 1? Perhaps you could try raising the concurrency (up to 80) and/or max-instances (up to 1000) and see whether that removes the 429s.