1
votes

In a traditional server-based web application, the application can pool a number of database connection at launch time and use these connections to process client requests throughout the lifetime of the application. Performance benefits because the overhead to create/destroy a database connection is paid only once at startup/termination.

In a serverless model (such as google cloud function/AWS-lambda, etc) the application is started, the request is processed, and the application is shut down. Therefore, the application has no ability to pool connections with the intention to use over multiple requests.

What is the appropriate design pattern, if any, to improve database connection performance in the serverless model? Finally, if no solution exists, do database requests from a serverless architecture kill performance? What kind of slowdowns are expected?

2
What you're saying isn't really true for Cloud Functions. When a server instance is allocated, it will handle many requests in serial before being shut down. You can cache a database connection at the global scope in the first request, and reuse it for subsequent requests handled by that instance. You just don't have any hard guarantees how long that instance will stay up (it will be deallocated when idle), but under load, it will definitely be reused for performance. - Doug Stevenson
Same for AWS Lambda btw, it also keeps the instance running beyond a single request. I actually think your comment might be a good basis for an answer. - Frank van Puffelen
And what kind of performance hit are you taking here? - Train

2 Answers

0
votes

There is a new service for Amazon RDS that is built specifically to solve this problem, Amazon RDS Proxy.

0
votes

When you look at Serverless solution, like Cloud Run, Cloud Function or even Lambda (but I won't talk about it because I really don't know AWS) instances are created according with the traffic.

I agree with you, you don't manage when the instances are created and killed, but an instance is not created on each request. Indeed, the instances are reused. By the way, when the instance is created you "pay" only 1 time the cold start, including the database pool creation. The subsequent request on the same instance (called warm start) could reuse the pool until the instance is killed by the underlying platform. By the way, store this pool in a global variable for reusing it request after request on the same instance.

For optimizing your pool size, carefully look at the platform behavior. For example, Cloud Function (and I think Lambda, without commitment on it) can handle only 1 request per instance at the same time. By the way, when you create your pool, create a pool with only 1 connection to your database, more is useless because never more that pooled connection will be used! In Java, for example, I know that the default pool create 50 connections per default, take care of this!.

WHY?? because your database has a limit of number of connection possible. That's why, you can also limit the maximum number of instances in parallel on Cloud Function for avoiding any database connections issue (--max-instances param). If your database accept 400 connections, you must not have more than 400 Cloud Functions in the same time!

Now, look at Cloud Run (and AppEngine, same behavior). With Cloud Run, you can handle up to 80 concurrent request, in the same time, on the same instance. This time, set your pool size and/or the concurrency param on Cloud Run equals. For example, set the concurrency to 40 and your pool size to 40 connections.

Same recommendation as for Cloud Functions, with a database which accept 400 connections, in this last example, set the --max-instances param on Cloud Run to 10 this time.

This limitation is important for saving your database connection through your pool, but you can also use it if you have the same limitation for accessing to an API or something else.