1
votes

I'm just curious how to solve the connection-pooling problem in a scalable Java application.

Imagine I have a Java web application with HikariCP set up (max pool size is 20) and PostgreSQL with a maximum of 100 allowed connections.
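For context, the pool is configured roughly like this (a minimal sketch; the JDBC URL and credentials are placeholders):

```java
// Minimal sketch of the setup described above; URL and credentials are placeholders.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class DataSourceFactory {

    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db-host:5432/appdb"); // placeholder host/database
        config.setUsername("app");                                 // placeholder credentials
        config.setPassword("secret");
        config.setMaximumPoolSize(20); // 20 connections per application replica
        return new HikariDataSource(config);
    }
}
```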

Now I want to implement a scalability approach for my web app (no matter how), possibly even with autoscaling. So I don't know how many web app replicas there will eventually be; the number may change dynamically (for various reasons, e.g. cluster workload).

And there is the problem: as soon as I create more than 5 web app replicas, the total connection count (replicas × 20) exceeds the maximum allowed connections.

Are there any best practices for solving this problem (other than the obvious ones: increasing the maximum allowed connections or decreasing the pool size)?

Thanks

You'll have to measure where the bottleneck is (if there is one): maybe 20 connections per machine is too many? Maybe you can open more than 100 connections to PostgreSQL? Maybe you should have a buffer (queue) and submit update queries to the queue in order to throttle the requests, or maybe you just need a different technology which is more scalable. – Nir Alfasi
@alfasin Good point; for now I'm just researching the possible options, and I'm wondering whether there are any best practices for organizing the connection pool in a scalable application. – Maksim Prabarshchuk
Are you replicating the database as well? That would save you a lot of worries. I don't mean a separate DB for each web app, but the database is going to be the bottleneck when scaling. – Kayaman

2 Answers

0
votes

You need an orchestrator over the web application. It would be responsible for scaling in and out, and it would manage the connections so that the total never exceeds the limit of 100, opening and closing connections according to the traffic.
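A minimal sketch of the pool-sizing arithmetic such an orchestrator (or each replica itself) could apply, assuming it can learn the current replica count somehow (environment variable, Kubernetes API, service registry); all names here are made up:

```java
// Hypothetical sketch (all names made up): divide a global connection budget
// across however many replicas are currently running.
public final class PoolSizer {

    private static final int GLOBAL_CONNECTION_BUDGET = 100; // PostgreSQL max_connections
    private static final int RESERVED_CONNECTIONS = 5;       // assumption: leave headroom for admin/superuser sessions

    /** Pool size each replica should use so the total stays under the budget. */
    public static int maxPoolSizePerReplica(int replicaCount) {
        int usable = GLOBAL_CONNECTION_BUDGET - RESERVED_CONNECTIONS;
        return Math.max(1, usable / Math.max(1, replicaCount));
    }
}
```

Each replica could then apply the computed value to its own pool, e.g. via HikariCP's HikariConfigMXBean, which allows adjusting the maximum pool size on a running pool.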

Nevertheless, my recommendation is to consider migrating to a NoSQL database, which is a more suitable solution for scalability and performance.

0
votes

I'll start by saying that whatever you do, as long as you're restricted to 100 connections to your DB, it will not scale!

That said, you can optimize and "squeeze" more performance out of it by applying a couple of well-known tricks. It's important to understand the trade-offs (availability vs. consistency, latency vs. throughput, etc.):

  1. Caching: if you can anticipate certain select queries, you can calculate them offline (maybe even from a replica?) and cache the results (see the QueryCache sketch after this list). The trade-off: the user might get results which are not up to date.

  2. Buffering/throttling: all updates/inserts go to a queue, and only a few workers are allowed to pull from the queue and update the DB (see the WriteBuffer sketch after this list). Trade-off: you get more availability, but the system becomes "eventually consistent" (since updates won't be visible right away).

  3. It might come to the point where you'll have to run the selects asynchronously as well, which means the user submits a query and, when the result is ready, it is "pushed" back to the client (or the client keeps "polling" every few seconds). It can be implemented with a callback as well.
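To illustrate point 1, here is a rough sketch of a TTL-based query cache (class names and behavior are illustrative only; a real project would more likely use an existing cache library):

```java
// Illustrative sketch of point 1: cache the result of an expensive read query so
// repeated requests don't each take a pooled connection. All names are made up.
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class QueryCache<V> {

    private record Entry<V>(V value, Instant loadedAt) {}

    private final ConcurrentHashMap<String, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Duration ttl;

    QueryCache(Duration ttl) {
        this.ttl = ttl;
    }

    /** Returns the cached value if fresh enough, otherwise reloads it (possibly from a replica). */
    V get(String key, Supplier<V> loader) {
        Entry<V> entry = cache.get(key);
        if (entry == null || Instant.now().isAfter(entry.loadedAt().plus(ttl))) {
            entry = new Entry<>(loader.get(), Instant.now()); // the only path that touches the DB
            cache.put(key, entry);
        }
        return entry.value(); // may be slightly stale, as per the trade-off above
    }
}
```

A caller would wrap its SELECT in the Supplier, e.g. `cache.get("top-products", () -> runTopProductsQuery())` (both names are made up).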
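And a sketch of point 2: a bounded queue in front of a small, fixed pool of writer threads, so the number of connections used for writes stays capped regardless of the request rate (all names are illustrative):

```java
// Illustrative sketch of point 2: buffer writes in a bounded queue and let only a
// few workers apply them to the DB. Class and method names are made up.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class WriteBuffer {

    private final BlockingQueue<Runnable> pending = new ArrayBlockingQueue<>(10_000);
    private final ExecutorService workers;

    WriteBuffer(int workerCount) {
        workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        pending.take().run(); // each task performs one INSERT/UPDATE
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    /** Callers enqueue the write and return immediately; it is applied "eventually". */
    boolean submit(Runnable dbWrite) {
        return pending.offer(dbWrite); // false if the buffer is full (back-pressure)
    }
}
```

No matter how many request threads a replica is running, only `workerCount` connections are ever used for writes.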

By separating the updates (writes) from the reads, you'll be able to get more performance by creating read-only replicas which the web servers can use for read queries.
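A sketch of what that split can look like with two HikariCP pools, one pointing at the primary and one at a read-only replica (host names, credentials and pool sizes are placeholders):

```java
// Sketch of the read/write split: one pool for the primary (writes), one for a
// read-only replica (reads). Hosts, credentials and pool sizes are placeholders.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import javax.sql.DataSource;

public class RoutingDataSources {

    public static DataSource primary() {
        return pool("jdbc:postgresql://primary-host:5432/appdb", false);
    }

    public static DataSource readReplica() {
        return pool("jdbc:postgresql://replica-host:5432/appdb", true);
    }

    private static HikariDataSource pool(String url, boolean readOnly) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(url);
        config.setUsername("app");    // placeholder credentials
        config.setPassword("secret");
        config.setReadOnly(readOnly); // connections from this pool are marked read-only
        config.setMaximumPoolSize(readOnly ? 15 : 5); // example split of the per-replica budget
        return new HikariDataSource(config);
    }
}
```

Read queries would then go through `readReplica()`, while anything transactional or mutating goes through `primary()`.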