6 votes

Assume I have the below setup on Heroku + Rails, with one web dyno and two worker dynos.

Below is what I believe to be true, and I'm hoping that someone can confirm these statements or point out an assumption that is incorrect.

I'm confident in most of this, but I'm a bit confused by the usage of "client" and "server", by "connection pool" referring to both DB and Redis connections, and by "worker" referring to both Puma workers and Heroku worker dynos.

I wanted to be crystal clear, and I hope this can also serve as a consolidated guide for any other beginners having trouble with this.

Thanks!

[Screenshot of the Puma, Sidekiq, and database pool configuration; the values appear in the text below.]
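Since the screenshot is missing, here is a rough sketch of the configuration being described; the numbers come from the text below, but the file layout and exact syntax are only illustrative:

```ruby
# config/puma.rb (sketch): 2 Puma workers, 5 threads each
workers 2
threads 5, 5

# config/database.yml would set the ActiveRecord pool, e.g.  pool: 25
# config/sidekiq.yml would set the Sidekiq concurrency, e.g. :concurrency: 15
```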

How everything interacts

  • A web dyno (where the Rails application runs)

    • only interacts with the DB when it needs to query it to serve a page request
    • only interacts with Redis when it is pushing jobs onto the Sidekiq queue (stored in Redis). It is the Sidekiq client (see the sketch after this list)
  • A Worker dyno

    • only interacts with the DB if the Sidekiq job it's running needs to query the DB
    • only interacts with Redis to pull jobs from the Sidekiq queue (stored in Redis). It is the Sidekiq server
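To make the client/server split concrete, here is a minimal sketch; the job and model names are made up and not part of the original setup:

```ruby
# app/jobs/hard_job.rb (hypothetical job, for illustration only)
class HardJob
  include Sidekiq::Job   # Sidekiq::Worker in older Sidekiq versions

  def perform(user_id)
    # Runs on a worker dyno (Sidekiq server). It only touches the DB
    # because this particular job happens to query ActiveRecord.
    user = User.find(user_id)
    user.update!(processed_at: Time.current)
  end
end

# In a controller or model on the web dyno (Sidekiq client): this just
# pushes a small job payload into Redis and returns immediately.
HardJob.perform_async(42)
```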

ActiveRecord Pool Size

  • An ActiveRecord pool size of 25 means that each dyno has 25 connections to work with. (This is what I'm most unsure of. Is it each dyno or each Puma/Sidekiq worker?)

  • For the web dyno, only 10 threads can run at once (2 Puma workers × 5 threads each), so it will only use a maximum of 10 connections. 25 is above and beyond what it needs.

  • For worker dynos, the Sidekiq concurrency of 15 means 15 Sidekiq processes can run at a time. Again, 25 connections is beyond what it needs, but it's a nice buffer to have in case there are stale or dead connections that won't clear.

  • In total, my Postgres DB can expect 10 connections from the web dyno and 15 connections from each worker dyno for a total of 40 connections maximum (see the console sketch after this list).
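For reference, a quick way to sanity-check this from a Rails console (illustrative only, following the arithmetic above):

```ruby
# The configured ActiveRecord pool; as noted in the comments below,
# connection pool sizes are per process, not per dyno.
ActiveRecord::Base.connection_pool.size   # => 25 with the setting above

# Connections actually in use, per the numbers in this post:
#   web dyno:     2 Puma workers * 5 threads   = 10
#   worker dynos: 2 dynos * 15 Sidekiq threads = 30
#   maximum total                              = 40
```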

Redis Pool Size

  • The web dyno (Sidekiq client) will use the connection pool size specified in the Sidekiq.configure_client block. Generally ~3 is sufficient because the client isn't constantly adding jobs to the queue. (Is it 3 per dyno, or 3 per Puma worker?)

  • Each worker dyno (Sidekiq server) will use the connection pool size specified in the Sidekiq.configure_server block. By default it's the Sidekiq concurrency + 2, so here 17 Redis connections will be taken up by each dyno (see the initializer sketch below).
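For completeness, a minimal sketch of the two blocks being referred to; the env var name and the explicit :size are assumptions, and :size is the pre-Sidekiq-7 way of capping the Redis pool:

```ruby
# config/initializers/sidekiq.rb (sketch; values are illustrative)

# Runs in the web dyno's Puma processes (the Sidekiq client).
Sidekiq.configure_client do |config|
  config.redis = { url: ENV["REDIS_URL"], size: 3 }  # small pool: it only pushes jobs
end

# Runs in the worker dynos' Sidekiq processes (the Sidekiq server).
Sidekiq.configure_server do |config|
  # Without an explicit :size, older Sidekiq versions derive the pool from the
  # concurrency plus a couple of housekeeping connections (the "+ 2" mentioned above).
  config.redis = { url: ENV["REDIS_URL"] }
end
```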

Comments:

  • Excise the word "worker" from your post. Use "thread", "process" and "dyno". Concurrency of 15 means 1 Sidekiq process has 15 threads. – Mike Perham
  • Connection pool sizes are per-process. A dyno can run multiple processes. – Mike Perham

1 Answer

2 votes

I don't know Heroku + Rails, but I believe I can answer some of the more generic questions.

From the client's perspective, the setup/teardown of any connection is very expensive. The idea behind connection pooling is to keep a set of connections alive so they can be reused for some period of time. The JDK's HttpURLConnection does the same thing (assuming HTTP 1.1): as long as you keep going to the same server, the HTTP connection stays open, waiting for the next request. The same applies here - instead of closing a JDBC connection after each use, the connection is kept open (assuming the same server and authentication credentials), so the next request skips that unnecessary work and can immediately send its query to the database server.

There are many ways to maintain a client-side pool of connections: it may be part of the JDBC driver itself, or you might need to implement pooling with something like Apache Commons Pool. Whatever you do, it's going to improve performance and reduce errors caused by network hiccups that could otherwise prevent your client from connecting to the server.
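In the Ruby world the same client-side idea is packaged as the connection_pool gem (which Sidekiq itself uses); a minimal sketch, not from the original answer:

```ruby
require "connection_pool"
require "redis"

# Keep 5 Redis connections alive and reuse them, instead of paying the
# connect/disconnect cost on every call.
REDIS_POOL = ConnectionPool.new(size: 5, timeout: 5) do
  Redis.new(url: ENV["REDIS_URL"])
end

# Check a connection out, use it, and return it to the pool when the block ends.
REDIS_POOL.with { |conn| conn.incr("page_views") }
```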

Server-side, most database servers are configured to accept up to n connections. Each additional connection has a footprint - usually quite small - so based on the memory available you can work out the maximum number of connections the server can support.

In most cases, you're going to want a larger-than-expected number of connections available. For example, in Postgres, the configured limit (max_connections) applies to all connections to any database on that server. If you have development, test, and production all pointed at the same database server (obviously different databases), then connections used by test might prevent a production request from being fulfilled. Best not to be stingy.
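As a Postgres-specific illustration (not from the original answer), that server-wide ceiling can be checked from a Rails console and compared with what the application is configured to open:

```ruby
# Server-wide limit shared by every database and every client connecting
# to this Postgres server (the value shown is just an example).
ActiveRecord::Base.connection.select_value("SHOW max_connections")  # => "100"
```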