I run 4 Unicorn processes for my Rails app and they eat up all the available MySQL connections causing it to collapse with 'too many connections' error. Today I had to reboot my DB instance 4 times. =(
Processes
$ ps ax | grep [u]ni 21618 ? Sl 0:15 unicorn master -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production 21632 ? Sl 0:20 unicorn worker[0] -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production 21636 ? Sl 0:14 unicorn worker[1] -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production 21640 ? Sl 0:20 unicorn worker[2] -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production 21645 ? Sl 0:12 unicorn worker[3] -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production
My database.yml is setting up 22 connections for the ActiveRecord pool...
... production: adapter: mysql2 encoding: utf8 database: xxx username: xxx password: xxx host: xxx port: 3306 pool: 22 ...
And the Unicorn config file looks like this:
working_directory "/home/deployer/apps/XXX/current" pid "/home/deployer/apps/XXX/shared/pids/unicorn.pid" stderr_path "/home/deployer/apps/XXX/shared/log/unicorn.log" stdout_path "/home/deployer/apps/XXX/shared/log/unicorn.log" listen "/tmp/unicorn.XXX.sock" worker_processes 4 timeout 100 preload_app true before_fork do |server, worker| # Disconnect since the database connection will not carry over if defined? ActiveRecord::Base ActiveRecord::Base.connection.disconnect! end # Quit the old unicorn process old_pid = "#{server.config[:pid]}.oldbin" if File.exists?(old_pid) && server.pid != old_pid begin Process.kill("QUIT", File.read(old_pid).to_i) rescue Errno::ENOENT, Errno::ESRCH # someone else did our job for us end end end after_fork do |server, worker| # Start up the database connection again in the worker if defined?(ActiveRecord::Base) ActiveRecord::Base.establish_connection end child_pid = server.config[:pid].sub(".pid", ".#{worker.nr}.pid") system("echo #{Process.pid} > #{child_pid}") end
And if we look into the DB console, we'll see something like this. They've eaten most of the connections. (I had nothing but Unicorn running at the moment) To my mind there should have been 1 connection * 4 unicorns = 4 connections.
mysql> show full processlist; +-----+----------+--------------------------------------------------+------------------------+---------+------+-------+-----------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+----------+--------------------------------------------------+------------------------+---------+------+-------+-----------------------+ | 2 | rdsadmin | localhost:31383 | NULL | Sleep | 9 | | NULL | | 52 | level | 212.100.140.42:50683 | leveltravel_production | Query | 0 | NULL | show full processlist | | 74 | level | ip-10-55-10-151.eu-west-1.compute.internal:38197 | leveltravel_production | Sleep | 5 | | NULL | | 75 | level | ip-10-55-10-151.eu-west-1.compute.internal:38199 | leveltravel_production | Sleep | 8 | | NULL | | 76 | level | ip-10-55-10-151.eu-west-1.compute.internal:38201 | leveltravel_production | Sleep | 8 | | NULL | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 157 | level | ip-10-55-10-151.eu-west-1.compute.internal:38321 | leveltravel_production | Sleep | 154 | | NULL | | 158 | level | ip-10-55-10-151.eu-west-1.compute.internal:38322 | leveltravel_production | Sleep | 17 | | NULL | | 159 | level | ip-10-55-10-151.eu-west-1.compute.internal:38325 | leveltravel_production | Sleep | 54 | | NULL | | 160 | level | ip-10-55-10-151.eu-west-1.compute.internal:38326 | leveltravel_production | Sleep | 54 | | NULL | | 161 | level | ip-10-55-10-151.eu-west-1.compute.internal:38327 | leveltravel_production | Sleep | 54 | | NULL | | 162 | level | ip-10-55-10-151.eu-west-1.compute.internal:38329 | leveltravel_production | Sleep | 42 | | NULL | +-----+----------+--------------------------------------------------+------------------------+---------+------+-------+-----------------------+ 90 rows in set (0.15 sec)
You may also have a look at Issue #503 in sidekiq repository for the background of this problem https://github.com/mperham/sidekiq/issues/503