I run 4 Unicorn worker processes for my Rails app, and they eat up all the available MySQL connections, causing the database to collapse with a 'too many connections' error. Today I had to reboot my DB instance 4 times. =(
Processes
$ ps ax | grep [u]ni
21618 ? Sl 0:15 unicorn master -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production
21632 ? Sl 0:20 unicorn worker[0] -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production
21636 ? Sl 0:14 unicorn worker[1] -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production
21640 ? Sl 0:20 unicorn worker[2] -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production
21645 ? Sl 0:12 unicorn worker[3] -D -c /home/deployer/apps/XXX/shared/config/unicorn.rb -E production
My database.yml is setting up 22 connections for the ActiveRecord pool...
...
production:
  adapter: mysql2
  encoding: utf8
  database: xxx
  username: xxx
  password: xxx
  host: xxx
  port: 3306
  pool: 22
...
And the Unicorn config file looks like this:
working_directory "/home/deployer/apps/XXX/current"
pid "/home/deployer/apps/XXX/shared/pids/unicorn.pid"
stderr_path "/home/deployer/apps/XXX/shared/log/unicorn.log"
stdout_path "/home/deployer/apps/XXX/shared/log/unicorn.log"
listen "/tmp/unicorn.XXX.sock"
worker_processes 4
timeout 100
preload_app true
before_fork do |server, worker|
  # Disconnect since the database connection will not carry over
  if defined? ActiveRecord::Base
    ActiveRecord::Base.connection.disconnect!
  end

  # Quit the old unicorn process
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

after_fork do |server, worker|
  # Start up the database connection again in the worker
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end

  child_pid = server.config[:pid].sub(".pid", ".#{worker.nr}.pid")
  system("echo #{Process.pid} > #{child_pid}")
end
If we look at the DB console, we see something like the output below: the Unicorn workers have eaten most of the connections (nothing but Unicorn was running at the time). To my mind there should have been 1 connection * 4 Unicorn workers = 4 connections.
mysql> show full processlist;
+-----+----------+--------------------------------------------------+------------------------+---------+------+-------+-----------------------+
| Id  | User     | Host                                             | db                     | Command | Time | State | Info                  |
+-----+----------+--------------------------------------------------+------------------------+---------+------+-------+-----------------------+
|   2 | rdsadmin | localhost:31383                                  | NULL                   | Sleep   |    9 |       | NULL                  |
|  52 | level    | 212.100.140.42:50683                             | leveltravel_production | Query   |    0 | NULL  | show full processlist |
|  74 | level    | ip-10-55-10-151.eu-west-1.compute.internal:38197 | leveltravel_production | Sleep   |    5 |       | NULL                  |
|  75 | level    | ip-10-55-10-151.eu-west-1.compute.internal:38199 | leveltravel_production | Sleep   |    8 |       | NULL                  |
|  76 | level    | ip-10-55-10-151.eu-west-1.compute.internal:38201 | leveltravel_production | Sleep   |    8 |       | NULL                  |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CUT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| 157 | level    | ip-10-55-10-151.eu-west-1.compute.internal:38321 | leveltravel_production | Sleep   |  154 |       | NULL                  |
| 158 | level    | ip-10-55-10-151.eu-west-1.compute.internal:38322 | leveltravel_production | Sleep   |   17 |       | NULL                  |
| 159 | level    | ip-10-55-10-151.eu-west-1.compute.internal:38325 | leveltravel_production | Sleep   |   54 |       | NULL                  |
| 160 | level    | ip-10-55-10-151.eu-west-1.compute.internal:38326 | leveltravel_production | Sleep   |   54 |       | NULL                  |
| 161 | level    | ip-10-55-10-151.eu-west-1.compute.internal:38327 | leveltravel_production | Sleep   |   54 |       | NULL                  |
| 162 | level    | ip-10-55-10-151.eu-west-1.compute.internal:38329 | leveltravel_production | Sleep   |   42 |       | NULL                  |
+-----+----------+--------------------------------------------------+------------------------+---------+------+-------+-----------------------+
90 rows in set (0.15 sec)
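As a rough sanity check on these numbers (a sketch, not a diagnosis): if each Unicorn worker keeps its own ActiveRecord pool and the `pool` value in database.yml acts as a per-process ceiling, the upper bound on app connections is workers * pool. The helper `max_app_connections` below is hypothetical, and the observed figure assumes every row cut from the listing above is an app connection, which the output itself does not confirm.

```ruby
# Values taken from the configs shown above.
worker_processes = 4   # worker_processes in unicorn.rb
pool_size        = 22  # pool in database.yml

# Hypothetical helper: upper bound on connections if every worker
# process holds its own pool and fills it completely.
def max_app_connections(workers, pool)
  workers * pool
end

ceiling = max_app_connections(worker_processes, pool_size)

# 90 rows in the processlist, minus the rdsadmin row and my own console session.
# Assumption: all rows hidden behind the CUT line are app connections.
observed_app_rows = 90 - 2

puts "ceiling: #{ceiling}, observed: #{observed_app_rows}"
```

Under those assumptions both numbers come out to 88, which would suggest that each worker is filling its whole pool rather than holding a single connection.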
For background on this problem, see also issue #503 in the Sidekiq repository: https://github.com/mperham/sidekiq/issues/503