10
votes

I’m having this peculiar problem when attempting to restart Unicorn using a USR2 signal. On a clean reboot of the VPS, I have no problems with sending a USR2 signal to Unicorn and having it gracefully restart. However, after an hour or so if I try to do it again, I will be left with an old master hanging around preventing the new master from starting. I am then forced to kill the old master so the new master can start. If I reboot the VPS, it fixes it but then after an hour so the problem starts again. I'm on Rails 4, Ruby 2.0.0.

unicorn.log

I, [2014-01-07T15:37:37.118523 #19797]  INFO -- : executing ["/srv/rails/current/bin/unicorn", "-c", "/srv/rails/current/config/unicorn.rb", {12=>#<Kgio::UNIXServer:fd 12>}] (in /srv/rails/releases/20140107091945)
I, [2014-01-07T15:37:37.118983 #19797]  INFO -- : forked child re-executing...
I, [2014-01-07T15:37:38.998632 #19797]  INFO -- : inherited addr=/srv/rails/shared/sockets/unicorn.sock fd=12
I, [2014-01-07T15:37:38.999038 #19797]  INFO -- : Refreshing Gem list
I, [2014-01-07T15:37:41.927794 #19967]  INFO -- : Refreshing Gem list
/srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:219:in `pid=': Already running on PID:19967 (or pid=/srv/rails/shared/pids/unicorn.pid is stale) (ArgumentError)
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:151:in `start'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
        from /srv/rails/current/bin/unicorn:16:in `load'
        from /srv/rails/current/bin/unicorn:16:in `<main>'
/srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:219:in `pid=': Already running on PID:21250 (or pid=/srv/rails/shared/pids/unicorn.pid is stale) (ArgumentError)
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:151:in `start'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
        from /srv/rails/current/bin/unicorn:16:in `load'
        from /srv/rails/current/bin/unicorn:16:in `<main>'
E, [2014-01-07T15:40:46.720131 #20878] ERROR -- : reaped #<Process::Status: pid 21075 exit 1> exec()-ed
E, [2014-01-07T15:40:46.720870 #20878] ERROR -- : master loop error: Already running on PID:21250 (or pid=/srv/rails/shared/pids/unicorn.pid is stale) (ArgumentError)
E, [2014-01-07T15:40:46.723525 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:219:in `pid='
E, [2014-01-07T15:40:46.723671 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:413:in `reap_all_workers'
E, [2014-01-07T15:40:46.723747 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:292:in `join'
E, [2014-01-07T15:40:46.723815 #20878] ERROR -- : /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
E, [2014-01-07T15:40:46.723880 #20878] ERROR -- : /srv/rails/current/bin/unicorn:16:in `load'
E, [2014-01-07T15:40:46.723930 #20878] ERROR -- : /srv/rails/current/bin/unicorn:16:in `<main>'
E, [2014-01-07T15:41:13.704700 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:13.704901 #21250] ERROR -- : retrying in 0.5 seconds (4 tries left)
E, [2014-01-07T15:41:14.205452 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:14.205597 #21250] ERROR -- : retrying in 0.5 seconds (3 tries left)
78.40.124.16, 173.245.49.122 - - [07/Jan/2014 15:41:14] "GET / HTTP/1.0" 200 28697 0.8345
E, [2014-01-07T15:41:14.706179 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:14.706335 #21250] ERROR -- : retrying in 0.5 seconds (2 tries left)
E, [2014-01-07T15:41:15.206834 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:15.206987 #21250] ERROR -- : retrying in 0.5 seconds (1 tries left)
E, [2014-01-07T15:41:15.707431 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
E, [2014-01-07T15:41:15.707563 #21250] ERROR -- : retrying in 0.5 seconds (0 tries left)
78.40.124.16, 149.154.158.74 - - [07/Jan/2014 15:41:15] "GET / HTTP/1.0" 200 32866 0.4528
E, [2014-01-07T15:41:16.208055 #21250] ERROR -- : adding listener failed addr=/srv/rails/shared/sockets/unicorn.sock (in use)
/srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/socket_helper.rb:158:in `initialize': Address already in use - "/srv/rails/shared/sockets/unicorn.sock" (Errno::EADDRINUSE)
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/socket_helper.rb:158:in `new'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/socket_helper.rb:158:in `bind_listen'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:255:in `listen'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:801:in `block in bind_new_listeners!'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:801:in `each'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:801:in `bind_new_listeners!'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/lib/unicorn/http_server.rb:146:in `start'
        from /srv/rails/shared/bundle/ruby/2.0.0/gems/unicorn-4.7.0/bin/unicorn:126:in `<top (required)>'
        from /srv/rails/current/bin/unicorn:16:in `load'
        from /srv/rails/current/bin/unicorn:16:in `<main>'

unicorn.rb

deploy_path = "/srv/rails"
RAILS_ENV = ENV['RAILS_ENV'] || "production"

working_directory "#{deploy_path}/current"
pid "#{deploy_path}/shared/pids/unicorn.pid"
stderr_path "#{deploy_path}/shared/log/unicorn.log"

# Listen on a UNIX data socket
listen "#{deploy_path}/shared/sockets/unicorn.sock"
worker_processes 4

# Preload application before forking worker processes
preload_app true

# Restart any workers that haven't responded in 30 seconds
timeout 30

before_fork do |server, worker|
  ##
  # When sent a USR2, Unicorn will suffix its pidfile with .oldbin and
  # immediately start loading up a new version of itself (loaded with a new
  # version of our app). When this new Unicorn is completely loaded
  # it will begin spawning workers. The first worker spawned will check to
  # see if an .oldbin pidfile exists. If so, this means we've just booted up
  # a new Unicorn and need to tell the old one that it can now die. To do so
  # we send it a QUIT.
  #
  # Using this method we get 0 downtime deploys.

  old_pid = "#{server.config[:pid]}.oldbin"

  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH => e
      log = File.open(Rails.root.join('log/unicorn.log'), "a")
      log.puts "Error encountered when killing process:\n"
      log.puts "#{e.message}"
      log.close
    end
  end

  # the following is recomended for Rails + "preload_app true"
  # as there's no need for the master process to hold a connection
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end
end

after_fork do |server, worker|
  ##
  # Unicorn master loads the app then forks off workers - because of the way
  # Unix forking works, we need to make sure we aren't using any of the parent's
  # sockets, e.g. db connection

  ActiveRecord::Base.establish_connection
  # Redis and Memcached would go here but their connections are established
  # on demand, so the master never opens a socket

  ##
  # Unicorn master is started as root, which is fine, but let's
  # drop the workers to deployer
  begin
    uid, gid = Process.euid, Process.egid
    user, group = 'deployer', 'deployer'
    target_uid = Etc.getpwnam(user).uid
    target_gid = Etc.getgrnam(group).gid
    worker.tmp.chown(target_uid, target_gid)
    if uid != target_uid || gid != target_gid
      Process.initgroups(user, target_gid)
      Process::GID.change_privilege(target_gid)
      Process::UID.change_privilege(target_uid)
    end
  rescue => e
    if RAILS_ENV == 'development'
      STDERR.puts "couldn't change user, oh well"
    else
      raise e
    end
  end
end

deploy.rb

require 'bundler/capistrano'    # runs a bundle install --deployment

# https://github.com/sstephenson/rbenv/issues/101
set :keep_releases, 10
set :shared_children, shared_children + %w(public/images public/uploads)

# Multistage extension
set :stages, ["production", "staging"]
set :default_stage, "staging"
require 'capistrano/ext/multistage'
require 'underglow/capistrano'

# Whenever crontab updates
set :whenever_environment, defer { stage }
set :whenever_command, "bin/whenever"
require 'whenever/capistrano'

set :application, "rails"
set :user, "deployer"

default_run_options[:pty] = true
default_run_options[:shell] = '/bin/zsh'
set :use_sudo, false

# repository
set :repository,      "XXXXXXXXXXXXXXXXX"
set :branch,          fetch(:branch, "master")  # can specify a branch from `cap -S branch="<branch_name>"`
set :scm,             :git
set :scm_verbose,     true

set :ssh_options, forward_agent: true

set :deploy_to,       "/srv/rails"
set :deploy_via,      :remote_cache

# We're using a rbenv user install, setup the PATH we need to access the rbenv shims
set :default_environment, {
  'PATH' => "$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH"
}

Has anyone seen this?

2
How are you sending the signals to unicorn? Are you using an init.d script (like gist.github.com/jaygooby/504875) or capistrano-unicorn (github.com/sosedoff/capistrano-unicorn), or something else? Can the unicorn process write/read to the pid directory?phillbaker
could you show your deploy.rb file?itsnikolay
I am using unix command to send a USR2 signal, i.e. kill -USR2 PIDaxsuul
@itsnikolay added deploy.rb :)axsuul
I am seeing almost this exact problem. It started shortly after a "gem update" of various gems, including Unicorn. I have yet to track it down myself, but I do have a work-around -- I ssh to the machine, prepare a "kill" for the unicorn master process, wait until I see "unable to restart" from capistrano, then kill it. It manages to continue from there. Not elegant...Michael Graff

2 Answers

4
votes

You should check the unicorn stdout/stderr logs for more evidence about why the old unicorn may be hanging or the new one failing to kill it off properly.

One gotcha is that if the older capistrano release directory has been removed during deployment of the new release, you may have bundler errors during the hot-swap handoff. Folks advise adding the following to bind to the permanent path to the Gemfile vs. the release-specific path:

before_exec do |server|
  ENV['BUNDLE_GEMFILE'] = "#{deploy_path}/current/Gemfile"
end

If that is the problem you're having, you should be seeing bundler errors or a failure in the unicorn logs.

3
votes

This may not help you, but here is what I did to "fix" the problem.

I started getting this problem from the release of Unicorn 4.7.0. In 4.7.0, the behavior of how pid files were written changed, and broke my restart script. The old pre-4.7.0 behavior was: move pid file to oldpid, write new pid, start up workers, shut down master. The last step was in my unicorn.rb file of course. The new behavior was to remove the old pid quickly, and only write the new one after some heavy lifting occurred. This broke my script as it could not trust that things restarted properly. This caused my sh script to attempt to restart it, leading to confusion with the now-freshly started unicorn process and the eh-script started "full start" one. Both lost in various ways, so both exited, leaving a old master still serving requests.

I also had a defect in my unicorn.rb file which did not properly set up bundler, as someone already mentioned.

Upgrading to Unicorn 4.8.1, released recently, fixed this problem as pid files are written as they were in pre-4.7.0 days.