
Every now and then my celery worker will "lose" its connection to RabbitMQ. I have looked through the logs: the celery logs show nothing, but I do see the following warning in the RabbitMQ logs:

    =WARNING REPORT==== 2-Jan-2013::09:13:04 ===
    exception on TCP connection <0.14032.9> from 1.1.1.1:43760
    connection_closed_abruptly

My setup is pretty simple. I have one server running the celery workers and another running RabbitMQ; the workers connect to the queue remotely.

I have also noticed that if I reboot the RabbitMQ server, I have to manually restart the celery workers as well.
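
To confirm that the workers really have lost their broker connection, one rough check (assuming you can run rabbitmqctl on the RabbitMQ host) is to compare what the broker sees with what is still running on the worker box:

    # On the RabbitMQ host: list the connections the broker currently knows about.
    # A "lost" worker will simply be missing from this list after the reboot.
    sudo rabbitmqctl list_connections

    # On the worker host: the celeryd processes are still running even though
    # the broker no longer shows a connection from them.
    ps auxww | grep "[c]eleryd"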


1 Answer


This appeared to be an issue with the init script I was using. Sending a SIGTERM signal to the celery daemon process does not kill the workers: they are left waiting for the mediator to feed tasks to the pool, but the SIGTERM kills off the mediator.
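
You can see the orphaned pool processes for yourself (a rough sketch; the exact process names depend on your celery version and init script):

    # Before stopping: note the daemon PID and its pool worker children.
    ps auxww | grep "[c]eleryd"

    # Send SIGTERM to the daemon only, the way the init script does
    # (replace <daemon-pid> with the PID from the listing above).
    sudo kill -TERM <daemon-pid>

    # Afterwards the pool workers are still listed, now orphaned and waiting
    # for a mediator that no longer exists.
    ps auxww | grep "[c]eleryd"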

Look at the debug statements I found below:

    [2013-01-02 16:23:58,624: DEBUG/MainProcess] Terminating celery.worker.consumer.Consumer...
    [2013-01-02 16:23:58,624: DEBUG/MainProcess] consumer: Stopping consumers...
    [2013-01-02 16:23:58,625: DEBUG/MainProcess] Terminating celery.worker.mediator.Mediator...
    [2013-01-02 16:23:59,034: DEBUG/MainProcess] Terminating celery.concurrency.processes.TaskPool...
    [2013-01-02 16:23:59,050: DEBUG/MainProcess] Terminating celery.worker.hub.Hub...
    [2013-01-02 16:23:59,050: DEBUG/MainProcess] consumer: Closing consumer channel...
    [2013-01-02 16:23:59,051: DEBUG/MainProcess] consumer: Closing broadcast channel...

The workaround is to send the SIGTERM signal to all of the worker processes as well.

    # Count the processes matching 'celery'; if only one remains, clean up any
    # leftover celeryd workers by signalling them as well.
    # (The second pipeline excludes the grep itself before extracting the PIDs.)
    if [ $(ps aux | grep -c 'celery') -eq 1 ] ; then
            ps auxww | grep celeryd | grep -v "grep" | awk '{print $2}' | sudo xargs kill -HUP
    fi
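
An equivalent, blunter sketch is to pattern-match on the process name with pkill instead of the ps/grep/awk pipeline (this assumes nothing else on the box has celeryd in its command line, since pkill -f matches anywhere in it):

    # Signal every remaining process whose command line contains 'celeryd'.
    # Use -TERM (or -HUP, as in the script above) depending on how you want them to exit.
    sudo pkill -TERM -f celeryd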

Coincidentally, this happened every time our Jenkins build script ran (on every post-commit).