6
votes

I have been happily running celery+rabbitmq+django for a month or so in production. Yesterday, I decided to upgrade from celery 2.1.4 to 2.2.4, and now rabbitmq is spinning out of control. After running for a while, my nodes are no longer recognized by evcam, and beam.smp's memory consumption starts increasing...slowly (100+% CPU usage).

I can run rabbitmqctl list_connections and see that there is nothing unusual (just my one test node). I can see in rabbitmqctl list_queues -p <VHOST> that there are no messages except the heartbeat from my test node. If I let the process keep running, it maxes out the machine within a couple of hours.
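For reference, this is roughly what I'm running to inspect the broker (my vhost name substituted with a placeholder):

```shell
# List open AMQP connections -- I expect to see just my one test node
rabbitmqctl list_connections

# List queues on the vhost with message and consumer counts
# (replace /myvhost with your actual vhost)
rabbitmqctl list_queues -p /myvhost name messages consumers
```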

I've tried purging the various queues using camqadm to no avail, and rabbitmqctl stop_app just hangs. The only way I have found to 'fix' it is to kill -9 beam.smp (and all related processes) and run force_reset on my rabbitmq server.
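In case it matters, this is roughly the recovery sequence I've been using. Note that force_reset is destructive: it wipes all queues, exchanges, users, and vhosts on the node.

```shell
# Hard-kill the Erlang VM running RabbitMQ (stop_app hangs, so this
# is the only thing that works) along with the port mapper daemon
pkill -9 beam.smp
pkill -9 epmd

# After restarting the server, wipe the node's state entirely
rabbitmqctl stop_app
rabbitmqctl force_reset
rabbitmqctl start_app
```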

I have no idea how to go about debugging this. There doesn't appear to be anything fishy going on as far as new messages go. Has anybody run up against this before? Any ideas? What other information should I be looking at?

2
Did you upgrade rabbitmq as well? I had similar symptoms with 2.2.x, so we downgraded to RabbitMQ 2.1.1 and had no issues. - asksol
I downgraded to 2.1.1 and the problem went away. Any idea why? - Bacon
What version were you running when you had the symptoms? - asksol
I was running celery 2.2.4 with rabbitmq 2.2.0. I had been using celery 2.1.4 with the same version of rabbit without any issues. - Bacon

2 Answers

4
votes

A celery developer told me three months ago that versions of RabbitMQ after 2.1.1 were affected by a memory leak, with CPU spikes. I'm still using version 2.1.1 and I don't have this problem.

http://www.rabbitmq.com/releases/rabbitmq-server/v2.1.1/

It's also true that celery 2.2.4 introduced some memory problems, but if you upgrade to celery 2.2.5 most of them are solved.
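In other words, pin celery at 2.2.5 and keep RabbitMQ at 2.1.1. Assuming a pip-based install, the celery side would look something like:

```shell
# Upgrade celery to 2.2.5, which fixes most of the 2.2.4 memory issues
pip install celery==2.2.5

# For the broker, install the 2.1.1 server package from the release
# page linked above -- the exact steps depend on your platform
```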

http://docs.celeryproject.org/en/v2.2.5/changelog.html#fixes

I hope this helps.

1
votes

May not be helpful, but we recently tracked down a memory leak in the Java Virtual Machine related to the extensions used to monitor garbage collection. It may be that your heartbeat monitor is triggering these methods, which result in a native memory leak.

The issue is described here: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129