I'm running multiple Celery worker processes on an AWS c3.xlarge (a 4-core machine). There is a "batch" worker process with its --concurrency parameter set to 2, and a "priority" process with its --concurrency parameter set to 1. Both worker processes draw from the same priority queue. I am using Mongo as my broker. When I submit multiple jobs to the priority queue, they are processed serially, one after the other, even though multiple workers are available. All items are processed by the "priority" process, but if I stop the "priority" process, the "batch" process will process everything (still serially). What could I have configured incorrectly that prevents Celery from processing jobs concurrently?
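For reference, a minimal sketch of the setup as described (the module name, Mongo URL, and task body are my assumptions, not the actual code):

    # tasks.py -- minimal sketch of the setup described above; the module
    # name, broker URL, and task body are assumptions.
    from celery import Celery

    app = Celery('tasks', broker='mongodb://localhost:27017/celery')

    @app.task
    def process(item):
        # Placeholder for the real work each job performs.
        return item

    # Two worker processes, both consuming the same "priority" queue:
    #   celery -A tasks worker -n batch    --concurrency=2 -Q priority
    #   celery -A tasks worker -n priority --concurrency=1 -Q priority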

EDIT: It turned out that the synchronous bottleneck is in the server submitting the jobs rather than in Celery.

1
Do you mean that the tasks are not fairly dispatched to the workers? – Pierre
Yes. However, it is not just that some workers get more tasks than others. All the tasks are given to a single worker process, even if it is busy and other workers could handle them. I noticed my memory is very low on this machine, so I'm currently looking into whether that is the problem. – Nathan Breit

1 Answer

By default, each worker prefetches 4 × concurrency tasks to execute. That means your "priority" worker (concurrency 1) reserves 4 tasks at a time, so if you queue 4 or fewer tasks, they will all be claimed by that worker alone and there will be no messages left in the queue for the second worker to consume.

You should set CELERYD_PREFETCH_MULTIPLIER to a value that works best for you. I had this problem before; after setting the option to 1, all my tasks were fairly consumed by the workers.
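In a configuration module, that might look roughly like this (the standalone celeryconfig.py layout and the broker URL are assumptions; CELERYD_PREFETCH_MULTIPLIER is the Celery 3.x setting name):

    # celeryconfig.py -- sketch of the relevant setting, assuming a
    # standalone config module loaded by the workers.
    BROKER_URL = 'mongodb://localhost:27017/celery'

    # Reserve only one message per worker process at a time, so queued
    # tasks stay visible to whichever worker is free instead of being
    # hoarded by the first worker's prefetch.
    CELERYD_PREFETCH_MULTIPLIER = 1

The trade-off is throughput: a larger multiplier reduces broker round-trips when you have many short tasks, while 1 gives fairer distribution when tasks are long-running.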