75 votes

In my /etc/defaults/celeryd config file, I've set:

CELERYD_NODES="agent1 agent2 agent3 agent4 agent5 agent6 agent7 agent8"
CELERYD_OPTS="--autoscale=10,3 --concurrency=5"

I understand that the daemon spawns 8 Celery workers, but I'm not entirely sure what autoscale and concurrency do together. I thought that concurrency was a way to specify the maximum number of threads that a worker can use, and that autoscale was a way for the worker to scale its child workers up and down as necessary.
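For context, as far as I can tell the init script ultimately turns these settings into worker invocations roughly like the following ("proj" standing in for my actual app module):

celery -A proj worker --concurrency=5    # fixed pool: always 5 processes
celery -A proj worker --autoscale=10,3   # dynamic pool: between 3 and 10 processes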

The tasks have a largish payload (roughly 20-50 kB each) and there are around 2-3 million such tasks, but each task runs in less than a second. I'm seeing memory usage spike because the broker distributes the tasks to every worker, replicating the payload many times over.

I think the issue is in the config: the combination of workers + concurrency + autoscaling is excessive, and I would like a better understanding of what these three options do.

The documentation for autoscale and concurrency is pretty clear. Which bits don't you understand? In particular, it doesn't really make sense to specify both at the same time. And what exactly is your problem? The memory spike? Is this actually a problem, i.e. are you hitting swap, or seeing OOM invoked? - scytale
@scytale I'm seeing OOM invoked. Lots of processes are simply terminated with Killed when it spikes up. I think I'm clear on autoscale vs. concurrency now. I thought that --autoscale would add more workers, but it's simply a dynamic setting for specifying concurrency instead of a fixed setting with --concurrency. I guess my only remaining confusion concerns "add more workers with less concurrency or add fewer workers with more concurrency". I don't know how to evaluate the tradeoff for that. - Joseph
Let's distinguish between workers and worker processes. You spawn a Celery worker, and this then spawns a number of processes (depending on things like --concurrency and --autoscale). There is no point in running more than one worker unless you want to do routing, listen to different queues, etc. I would say run one worker with the default number of processes (i.e. omit --concurrency and --autoscale and it will default to as many processes as there are cores). Then test your application with a view to establishing the concurrency level that suits your needs. - scytale
The memory spikes may indicate that you need to reevaluate your data structures, etc. Also, if your tasks run in less than a second you are probably wasting a lot of time in messaging overhead - can you not refactor your code or change your chunk size so they run for longer? - scytale
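For example, Celery's chunks primitive packs many small calls into fewer, longer-running task messages. A minimal sketch, assuming a hypothetical process_item task; the app name and broker URL are placeholders too:

from celery import Celery

app = Celery("proj", broker="amqp://")  # "proj" and the broker URL are placeholders

@app.task
def process_item(item_id):
    ...  # per-item work that currently takes < 1 second

# Rather than enqueueing 2-3 million one-off tasks, split the ids into
# chunks of, say, 1000: each broker message then carries 1000 calls, and
# each task execution runs long enough to amortize the messaging overhead.
item_ids = range(3000000)
process_item.chunks(zip(item_ids), 1000).apply_async()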
@scytale I've solved almost all of my issues. The two biggest wins were: 1) moving the payload into a db and passing only the payload id to the task, which instantly stabilized RabbitMQ and Celery (they would occasionally buckle under the combined weight of the payloads) and required very little design change, and 2) using a single worker with the appropriate number of concurrent processes to reduce duplication. Thanks for your help and patience! :) If you'd like to summarize your points above, I'd be happy to accept your answer. - Joseph

1 Answer

67 votes

Let's distinguish between workers and worker processes. You spawn a Celery worker, and this then spawns a number of processes (depending on things like --concurrency and --autoscale; the default is to spawn as many processes as there are cores on the machine). There is no point in running more than one worker on a particular machine unless you want to do routing.

I would suggest running only 1 worker per machine with the default number of processes. This will reduce memory usage by eliminating the duplication of data between workers.
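With your settings, that would look something like this in /etc/defaults/celeryd (omitting --concurrency and --autoscale leaves the pool size to default to the core count):

CELERYD_NODES="agent1"
CELERYD_OPTS=""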

If you still have memory issues after that, save the payload to a data store and pass only an id to the workers.
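A minimal sketch of that pattern, using sqlite3 purely as a stand-in for whatever shared store (database, Redis, etc.) you already have; the task, table, and app names are placeholders:

import sqlite3
from celery import Celery

app = Celery("proj", broker="amqp://")  # placeholders
DB = "payloads.db"  # a local file here; use a store all workers can reach

def save_payload(data: bytes) -> int:
    # Persist the payload and return its row id.
    with sqlite3.connect(DB) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS payloads (id INTEGER PRIMARY KEY, body BLOB)")
        cur = conn.execute("INSERT INTO payloads (body) VALUES (?)", (data,))
        return cur.lastrowid

@app.task
def process_payload(payload_id: int) -> None:
    # The broker message carries only the id; the 20-50 kB body is
    # fetched once, by the worker process that actually runs the task.
    with sqlite3.connect(DB) as conn:
        (body,) = conn.execute("SELECT body FROM payloads WHERE id = ?", (payload_id,)).fetchone()
    ...  # do the actual work on body

# Producer side: persist first, then enqueue just the id.
process_payload.delay(save_payload(b"...20-50 kB of data..."))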