3
votes

I have a Python/Django project running on uwsgi/nginx. For asynchronous tasks we are using rabbitmq/celeryd, and supervisord to manage all the daemons.

Versions:

  • python: 2.7
  • django: 1.9.7
  • celery: 3.1.23
  • django-celery: 3.1.17

Celery has 10 queues of type direct (say queue1, queue2, ...). Each queue is handled by a separate celeryd process, which is managed via supervisord. Each supervisord program entry looks like the following:

[program:app_queue_worker]
command=/var/www/myproj/venv/bin/celery worker -A myproj -c 2 --queue=queue1 --loglevel=INFO
directory=/var/www/myproj/
user=ubuntu
numprocs=1
autostart=true
autorestart=true
startsecs=10
exitcodes=1
stopwaitsecs = 600
killasgroup=true
priority=1000

Hence supervisord is running 10 main processes and 20 worker processes.

Another thing I have noticed is that uwsgi also spawns some celery workers (I don't understand how or why, yet) with concurrency=2. So if I have 4 uwsgi processes running, I will have an additional 10 celery workers running.

All these workers are each taking 200-300MB of memory. Something is wrong here, I can feel it, but I am not able to put my finger on it. Celery shouldn't be running such memory-heavy processes, should it?

Note: DEBUG=False, so there is no memory leak due to debug mode.

Can someone please comment on whether this architecture is correct or wrong?

Would it be better to run 2-3 celery main processes which listen to all queues at once, and increase their concurrency?

Update: celery.py config

import os

from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'MyProject.settings')

from django.conf import settings  # noqa
from chatterbox import celery_settings

app = Celery('MyProject')

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')

app.conf.update(
    CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
    CELERYD_CONCURRENCY=1,
)

app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
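As an aside, if per-worker memory growth is a concern, Celery 3.x exposes a setting to recycle pool children after a fixed number of tasks, which caps any gradual accumulation. A hypothetical addition to the config above (the value 100 is an arbitrary illustration, not from the question):

```
app.conf.update(
    CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
    CELERYD_CONCURRENCY=1,
    # Hypothetical: restart each pool child after it has executed 100
    # tasks, so leaked memory in task code is periodically reclaimed.
    CELERYD_MAX_TASKS_PER_CHILD=100,
)
```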
How do you measure the RAM consumption? - Krzysztof Szularz
With the top and htop commands - Crazyshezy
Is uwsgi starting up worker processes because there are multiple uwsgi processes? (You should be able to limit uwsgi to 1 worker to test this.) - Jesse
@Crazyshezy I mean is it VIRT, RES or SHR memory? - Krzysztof Szularz
@KrzysztofSzularz - looking at the RES value - Crazyshezy

2 Answers

0
votes

There is no simple answer to this.

To me, the fact that uwsgi spawns celery workers is wrong.

Creating only worker processes that consume all queues might lead to a situation where long-running tasks make some queues overflow, whereas separate workers that consume specific queues with short-running tasks could handle the situation better. Everything depends on your use case.

The 300MB of resident memory is quite a lot. If the tasks are I/O-bound, go multi-threaded/gevent. However, if the tasks are CPU-bound, you have no option other than to scale process-wise.
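For the I/O-bound case, a hypothetical invocation using Celery's gevent pool instead of prefork children (assumes the gevent package is installed; project and queue names taken from the question):

```
# One process, many lightweight green-thread workers instead of
# forked children -- suitable only for I/O-bound tasks.
celery worker -A myproj -P gevent -c 100 --queue=queue1 --loglevel=INFO
```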

0
votes

If you start a celery worker with a concurrency of n, it will spawn n + 1 processes by default. Since you are spawning 10 workers with a concurrency of 2, celery will start 30 processes.

Each worker consumes ~60MB (~30MB for the main process and 2 × ~15MB for the subprocesses) of memory when not consuming queues. It may vary depending on what your worker is doing. If you start 10 workers, they will consume ~600MB of memory.
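That estimate can be sketched as a quick back-of-envelope calculation (the ~30MB/~15MB figures are the rough observations above, not exact measurements):

```python
def celery_memory_mb(workers, concurrency, main_mb=30, child_mb=15):
    """Rough resident-memory estimate for a fleet of prefork workers."""
    # Each worker is 1 main process plus `concurrency` pool children.
    processes = workers * (1 + concurrency)
    memory = workers * (main_mb + concurrency * child_mb)
    return processes, memory

procs, mem = celery_memory_mb(workers=10, concurrency=2)
print(procs, mem)  # 30 processes, ~600MB total
```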

I am not sure how you came to know that uwsgi also spawns some celery workers. Only supervisord should spawn the processes.

You can run just one celery worker which listens to all queues, with a concurrency of 20. This will reduce your memory usage at the cost of flexibility: with that setup you can't start/stop consuming from selected queues, and there is no guarantee that all queues will be consumed equally.
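A hypothetical command for that consolidated setup (project and queue names as in the question; `--queue` accepts a comma-separated list):

```
# One main process with a pool of 20 children, consuming all ten queues.
celery worker -A myproj -c 20 \
  --queue=queue1,queue2,queue3,queue4,queue5,queue6,queue7,queue8,queue9,queue10 \
  --loglevel=INFO
```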