I'm running into a performance issue that I can't seem to figure out.
The problem is that overall task throughput is too slow, even though, based on the Celery log, most tasks finish in under 0.3 seconds.
I noticed that if I stop the workers and start them again, performance increases to almost 200 acks/second, then after a while it drops back to around 40/s.
I'm not sure, but I think it might be a broker issue rather than a Celery issue. Looking at the logs of a couple of workers, I noticed that they all seem to execute tasks, then stall for a bit and start again.
It feels like receiving tasks is slow.
Any ideas about what might cause this? Thanks!
A log example:
Task drones.tasks.blue_drone_process_task[64c0a826-aa18-4226-8a39-3a455e0916a5] succeeded in 0.18421914400005335s: None
[~10 second break here]
Received task: drones.tasks.blue_drone_process_task[924a1b99-670d-492e-94a1-91d5ff7142b9]
Received task: drones.tasks.blue_drone_process_task[74a9a1d3-aa2b-40eb-9e5a-1420ea8b13d1]
Received task: drones.tasks.blue_drone_process_task[99ae3ca1-dfa6-4854-a624-735fe0447abb]
Received task: drones.tasks.blue_drone_process_task[dfbc0d65-c189-4cfc-b6f9-f363f25f2713]
IMO those tasks should execute so fast that I shouldn't be able to read the log.
My setup is:
- celery 4.2.1
- RabbitMQ 3.7.8
- Erlang 21.1
I use this setup for web scraping and have 2 queues. Let's call them Requests and Process.
The Requests queue holds the URLs that need to be scraped, and the Process queue holds the URL + the source code of that page (max 2.5 MB per source page; I drop the page if it's bigger than that), so all messages in the Process queue are at most 2.5 MB ± 1 KB.
To execute tasks from the Requests queue I use Celery with the gevent pool at concurrency 300 (-P gevent -c 300 --without-gossip --without-mingle --without-heartbeat). I run 4-8 workers like this.
To execute tasks from the Process queue I use the prefork pool (the default) with -c 4 --without-gossip --without-mingle --without-heartbeat. I run 30 workers like this.
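For context, here is a simplified sketch of what the task side looks like. The Requests-side task name, the fetch logic, and the broker URL are illustrative placeholders; only the Process task name and the two queue names match my real setup:

```python
# drones/tasks.py -- simplified sketch, not my actual code.
import requests
from celery import Celery

app = Celery('drones', broker='amqp://user:pass@haproxy-host:5672//')  # placeholder URL

# Route the two task types to their queues.
app.conf.task_routes = {
    'drones.tasks.blue_drone_fetch_task': {'queue': 'Requests'},
    'drones.tasks.blue_drone_process_task': {'queue': 'Process'},
}

MAX_BODY = int(2.5 * 1024 * 1024)  # pages bigger than ~2.5 MB are dropped


@app.task(name='drones.tasks.blue_drone_fetch_task')
def blue_drone_fetch_task(url):
    # Runs on the gevent workers (-P gevent -c 300): network-bound fetch.
    resp = requests.get(url, timeout=30)
    if len(resp.content) > MAX_BODY:
        return None  # page too big, drop it
    # Hand the URL + source code over to the Process queue.
    blue_drone_process_task.delay(url, resp.text)


@app.task(name='drones.tasks.blue_drone_process_task')
def blue_drone_process_task(url, source):
    # Runs on the prefork workers (-c 4): parse/extract from the source.
    ...
```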
Other setup info:
- heartbeats disabled in Celery and RabbitMQ; TCP keep-alive is used instead
- everything is in AWS
- c4.xlarge instances for workers
- i3.xlarge for RabbitMQ (30 GB RAM, 765 GB NVMe SSD, 4 cores)
- HAProxy for load balancing (I had 2 RabbitMQ nodes clustered for HA, fully replicated; I stopped one thinking that might be causing the issue, but left the load balancer in place in case I decide to recreate the cluster)
RabbitMQ config:
- heartbeat = 0
- lazy_queue_explicit_gc_run_operation_threshold = 500
- proxy-protocol = true
- vm_memory_high_watermark = 0.6
- vm_memory_high_watermark_paging_ratio = 0.1
- queue_index_embed_msgs_below = 4096
Celery config:
- CELERY_TASK_ACKS_LATE = false (tried both ways)
- CELERY_RESULT_BACKEND = None
- CELERY_WORKER_ENABLE_REMOTE_CONTROL = True
- BROKER_HEARTBEAT = 0
- CELERY_CONTROL_QUEUE_EXPIRES = 60
- CELERY_BROKER_CONNECTION_TIMEOUT = 30
- CELERY_WORKER_PREFETCH_MULTIPLIER = 1
- workers running with -Ofair
- max-tasks-per-child = 10 (tried without it as well)
I also tried a higher prefetch multiplier (5, 10, and 20) and it did not help.
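For readability, here are the same Celery settings from the list above written out as a standalone celeryconfig-style module (the broker URL is a placeholder; in the project itself I set the uppercase variants listed above):

```python
# celeryconfig.py -- the settings above, as lowercase Celery 4 names.
broker_url = 'amqp://user:pass@haproxy-host:5672//'  # placeholder
broker_heartbeat = 0             # heartbeats off, relying on TCP keep-alive
broker_connection_timeout = 30

result_backend = None            # results are not stored
task_acks_late = False           # tried True as well

worker_enable_remote_control = True
worker_prefetch_multiplier = 1   # also tried 5, 10 and 20
worker_max_tasks_per_child = 10  # also tried without it
control_queue_expires = 60
```

A module like this would be loaded with app.config_from_object('celeryconfig'); I'm only showing it in this form so all of the settings are in one place.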
