
I'm here with a performance issue that I can't seem to figure out.

The problem is that overall task throughput is too slow, even though, based on the Celery log, most tasks finish in under 0.3 seconds.

I noticed that if I stop the workers and start them again, performance increases to almost 200 acks/second; then, after a while, it drops to around 40/s.

I'm not sure, but I think it might be a broker issue rather than a Celery issue. Looking at the logs of a couple of workers, I noticed they all seem to execute tasks, then stop for a bit, and start again.

It feels like receiving tasks is slow.

Any ideas about what might cause this? Thanks!


A log example:

Task drones.tasks.blue_drone_process_task[64c0a826-aa18-4226-8a39-3a455e0916a5] succeeded in 0.18421914400005335s: None

(10 second break)

Received task: drones.tasks.blue_drone_process_task[924a1b99-670d-492e-94a1-91d5ff7142b9]
Received task: drones.tasks.blue_drone_process_task[74a9a1d3-aa2b-40eb-9e5a-1420ea8b13d1]
Received task: drones.tasks.blue_drone_process_task[99ae3ca1-dfa6-4854-a624-735fe0447abb]
Received task: drones.tasks.blue_drone_process_task[dfbc0d65-c189-4cfc-b6f9-f363f25f2713]

IMO those tasks should execute so fast that I shouldn't be able to read the log.


My setup is:

  • celery 4.2.1
  • RabbitMQ 3.7.8
  • Erlang 21.1

I use this setup for web scraping and have two queues. Let's call them Requests and Process.

In the Requests queue I put URLs that need to be scraped, and in the Process queue you will find the URL plus the source code of that page (max 2.5 MB per source page; I drop the page if it's bigger than that), so all messages in the Process queue are at most 2.5 MB ± 1 KB.

To execute tasks from the Requests queue I use Celery with the gevent pool at concurrency 300 (-P gevent -c 300 --without-gossip --without-mingle --without-heartbeat). I run 4-8 workers like this.

To execute tasks from the Process queue I use the prefork pool (default) with concurrency 4 (-c 4 --without-gossip --without-mingle --without-heartbeat). I run 30 workers like this.
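
Roughly, the task side is wired up like this (a simplified sketch: the request task name, the broker URL, and the use of requests for fetching are placeholders; only blue_drone_process_task appears in my logs above, but the queue split is the same):

    # Simplified sketch of the two-queue layout; names other than
    # blue_drone_process_task are placeholders.
    import requests
    from celery import Celery

    app = Celery('drones', broker='amqp://user:password@haproxy-host:5672//')

    # Route each task type to its own queue.
    app.conf.task_routes = {
        'drones.tasks.blue_drone_request_task': {'queue': 'Requests'},
        'drones.tasks.blue_drone_process_task': {'queue': 'Process'},
    }

    MAX_SOURCE_BYTES = int(2.5 * 1024 * 1024)  # ~2.5 MB cap per page source

    @app.task(name='drones.tasks.blue_drone_request_task')
    def blue_drone_request_task(url):
        """Fetch one URL; consumed by the gevent workers on the Requests queue."""
        source = requests.get(url, timeout=30).text
        if len(source.encode('utf-8')) <= MAX_SOURCE_BYTES:  # bigger pages are dropped
            blue_drone_process_task.delay(url, source)

    @app.task(name='drones.tasks.blue_drone_process_task')
    def blue_drone_process_task(url, source):
        """Process the fetched page; consumed by the prefork workers on the Process queue."""
        ...  # actual parsing/processing logic lives here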


Other setup info:

  • heartbeats disabled in Celery and RabbitMQ; using TCP keep-alive instead
  • everything is in AWS
  • c4.xlarge instances for workers
  • i3.xlarge for RabbitMQ (30 GB RAM, 765 GB NVMe SSD, 4 cores)
  • haproxy for load balancing (I had 2 RabbitMQ nodes clustered for HA, fully replicated; I stopped one thinking that might be causing the issue, but left the load balancer in place in case I decide to recreate the cluster)

RabbitMQ config:

  • heartbeat = 0
  • lazy_queue_explicit_gc_run_operation_threshold = 500
  • proxy-protocol = true
  • vm_memory_high_watermark = 0.6
  • vm_memory_high_watermark_paging_ratio = 0.1
  • queue_index_embed_msgs_below = 4096

Celery config:

  • CELERY_TASK_ACKS_LATE = False (tried both ways)
  • CELERY_RESULT_BACKEND = None
  • CELERY_WORKER_ENABLE_REMOTE_CONTROL = True
  • BROKER_HEARTBEAT = 0
  • CELERY_CONTROL_QUEUE_EXPIRES = 60
  • CELERY_BROKER_CONNECTION_TIMEOUT = 30
  • CELERY_WORKER_PREFETCH_MULTIPLIER = 1
  • workers running with -Ofair
  • --max-tasks-per-child = 10 (tried without it as well)

I also tried a higher prefetch multiplier (5, 10, and 20) and it did not help.
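
For reference, those settings boil down to roughly this config module (names copied from the list above; -Ofair and --max-tasks-per-child are passed on the worker command line):

    # Rough equivalent of the Celery settings listed above (Django-style uppercase names).
    CELERY_TASK_ACKS_LATE = False               # tried True as well
    CELERY_RESULT_BACKEND = None                # no result backend
    CELERY_WORKER_ENABLE_REMOTE_CONTROL = True
    BROKER_HEARTBEAT = 0                        # heartbeats off; relying on TCP keep-alive
    CELERY_CONTROL_QUEUE_EXPIRES = 60
    CELERY_BROKER_CONNECTION_TIMEOUT = 30
    CELERY_WORKER_PREFETCH_MULTIPLIER = 1       # effective prefetch per worker = multiplier x concurrency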

In case this helps


1 Answer


Managed to figure it out: it was a networking issue. The EC2 instance I was using for the load balancer had low network performance. I switched to a new instance type with better network performance and now it works amazingly fast.
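
If anyone wants to sanity-check broker latency through their load balancer before swapping instances, a small publish/consume round-trip probe with kombu (which Celery already depends on) can make the problem visible; the broker URL below is a placeholder:

    # Minimal round-trip latency probe against the broker.
    # Point the URL at the load balancer the workers connect to.
    import time
    from kombu import Connection

    with Connection('amqp://user:password@haproxy-host:5672//') as conn:
        queue = conn.SimpleQueue('latency_probe')
        for _ in range(20):
            start = time.time()
            queue.put({'sent_at': start})
            message = queue.get(block=True, timeout=5)
            message.ack()
            print('round trip: %.1f ms' % ((time.time() - start) * 1000))
        queue.close()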