2
votes

I am using Celery to distribute tasks to multiple servers. For some reason, adding 7,000 tasks to the queue is incredibly slow and appears to be CPU-bound. It takes 12 seconds to execute the code below, which does nothing but add tasks to the queue.

import time

start = time.time()
for url in urls:
    fetch_url.apply_async((url.strip(),), queue='fetch_url_queue')
print time.time() - start

Switching between brokers (I have tried Redis, RabbitMQ, and pyamqp) does not have any significant effect.

Reducing the number of workers (which are each running on their own server, separate from the master server which adds the tasks) does not have any significant effect.

The URLs being passed are very small, each just about 80 characters.

The latency between any two given servers in my configuration is sub-millisecond (<1ms).

I must be doing something wrong. Surely Celery must be able to add 7,000 tasks to the queue in less time than several seconds.

1
How long would you expect it to take to add 7,000 of anything? It seems that it would be unreasonable to expect that to be instantaneous. - theMayer
I was by no means expecting instantaneity, but given the small amount of data being passed with each task (an 80 character URL), I was expecting something on the order of 1 second. - monstermac77
How are your queues configured? Are they set up as persistent? - theMayer
No, to try to boost performance I made my queue transient (set durable=False and delivery_mode=1). - monstermac77

1 Answer

3
votes

The rate at which tasks can be queued depends on the Celery broker you are using and your server's CPU.

With an AMD A4-5000 CPU and 4 GB of RAM, here are the task rates for various brokers:

# memory -> 400 tasks/sec
# rabbitmq -> 220 tasks/sec
# postgres -> 30 tasks/sec

With an Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz and 4 GB of RAM:

# memory -> 2000 tasks/sec
# rabbitmq -> 1500 tasks/sec
# postgres -> 200 tasks/sec
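You can measure the enqueue rate on your own hardware with a small timing loop. Here is a minimal sketch; the enqueue call is stubbed with a no-op so the script runs without a broker, and in practice you would swap in the real `fetch_url.apply_async(...)` call:

```python
import time


def benchmark_enqueue(enqueue, n_tasks):
    """Time n_tasks calls to enqueue() and return tasks/sec."""
    start = time.time()
    for i in range(n_tasks):
        enqueue("http://example.com/%d" % i)
    elapsed = time.time() - start
    return n_tasks / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # No-op stand-in; replace with e.g.
    #   lambda url: fetch_url.apply_async((url,), queue='fetch_url_queue')
    # to benchmark your actual broker.
    rate = benchmark_enqueue(lambda url: None, 7000)
    print("%.0f tasks/sec" % rate)
```

Run this once with the no-op to get a CPU-only baseline, then with the real `apply_async` call: the difference tells you how much time goes to serialization and broker round-trips rather than your loop itself.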