I am running a web application on AWS that uses Celery. However, all of the worker processes run in a private data center (a campus supercomputer). I have 34 separate worker processes consuming jobs, while the RabbitMQ and Redis instances used as the broker and result backend run on my EC2 instance on AWS.
I was shocked last month to find out that, with no jobs submitted to the application, I still used nearly 700 GB of network bandwidth (outgoing traffic only!) on the EC2 instance hosting RabbitMQ and Redis. This traffic is caused entirely by overhead communication between the Celery workers and the RabbitMQ instance. Nearly 17 messages per second are being sent to each worker instance, even though there are no actual compute jobs to process.
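As a rough sanity check on those figures, here is a back-of-envelope sketch. It assumes a 30-day month, 1 GB = 10^9 bytes, and that all of the outgoing traffic is idle worker overhead; the function name `idle_overhead_estimate` is just illustrative.

```python
# Back-of-envelope estimate using the numbers quoted in the question.
# Assumptions (not measured): 30-day month, 1 GB = 10**9 bytes,
# and all outgoing traffic is idle Celery/RabbitMQ overhead.

WORKERS = 34
MSGS_PER_WORKER_PER_SEC = 17
MONTHLY_EGRESS_BYTES = 700 * 10**9
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def idle_overhead_estimate(workers, msgs_per_worker_per_sec,
                           monthly_bytes, seconds_per_month):
    """Return (total messages/s, bytes/s, average bytes per message)."""
    total_msgs_per_sec = workers * msgs_per_worker_per_sec
    bytes_per_sec = monthly_bytes / seconds_per_month
    bytes_per_msg = bytes_per_sec / total_msgs_per_sec
    return total_msgs_per_sec, bytes_per_sec, bytes_per_msg


if __name__ == "__main__":
    msgs, bps, bpm = idle_overhead_estimate(
        WORKERS, MSGS_PER_WORKER_PER_SEC,
        MONTHLY_EGRESS_BYTES, SECONDS_PER_MONTH,
    )
    print(f"{msgs} messages/s in total")
    print(f"{bps / 1000:.0f} kB/s outgoing on average")
    print(f"{bpm:.0f} bytes per message on average")
```

Under those assumptions this works out to roughly 578 messages per second across all workers, about 270 kB/s of sustained outgoing traffic, and a bit under 500 bytes per message, which looks consistent with small control messages rather than task payloads.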
My tasks are long-running (at least multi-second, and sometimes multi-minute), heavy compute jobs, so high latency for task retrieval is totally acceptable; a delay on the order of seconds is fine. Ideally, I'd like to tell my Celery workers to just check in for new tasks once every few seconds and stop all other overhead network communication.
Is there a way to reduce the overall network overhead of Celery workers?