I am currently using Airflow with Celery to process files. A worker needs to download the files, process them, and then re-upload them. My DAGs work fine with a single worker, but when I add a second one, things get complicated.
Workers take tasks as they become available. Worker1 can pick up the task "processing downloaded files" even though it was Worker2 that ran the task "downloading files". The task then fails, because Worker1 cannot process files that only exist on Worker2's disk.
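The failure mode described above can be reproduced with a toy model in plain Python. This is not Airflow or Celery code: each hypothetical `Worker` gets its own temporary directory to stand in for a machine's local disk, and `run_pipeline` hands tasks out round-robin as a simplified assumption about how a broker spreads work across idle workers.

```python
import tempfile
from itertools import cycle
from pathlib import Path


class Worker:
    """Toy worker whose private scratch directory stands in for one machine's local disk."""

    def __init__(self, name):
        self.name = name
        self.scratch = Path(tempfile.mkdtemp(prefix=f"{name}-"))

    def run(self, task, filename):
        path = self.scratch / filename
        if task == "download":
            # Pretend to fetch the file; it only lands on *this* worker's disk.
            path.write_text("raw data")
            return f"{self.name}: downloaded {filename}"
        # Every later step needs the file that "download" wrote locally.
        if not path.exists():
            raise FileNotFoundError(f"{self.name}: {filename} is not on this worker")
        if task == "process":
            path.write_text(path.read_text().upper())
            return f"{self.name}: processed {filename}"
        if task == "upload":
            return f"{self.name}: uploaded {filename} ({path.read_text()!r})"
        raise ValueError(f"unknown task {task!r}")


def run_pipeline(workers, filename="data.csv"):
    """Hand each task to the next worker in turn (a simplification of real dispatch)."""
    next_worker = cycle(workers)
    return [next(next_worker).run(task, filename)
            for task in ("download", "process", "upload")]


# One worker: every step finds the file it needs.
print(run_pipeline([Worker("worker1")]))

# Two workers: "process" lands on worker2, which never downloaded the file.
try:
    run_pipeline([Worker("worker1"), Worker("worker2")])
except FileNotFoundError as exc:
    print("failed:", exc)
```

The model only captures the core issue: the DAG's tasks share state through the local filesystem, so correctness depends on which worker each task happens to land on.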
Is there a way to tell the workers (or the scheduler) that a DAG must run on only one worker? I know about queues, but I am already using them.
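Queues do not have to be exclusive, which may be relevant to the question above. In Airflow, operators accept a `queue` argument, and a Celery worker can listen on several comma-separated queues (for example `airflow celery worker -q default,worker1_only` in Airflow 2). So each worker could keep its existing queues and also get a dedicated one. The sketch below models that routing in plain Python rather than using Airflow itself. `Broker`, `drain`, and the queue names `worker1_only` and `worker2_only` are all hypothetical.

```python
from collections import defaultdict, deque


class Broker:
    """Toy broker: tasks wait in named queues until a worker subscribed to that queue pulls them."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, queue, task):
        self.queues[queue].append(task)

    def pull(self, subscribed):
        # A worker checks its queues in order and takes the first waiting task.
        for name in subscribed:
            if self.queues[name]:
                return self.queues[name].popleft()
        return None


def drain(broker, workers):
    """Let workers take turns pulling tasks; return (worker, task) pairs in execution order."""
    executed = []
    busy = True
    while busy:
        busy = False
        for worker, subscribed in workers.items():
            task = broker.pull(subscribed)
            if task is not None:
                executed.append((worker, task))
                busy = True
    return executed


# Hypothetical queue names: each worker keeps the shared queue and adds one of its own.
workers = {
    "worker1": ["default", "worker1_only"],
    "worker2": ["default", "worker2_only"],
}
broker = Broker()
broker.publish("default", "unrelated task A")
broker.publish("default", "unrelated task B")
for step in ("download", "process", "upload"):
    broker.publish("worker1_only", step)  # pin the whole file pipeline to worker1

for worker, task in drain(broker, workers):
    print(worker, "ran", task)
```

Unrelated work on `default` is still shared between both workers, while the download/process/upload steps always run on `worker1`. The trade-off is that the pinned pipeline can only run while that particular worker is up.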
task. BTW, what's the problem with queues? – y2k-shubham