
I have 6 subdags. Each of them contains a resource-intensive task with pool='crawler', so I created a pool named crawler with only 1 slot.
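
For reference, a minimal sketch of that setup, assuming Airflow 1.x (dag ids, task ids, and the bash command are placeholders; the pool itself is created separately, e.g. with airflow pool -s crawler 1 "crawler pool"):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator
    from airflow.operators.subdag_operator import SubDagOperator

    default_args = {'start_date': datetime(2017, 1, 1)}


    def crawl_subdag(parent_dag_id, child_id, args):
        # Each subdag holds one resource-heavy crawl task bound to the 1-slot pool.
        subdag = DAG(dag_id='{}.{}'.format(parent_dag_id, child_id),
                     default_args=args,
                     schedule_interval=None)
        BashOperator(task_id='crawl',
                     bash_command='echo crawling',
                     pool='crawler',  # expectation: at most one crawl runs at a time
                     dag=subdag)
        return subdag


    dag = DAG('parent_dag', default_args=default_args, schedule_interval=None)

    for i in range(6):
        child_id = 'subdag_{}'.format(i)
        SubDagOperator(task_id=child_id,
                       subdag=crawl_subdag('parent_dag', child_id, default_args),
                       dag=dag)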

When I run the DAG it seems that the pool restriction is bypassed and all six tasks are executed at the same time (as you can see from the screenshot).

How can I force used slots to be <= available slots?

[Screenshot: airflow pools]

Looks like an Airflow bug, and I wonder if it's related to your use of subdags. Just to clarify: you only set the pool on a single task within the subdag, right? Meaning the actual SubDagOperator in the "parent" DAG itself does not have a pool limit? - Daniel Huang
Yes, right @DanielHuang, the SubDagOperator doesn't have a pool. Only one of its child tasks has a pool limit. - vinsce
Got it. Subdags are implemented by having the SubDagOperator kick off the child DAG as a backfill job, and I believe there are known issues with backfills adhering to concurrency and pool restrictions. A workaround could be to give the SubDagOperator the pool parameter instead, but it's not ideal since you lose some granularity (other tasks in the subdag are now stuck waiting). Alternatively, you can stop using subdags and move the shared code into a helper instead. - Daniel Huang
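
For illustration, the first workaround would look like the snippet below, reusing the definitions from the sketch above but with the pool='crawler' argument removed from the inner BashOperator (otherwise the operator and its own subdag task would compete for the single slot):

    # Workaround from the comment above: move the pool up onto the
    # SubDagOperator, so at most one whole subdag runs at a time
    # (coarser than limiting just the crawl task).
    for i in range(6):
        child_id = 'subdag_{}'.format(i)
        SubDagOperator(task_id=child_id,
                       subdag=crawl_subdag('parent_dag', child_id, default_args),
                       pool='crawler',  # pool now applied to the subdag as a whole
                       dag=dag)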

1 Answer


From the source code:

"Airflow pool is not honored by SubDagOperator. Hence resources could be consumed by SubdagOperators."
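
Given that, one way around it, as suggested in the comments, is to drop the subdags and create the crawl tasks directly in one DAG through a helper function, so the scheduler (rather than a subdag backfill) runs them and the 1-slot pool is enforced. A minimal sketch, assuming Airflow 1.x, with placeholder dag id, task ids, and command:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator


    def add_crawl_task(dag, i):
        # Helper holding the shared crawl logic that used to live in each subdag.
        return BashOperator(task_id='crawl_{}'.format(i),
                            bash_command='echo crawling',
                            pool='crawler',  # honored here: tasks run one at a time
                            dag=dag)


    dag = DAG('crawler_dag',
              start_date=datetime(2017, 1, 1),
              schedule_interval=None)

    crawl_tasks = [add_crawl_task(dag, i) for i in range(6)]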