With the advent of TaskGroups in Airflow 2.x, it's worth expanding on a previous answer. TaskGroups are just UI groupings for tasks, but they also serve as handy logical groupings for a bunch of related tasks. The tasks in a TaskGroup can be bundled and abstracted away to make it easier to build a DAG out of larger pieces. That being said, it may still be useful to have a file full of related tasks without bundling them into a TaskGroup.
The trick to breaking up DAGs is to have the DAG in one file, for example my_dag.py, and the logical chunks of tasks or TaskGroups in separate files, with one logical task chunk or TaskGroup per file. Each file contains functions (or methods if you want to take an OO approach) each of which returns an operator instance or a TaskGroup instance.
To illustrate, my_dag.py (below) imports operator-returning functions from foo_bar_tasks.py, and it imports a TaskGroup-returning function from xyzzy_taskgroup.py. Within the DAG context, those functions are called and their return values are assigned to task or TaskGroup variables, which can be assigned up-/downstream dependencies.
dags/my_dag.py:
# std lib imports
from airflow import DAG
# other airflow imports
from includes.foo_bar_tasks import build_foo_task, build_bar_task
from includes.xyzzy_taskgroup import build_xyzzy_taskgroup
with DAG(dag_id="my_dag", ...) as dag:
# logical chunk of tasks
foo_task = build_foo_task(dag=dag, ...)
bar_task = build_bar_task(dag=dag, ...)
# taskgroup
xyzzy_taskgroup = build_xyzzy_taskgroup(dag=dag, ...)
foo_task >> bar_task >> xyzzy_taskgroup
plugins/includes/foo_bar_tasks.py:
# std lib imports
from airflow import DAG
from airflow.operators.foo import FooOperator
from airflow.operators.bar import BarOperator
# other airflow imports
def build_foo_task(dag: DAG, ...) -> FooOperator:
# ... logic here ...
foo_task = FooOperator(..., dag=dag)
return foo_task
def build_bar_task(dag: DAG, ...) -> BarOperator:
# ... logic here ...
bar_task = BarOperator(..., dag=dag)
return bar_task
plugins/includes/xyzzy_taskgroup.py:
# std lib imports
from airflow import DAG
from airflow.operators.baz import BazOperator
from airflow.operators.qux import QuxOperator
from airflow.utils import TaskGroup
# other airflow imports
def build_xyzzy_taskgroup(dag: DAG, ...) -> TaskGroup:
xyzzy_taskgroup = TaskGroup(group_id="xyzzy_taskgroup")
# ... logic here ...
baz_task = BazOperator(task_id="baz_task", task_group=xyzzy_taskgroup, ...)
# ... logic here ...
qux_task = QuxOperator(task_id="qux_task", task_group=xyzzy_taskgroup, ...)
baz_task >> qux_task
return xyzzy_taskgroup