They are entirely different concepts, and they can be used independently. It is true that both can be used to prevent backfilling, but if that is your only concern, just use catchup=False. Quoting from a reply by one of the Airflow developers, it seems clear that this is the recommended practice:
As the author of LatestOnlyOperator, the goal was as a stopgap until
catchup=False landed.
But he then goes on to say that LatestOnlyOperator should be deprecated. I don't agree (as a user of both catchup=False and LatestOnlyOperator), and I'll try to explain why. My intuition for these two concepts is this:
Catchup = True
In a DAG definition you can set the flag catchup to True. Note that catchup is an argument of the DAG itself, not an entry in default_args, which only holds defaults that are passed on to the operators. If you set this flag to True and turn the DAG ON, the scheduler will create a DAG run for each schedule interval from the start_date to the "present" and will execute them sequentially. Quoting the documentation:
If the dag.catchup value had been True instead, the scheduler would have created a DAG Run for each completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02, as that interval hasn’t completed) and the scheduler will execute them sequentially.
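To make the quoted example concrete, here is a minimal sketch in plain Python (no Airflow import) that counts which daily intervals count as "completed". It is only a model of the behaviour described above, not the scheduler's actual code. The function name completed_intervals and the "present" of 06:00 on 2016-01-02 are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def completed_intervals(start_date, now, interval=timedelta(days=1), catchup=True):
    """Return the start of every schedule interval that has fully elapsed by `now`.

    Hypothetical helper: a simplified model of the scheduling rule, not Airflow code.
    """
    starts = []
    current = start_date
    # An interval is complete only once its end (start + interval) has passed.
    while current + interval <= now:
        starts.append(current)
        current += interval
    # Without catchup, only the most recent completed interval gets a run.
    return starts if catchup else starts[-1:]


start = datetime(2015, 12, 1)
now = datetime(2016, 1, 2, 6, 0)  # assumed "present" for this sketch

with_catchup = completed_intervals(start, now, catchup=True)
without_catchup = completed_intervals(start, now, catchup=False)

print(len(with_catchup), with_catchup[0].date(), with_catchup[-1].date())
print([d.date().isoformat() for d in without_catchup])
```

Under these assumptions, catchup=True yields one run per day from 2015-12-01 through 2016-01-01, while catchup=False yields only the run for 2016-01-01. The interval starting on 2016-01-02 is not included in either case because it has not finished yet.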
LatestOnlyOperator
A LatestOnlyOperator is an extension of the BaseOperator. A task made with this operator does no work of its own: if the DAG run is not in the latest schedule interval (i.e. it is not the "last run"), it causes the tasks downstream of it to be skipped. Also quoting from the LatestOnlyOperator docstring:
"""
Allows a workflow to skip tasks that are not running during the most
recent schedule interval.
If the task is run outside of the latest schedule interval, all
directly downstream tasks will be skipped.
Note that downstream tasks are never skipped if the given DAG_Run is
marked as externally triggered.
"""
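The rule in the docstring can be sketched as a small plain-Python check. This is a simplified model, not the operator's source. The function name should_run_downstream, the fixed daily interval, and the exact window arithmetic (a run counts as "latest" while the current time falls in the interval right after the one it covers) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def should_run_downstream(execution_date, now, interval=timedelta(days=1),
                          externally_triggered=False):
    """Return True if tasks downstream of a latest-only task should run.

    Hypothetical helper modelling the docstring above, not Airflow code.
    """
    # Externally triggered runs never have their downstream tasks skipped.
    if externally_triggered:
        return True
    # A run for `execution_date` starts once its interval has ended ...
    left_window = execution_date + interval
    # ... and stays the "latest" run until the next interval ends as well.
    right_window = left_window + interval
    return left_window < now <= right_window


now = datetime(2016, 1, 2, 6, 0)  # assumed "present" for this sketch

latest = should_run_downstream(datetime(2016, 1, 1), now)
old_run = should_run_downstream(datetime(2015, 12, 15), now)
manual = should_run_downstream(datetime(2015, 12, 15), now, externally_triggered=True)

print(latest, old_run, manual)
```

With these inputs, only the run for 2016-01-01 and the externally triggered run let their downstream tasks execute. The scheduled run for 2015-12-15, which is what a catchup or a cleared past run looks like, has them skipped.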
Conclusion
You can define your scheduled DAG with catchup=True and use LatestOnlyOperator to make sure that some tasks are not executed during the catchup runs. Moreover, LatestOnlyOperator is useful if you want to re-run past DAG runs (for example by clearing them in the UI) but have some tasks (such as sending notifications) that you want to skip during those re-runs.