6
votes

I've got a DAG that's scheduled to run daily. Normally, the scheduler triggers the run as soon as the execution_date period is complete, i.e., the next day. However, due to upstream delays, I only want to kick off the DAG run for a given execution_date three days after that execution_date. In other words, I want to introduce a three-day lag.

From the research I've done, one route would be to add a TimeDeltaSensor at the beginning of my dag run with delta=datetime.timedelta(days=3).

However, due to the way the Airflow scheduler is implemented, that's problematic. Under this approach, each of my DAG runs will be active for over three days. My DAG has lots of tasks, and if several DAG runs are active, I've noticed that the scheduler eats up lots of CPU because it's constantly iterating over all these tasks (even inactive ones). So is there another way to just tell the scheduler not to kick off the DAG run until three days have passed?

2 Answers

3
votes

It might be easier to manipulate the date variable within the DAG.

I am assuming you would be using the execution date ds in your task instances in some way, like querying data for the given day.

In this case you could use the built-in macros to manipulate the date, e.g. macros.ds_add(ds, -3) to shift the date back by 3 days.

You can use it in a template field as usual '{{ macros.ds_add(ds, -3) }}'
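To make the date arithmetic concrete, here is a minimal standard-library stand-in for what the macro computes. It is not Airflow's own code; the function below only mirrors the macro's name and its YYYY-MM-DD string format, and the sample date is made up.

```python
from datetime import datetime, timedelta


def ds_add(ds: str, days: int) -> str:
    """Shift a YYYY-MM-DD date string by `days` (negative values move back)."""
    shifted = datetime.strptime(ds, "%Y-%m-%d") + timedelta(days=days)
    return shifted.strftime("%Y-%m-%d")


# Hypothetical run date; in a template this would be the rendered value of ds.
lagged = ds_add("2019-03-10", -3)
print(lagged)  # 2019-03-07
```

So a run whose ds is 2019-03-10 would query the data for 2019-03-07, which gives the upstream three days to catch up without keeping any DAG run open in the meantime.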

Macro docs here

0
votes

One possible solution could be to set max_active_runs to 1 for the DAG. While this does not prevent a DAG run from being active for 3 days, it would prevent multiple DAG runs from being active at the same time.
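A rough sketch of that setting, assuming a daily schedule. The dag_id and start_date are made-up values, and schedule_interval is the argument name used by older Airflow releases (newer ones call it schedule). The arguments are kept in a plain dict so the snippet runs even without Airflow installed.

```python
from datetime import datetime

# Keyword arguments intended for airflow.DAG.
# dag_id and start_date are hypothetical placeholders.
dag_kwargs = {
    "dag_id": "delayed_daily_job",
    "start_date": datetime(2019, 1, 1),
    "schedule_interval": "@daily",
    "max_active_runs": 1,  # at most one DAG run active at any time
}

# With Airflow installed, the DAG would be created like this:
#   from airflow import DAG
#   dag = DAG(**dag_kwargs)
```

Combined with a sensor at the start of the DAG, later runs would queue up behind the one that is currently waiting instead of all being active together.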