2
votes

I am updating the schedule of my DAG at run time with logic like this:

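import time

from airflow import DAG

# Pick the UTC cron expression based on whether DST is currently in effect
# on the machine that parses this file.
now = time.localtime()
sched_interval = '30 6 * * *' if now.tm_isdst else '30 7 * * *'

dag = DAG(
    'my_dag',
    default_args=args,  # args is defined elsewhere
    schedule_interval=sched_interval,
    max_active_runs=1,
    catchup=False)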

The problem is that after the DST change the DAG triggers twice, because the schedule shifts by one hour. How can I avoid the second run in this case? I am using Airflow 1.9.

Thanks!


2 Answers

1
votes

The Airflow documentation says:

In case you set a cron schedule, Airflow assumes you will always want to run at the exact same time. It will then ignore day light savings time. Thus, if you have a schedule that says run at end of interval every day at 08:00 GMT+1 it will always run end of interval 08:00 GMT+1, regardless if day light savings time is in place.

To me, this implies that you don't need to test for DST, since Airflow handles the conversion automatically.

1
votes

Airflow 1.9 does not provide any functionality to account for daylight saving time. It knows nothing about time zones and runs everything in UTC±00:00.
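To illustrate why a UTC-only scheduler needs two different cron expressions for one local wall-clock time, here is a minimal standalone sketch. It is not Airflow code. It assumes the intended run time is 08:30 in Europe/Berlin, which is only a guess that happens to match the two expressions in the question, and it uses the standard-library zoneinfo module (Python 3.9+). The helper name utc_cron is hypothetical.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def utc_cron(local_time, tz_name, on_day):
    """Return the daily cron expression, in UTC, that fires at local_time on on_day."""
    # Attach the local time zone, then convert that moment to UTC.
    local_dt = datetime.combine(on_day, local_time, tzinfo=ZoneInfo(tz_name))
    utc_dt = local_dt.astimezone(timezone.utc)
    return f"{utc_dt.minute} {utc_dt.hour} * * *"


# Assumption: 08:30 local time in Europe/Berlin (not stated in the question).
summer = utc_cron(time(8, 30), "Europe/Berlin", date(2018, 7, 1))
winter = utc_cron(time(8, 30), "Europe/Berlin", date(2018, 12, 1))

print(summer)  # 30 6 * * *
print(winter)  # 30 7 * * *
```

The same local time lands on a different UTC hour depending on the season, which is exactly the switch the question tries to make by hand.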

As you found out, changing the schedule interval to emulate this missing functionality is problematic, because:

Changing schedule interval always requires changing the dag_id, because previously run TaskInstances will not align with the new schedule interval [1]

So, if possible, the best solution is to upgrade to at least Airflow 1.10, which introduces timezone-aware DAGs. Then you can achieve what you want by setting the time zone of your DAG as needed and using a single cron expression for the schedule interval.
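A minimal sketch of what such a DAG file could look like on Airflow 1.10, following the pattern from the Airflow time zone documentation (a start_date that carries a pendulum time zone). The time zone Europe/Berlin, the local run time 08:30, the start date, and the owner value are all assumptions made for illustration; replace them with your own values. The file needs an Airflow 1.10+ installation to be parsed.

```python
from datetime import datetime

import pendulum
from airflow import DAG

# Assumption: the job should run at 08:30 wall-clock time in this zone.
local_tz = pendulum.timezone("Europe/Berlin")

args = {
    "owner": "airflow",  # placeholder value
    # A timezone-aware start_date makes the whole DAG timezone-aware.
    "start_date": datetime(2019, 1, 1, tzinfo=local_tz),  # hypothetical date
}

dag = DAG(
    "my_dag",
    default_args=args,
    # One fixed cron expression, interpreted in the DAG's time zone,
    # replaces the run-time switch between '30 6' and '30 7'.
    schedule_interval="30 8 * * *",
    max_active_runs=1,
    catchup=False,
)
```

Because the schedule interval no longer changes when DST starts or ends, the dag_id can stay the same and previously run TaskInstances stay aligned with the schedule.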