We've been converting our cron jobs to Airflow DAGs, and I'm having difficulty figuring out exactly how DAG scheduling works in Airflow. Some DAGs need to run at a specific time of day (e.g. 7am); others need to run at a specific day and time of the month (e.g. 6am on the 15th of every month).

Generally, Airflow seems to run daily DAGs correctly. For example, schedule_interval = '0 7 * * *' with 'start_date': datetime(2017,4,7) runs every day at 7am.

However, a monthly DAG (schedule_interval = '0 6 15 * *' and 'start_date': datetime(2017,4,7)) ran on April 15 at 6am but hasn't run since. Other DAGs I've tried to schedule monthly likewise fail to run after the first month.

Airflow's documentation on scheduling is, IMO, muddy, and answers to other SO questions have only made me more confused. I'm hoping someone can clarify what is wrong with my understanding and with the DAGs I'm trying to schedule monthly.

Were you looking at this DAG between 2017-05-16 and 2017-06-14, as your post time suggests? If so, you're misidentifying the execution_date (Run in the UI) of 2017-04-15 as the date when the DAG ran. Look at the first task's start time (Start in the UI): the 2017-04-15 run actually ran on 2017-05-15. This is consistent with the answer you got below. - dlamblin
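To make the execution_date vs. start time distinction concrete, here is a rough sketch in plain Python (no Airflow import) that pairs each run label with the wall-clock time it is triggered for the question's monthly DAG. It assumes, as the comment and the answer describe, that the first execution_date is the first schedule tick on or after start_date and that each run fires at the following tick. The helpers add_month, next_monthly_tick and monthly_runs are hypothetical, not Airflow's API, and only handle a day of month from 1 to 28.

```python
from datetime import datetime

# Hypothetical helpers modelling the behaviour described above; not Airflow's API.


def add_month(dt: datetime) -> datetime:
    """Same day and time one calendar month later (day must be 1-28)."""
    if dt.month == 12:
        return dt.replace(year=dt.year + 1, month=1)
    return dt.replace(month=dt.month + 1)


def next_monthly_tick(after: datetime, day: int, hour: int, minute: int) -> datetime:
    """First time >= `after` matching the cron 'minute hour day * *'."""
    candidate = after.replace(day=day, hour=hour, minute=minute, second=0, microsecond=0)
    if candidate < after:
        # This month's tick has already passed; use next month's.
        candidate = add_month(candidate)
    return candidate


def monthly_runs(start_date: datetime, day: int, hour: int, minute: int, count: int):
    """List (execution_date, triggered_at) pairs for the first `count` runs.

    Each run is labelled with the tick that opens its period and is
    triggered at the following tick, when that period closes.
    """
    execution_date = next_monthly_tick(start_date, day, hour, minute)
    runs = []
    for _ in range(count):
        triggered_at = add_month(execution_date)
        runs.append((execution_date, triggered_at))
        execution_date = triggered_at
    return runs


# The question's DAG: schedule_interval='0 6 15 * *', start_date=datetime(2017, 4, 7)
for execution_date, triggered_at in monthly_runs(datetime(2017, 4, 7), day=15, hour=6, minute=0, count=3):
    print(f"Run {execution_date:%Y-%m-%d %H:%M} is triggered at {triggered_at:%Y-%m-%d %H:%M}")
```

Under these assumptions the first line printed is "Run 2017-04-15 06:00 is triggered at 2017-05-15 06:00", which matches what the comment reads off the UI.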

1 Answer

Airflow's monthly scheduling, while consistent with its daily scheduling, is confusing: a monthly DAG runs about a month later than you might expect. For example, if I schedule a DAG to run on the first of the month at midnight (e.g. 0 0 1 * *), the run with execution_date 2018-04-01 will actually start just after midnight on 2018-05-01. This is because Airflow waits for the execution period to finish before running. I think the idea is that the monthly execution of 2018-04-01 represents data for the entire period from 2018-04-01 to 2018-05-01. A daily DAG has the same one-period delay, but because its period is only one day, it still fires at 7am every day and the lag is easy to miss.
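A minimal sketch of this reasoning in plain Python rather than Airflow itself: the run labelled with an execution_date covers the period up to the next schedule tick and only starts once that tick arrives. The functions next_daily_7am, next_first_of_month and run_period are hypothetical stand-ins for the cron expressions '0 7 * * *' and '0 0 1 * *'; putting the daily and monthly cases side by side shows the same one-period lag in both.

```python
from datetime import datetime, timedelta
from typing import Callable, Tuple

# Hypothetical "next tick" functions for two cron schedules; not Airflow's API.


def next_daily_7am(tick: datetime) -> datetime:
    # '0 7 * * *': ticks are exactly one day apart.
    return tick + timedelta(days=1)


def next_first_of_month(tick: datetime) -> datetime:
    # '0 0 1 * *': ticks fall on the 1st of consecutive months.
    if tick.month == 12:
        return tick.replace(year=tick.year + 1, month=1)
    return tick.replace(month=tick.month + 1)


def run_period(execution_date: datetime,
               next_tick: Callable[[datetime], datetime]) -> Tuple[datetime, datetime]:
    """Period [start, end) covered by the run labelled `execution_date`.

    Per the answer, the run is only started once this period is over,
    i.e. shortly after `end`.
    """
    return execution_date, next_tick(execution_date)


start, end = run_period(datetime(2018, 4, 1), next_first_of_month)
print(f"monthly: covers {start:%Y-%m-%d} to {end:%Y-%m-%d}, starts after {end:%Y-%m-%d %H:%M}")

start, end = run_period(datetime(2017, 4, 7, 7), next_daily_7am)
print(f"daily: covers {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}, starts after {end:%Y-%m-%d %H:%M}")
```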

You'll need to restructure your schedules with this concept in mind.
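One way to apply this when restructuring is to work backwards from the wall-clock time a job should fire to the execution_date its run will carry. The sketch below assumes a monthly cron on a fixed day from 1 to 28 (such as the question's '0 6 15 * *'); previous_month and execution_date_for are hypothetical helpers, not part of Airflow.

```python
from datetime import datetime

# Hypothetical helpers; they model the one-period lag described above.


def previous_month(dt: datetime) -> datetime:
    """Same day and time one calendar month earlier (day must be 1-28)."""
    if dt.month == 1:
        return dt.replace(year=dt.year - 1, month=12)
    return dt.replace(month=dt.month - 1)


def execution_date_for(fire_time: datetime) -> datetime:
    """Label of the monthly run that fires at `fire_time`.

    Assumes a fixed-day monthly cron, where the run that starts at one
    tick is labelled with the previous tick.
    """
    return previous_month(fire_time)


# A job that should fire at 6am on 2017-05-15 shows up as the run
# whose execution_date is 2017-04-15 06:00.
print(execution_date_for(datetime(2017, 5, 15, 6)))
```

In this model, a task that needs the date it actually fires should derive it from the end of the period (one tick after execution_date), not from execution_date itself.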