1
votes

I'm new to Airflow.

My goal is to run a DAG daily, starting 1 hour from now.

I clearly misunderstand Airflow's "end-of-interval invoke" scheduling rules.

From the docs [(Airflow Docs)][1]

Note that if you run a DAG on a schedule_interval of one day, the run stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In other words, the job instance is started once the period it covers has ended.

I set schedule_interval as follows:

schedule_interval="00 15 * * *"

and start_date as follows: start_date=datetime(year=2019, month=8, day=7)

My assumption was that if it is now 14:00:00 UTC and today's date is 07-08-2019, then my DAG would be executed in exactly one hour. However, my DAG is not starting at all.


2 Answers

3
votes

There is a whole page about Airflow jobs not being scheduled: https://airflow.apache.org/faq.html

The key thing to notice here is:

The Airflow scheduler triggers the task soon after the start_date + schedule_interval is passed.

As I understand it, you want a task with start_date=datetime(year=2019, month=8, day=7) to run daily at 15:00 UTC. schedule_interval="00 15 * * *" does mean the task runs every day at 15:00 UTC. However, according to the docs, the scheduler triggers a run only after start_date + schedule_interval has passed, so Airflow won't trigger it until the next day, August 8th 2019 at 15:00:00 UTC. Alternatively, you can change the start_date day to the 6th, so that the first run is triggered on August 7th at 15:00 UTC.

It might be easier to understand this the ETL way: you can only process the data for a given period after that period has passed. August 7th 2019 15:00:00 UTC is the start of your first period, so you need to wait until August 8th 2019 15:00:00 UTC for the run that covers that period.
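To make the arithmetic concrete, here is a rough sketch in plain Python (no Airflow import). The helper name first_daily_run is made up for illustration, and it only models a daily cron of the form "MM HH * * *"; it is not how the scheduler is actually implemented.

```python
from datetime import datetime, timedelta


def first_daily_run(start_date, hour, minute=0):
    """Return (execution_date, trigger_time) for the first run of a daily schedule.

    Hypothetical helper: the first schedule point at or after start_date
    becomes the execution_date, and the run is triggered one interval
    (one day) later, once the period it covers has ended.
    """
    interval = timedelta(days=1)
    execution_date = start_date.replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    if execution_date < start_date:
        # start_date is already past today's schedule point; use tomorrow's.
        execution_date += interval
    return execution_date, execution_date + interval


# start_date on the 7th: the first run fires on the 8th at 15:00 UTC.
print(first_daily_run(datetime(2019, 8, 7), hour=15))

# start_date on the 6th: the first run fires on the 7th at 15:00 UTC.
print(first_daily_run(datetime(2019, 8, 6), hour=15))
```

With the start_date moved back one day, the first trigger lands on the day and hour the question was aiming for.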

Also, note that Airflow has both execution_date and start_date; you can find more here
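As an illustration of how execution_date lags behind the moment a run actually starts, the sketch below lists the first few runs of a daily schedule. The generator daily_runs and the label triggered_at are names I made up; only execution_date is an Airflow term, and the one-day offset assumes the daily schedule from the question.

```python
from datetime import datetime, timedelta
from itertools import islice


def daily_runs(first_execution_date):
    """Yield (execution_date, triggered_at) pairs for a daily schedule.

    Hypothetical model: each run is stamped with the start of the period
    it covers and is triggered one day later, when that period ends.
    """
    interval = timedelta(days=1)
    execution_date = first_execution_date
    while True:
        yield execution_date, execution_date + interval
        execution_date += interval


# First three runs for a DAG whose first schedule point is 2019-08-07 15:00 UTC.
runs = list(islice(daily_runs(datetime(2019, 8, 7, 15)), 3))
for execution_date, triggered_at in runs:
    print(f"execution_date={execution_date:%Y-%m-%d %H:%M}  "
          f"triggered_at={triggered_at:%Y-%m-%d %H:%M}")
```

Each run's execution_date is one interval earlier than the time it is triggered, which is why the run stamped August 7th does not appear until August 8th.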

2
votes

schedule_interval="00 15 * * *" start_date=07-08-2019

The 1st run will be on 08-08-2019 at 15:00 (3:00 PM) if you created this DAG before 15:00 on 07-08-2019.