I'm running Apache Airflow 1.10.0 and want to take advantage of the new timezone-aware DAG feature. I must admit that I find the Airflow scheduler a bit confusing, and I'm not quite sure how to accomplish what I'm trying to do: define a DAG that runs at 5 past midnight (Eastern time) every day.
So far I've tried defining the DAG with a timezone-aware start_date using Pendulum. My schedule_interval is timedelta(days=1). For some reason this has resulted in runs at seemingly odd times (12:00, etc.).
My current DAG definition:
...
dag_tz = pendulum.timezone('US/Eastern')
default_args = {
    'owner': 'airflow',
    'email': '<email_address>',
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3,
    'depends_on_past': False,
    'retry_delay': timedelta(minutes=5),
    'provide_context': True,
    'start_date': datetime(2019, 5, 1, tzinfo=dag_tz)
}
dag = DAG('my_dag_id', default_args=default_args,
          catchup=False, schedule_interval=timedelta(days=1))
...
What I'd like is for the DAG to run at the same local time each day. I've seen that I can use a cron expression for schedule_interval, but that's confusing as well: I'm not sure whether I need to account for my UTC offset in the cron expression, or whether the fact that the DAG is timezone-aware takes care of this.
For example, should my schedule_interval be 05 04 * * * or 05 00 * * * or something else entirely?
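To show why the offset question matters to me, here is a small standalone check of which UTC time corresponds to 00:05 Eastern in winter versus summer. It is only a sketch: it uses Python's standard-library zoneinfo (Python 3.9+) rather than Airflow or Pendulum, the helper name utc_time_for_local_0005 is made up for illustration, and I'm assuming 'America/New_York' is equivalent to 'US/Eastern'.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

# Assumption: America/New_York is the canonical name for US/Eastern.
eastern = ZoneInfo("America/New_York")


def utc_time_for_local_0005(year, month, day):
    """Return the UTC (hour, minute) at which 00:05 Eastern falls on that date."""
    local = datetime(year, month, day, 0, 5, tzinfo=eastern)
    as_utc = local.astimezone(timezone.utc)
    return as_utc.hour, as_utc.minute


winter = utc_time_for_local_0005(2019, 1, 15)  # Eastern Standard Time
summer = utc_time_for_local_0005(2019, 7, 15)  # Eastern Daylight Time
print("winter:", winter, "summer:", summer)
```

Since the UTC equivalent differs between the two dates, a cron expression interpreted in UTC could not hit 00:05 Eastern all year, which is why I need to know which timezone the expression is evaluated in.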