
I'm running Apache Airflow 1.10.0 and I want to take advantage of the new timezone-aware DAG feature. I must admit that the Airflow scheduler is a bit confusing, and I'm not quite sure how to accomplish what I'm trying to do: define a DAG that runs at 5 past midnight (Eastern time) every day.

So far I've tried defining the DAG with a timezone-aware start_date using Pendulum. My schedule interval is timedelta(days=1). For some reason this has resulted in runs at seemingly odd times (12:00, etc.).

My current DAG definition:

...

dag_tz = pendulum.timezone('US/Eastern')

default_args = {
    'owner': 'airflow',
    'email': '<email_address>',
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3,
    'depends_on_past': False,
    'retry_delay': timedelta(minutes=5),
    'provide_context': True,
    'start_date': datetime(2019, 5, 1, tzinfo=dag_tz)
}

dag = DAG('my_dag_id', default_args=default_args,
          catchup=False, schedule_interval=timedelta(days=1))

...

What I'd like is for the DAG to run at the same time each day. I've seen that I can use a cron expression for schedule_interval, but that's confusing as well: I'm not sure whether I need to include my UTC offset in the cron expression or whether the fact that the DAG is timezone-aware will take care of this.

For example, should my schedule_interval be 05 04 * * *, 05 00 * * *, or something else entirely?


2 Answers

0
votes

After some experimentation, I have concluded that to get the DAG to run at 5 past midnight every day, I need to use a schedule interval of 05 00 * * * along with the timezone-aware start date. In other words, the cron expression is written in local (Eastern) time, without the UTC offset.
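
To illustrate why a hard-coded offset such as 05 04 * * * would not be a good fit, here is a minimal sketch that uses only the Python standard library (no Airflow involved). It assumes Python 3.9+ for zoneinfo and uses America/New_York as a stand-in for US/Eastern. The helper name utc_time_of_local_run is made up for this example. It shows that 00:05 Eastern maps to a different UTC time depending on daylight saving time:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

# Assumption: America/New_York stands in for the US/Eastern zone from the question.
eastern = ZoneInfo("America/New_York")


def utc_time_of_local_run(year, month, day):
    """Return the UTC wall-clock time (HH:MM) of a 00:05 Eastern run on the given date."""
    local_run = datetime(year, month, day, 0, 5, tzinfo=eastern)
    return local_run.astimezone(timezone.utc).strftime("%H:%M")


summer_utc = utc_time_of_local_run(2019, 7, 1)   # daylight saving time in effect (EDT)
winter_utc = utc_time_of_local_run(2019, 12, 1)  # standard time in effect (EST)

print("00:05 Eastern in July     ->", summer_utc, "UTC")
print("00:05 Eastern in December ->", winter_utc, "UTC")
```

Because the UTC equivalent shifts by an hour between summer and winter, a cron expression with a fixed UTC hour would only match 5 past midnight Eastern for part of the year.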

0
votes

You can also write it without the leading zeros, like 5 0 * * *.
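
As a rough illustration of why the two spellings are equivalent, the sketch below treats each purely numeric cron field as an integer, which is an assumption about how cron parsers typically read these fields. normalize_cron is a hypothetical helper written for this example, not part of Airflow:

```python
def normalize_cron(expr):
    """Strip leading zeros from purely numeric cron fields (illustrative helper only)."""
    fields = expr.split()
    # Fields such as '*' are left untouched; '05' becomes '5' and '00' becomes '0'.
    return " ".join(str(int(f)) if f.isdigit() else f for f in fields)


padded = normalize_cron("05 00 * * *")
plain = normalize_cron("5 0 * * *")

print(padded)
print(plain)
```

Both spellings normalize to the same five fields, so either form should describe the same schedule.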