I have a simple DAG (Airflow v1.10.6, using SequentialExecutor on localhost):

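  • start_date set in the past
  • catchup = False

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# my_func is the task callable (not shown here)
default_args = {'owner': 'test_user',
                'start_date': datetime(2019, 12, 1, 1, 0, 0)}

graph1 = DAG(dag_id='test_dag', default_args=default_args,
             schedule_interval=timedelta(days=1),
             catchup=False)

t = PythonOperator(task_id='t', python_callable=my_func, dag=graph1)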

As per the code comments:

:param catchup: Perform scheduler catchup (or only run latest)?

I expected that when the scheduler comes up, it would schedule only one DAG run, the most recent one in the past. However, the scheduler is scheduling the two most recent runs instead of just the latest one.

I activated the scheduler at 2019-12-09 04:03:00Z (= now), and the Task Instances view showed the scheduled runs (screenshot not included).

Can someone clarify why two past runs were scheduled instead of just one? Is this a bug, or is something wrong in my understanding?
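To make the expectation concrete, here is a small date-arithmetic sketch using only the standard library. It is not Airflow's scheduler code. It only assumes the usual Airflow convention that a run with a given execution_date is triggered once its interval has fully elapsed, and the helper name latest_complete_execution_date is made up for this illustration.

```python
from datetime import datetime, timedelta


def latest_complete_execution_date(start_date, interval, now):
    """Return the execution_date of the most recent fully elapsed interval.

    Hypothetical helper for illustration only; returns None if no
    interval has completed yet.
    """
    # Number of whole intervals that fit between start_date and now.
    complete = (now - start_date) // interval
    if complete < 1:
        return None
    # The last complete interval starts one interval before the boundary.
    return start_date + (complete - 1) * interval


start = datetime(2019, 12, 1, 1, 0, 0)   # start_date from the DAG
now = datetime(2019, 12, 9, 4, 3, 0)     # when the scheduler was activated
expected = latest_complete_execution_date(start, timedelta(days=1), now)
print(expected)  # 2019-12-08 01:00:00
```

Under that assumption, catchup = False should produce a single run for 2019-12-08 01:00, which covers the interval ending at 2019-12-09 01:00.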

1 Answer

This is a bug in Airflow versions before 1.10.11 that occurs when a timedelta is used as the schedule_interval. Scheduling works as intended with a cron expression. The bug is fixed in 1.10.11:

https://github.com/apache/airflow/pull/8776

https://issues.apache.org/jira/browse/AIRFLOW-3369
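If upgrading is not an option, the cron route could look roughly like the sketch below. The cron string '0 1 * * *' (daily at 01:00) is my assumption for an equivalent of timedelta(days=1) with this start_date, and my_func is only a placeholder for the real callable. I have not verified this against every 1.10.x release.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def my_func():
    # Placeholder task body (assumption, not from the question).
    print("running")


default_args = {'owner': 'test_user',
                'start_date': datetime(2019, 12, 1, 1, 0, 0)}

# Cron expression instead of timedelta(days=1): daily at 01:00.
graph1 = DAG(dag_id='test_dag', default_args=default_args,
             schedule_interval='0 1 * * *',
             catchup=False)

t = PythonOperator(task_id='t', python_callable=my_func, dag=graph1)
```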