I've created a DAG with start_date and schedule_interval as shown below:

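from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # Airflow 1.10.x import path

# local_tz, dir_path and phase are defined elsewhere (not shown here)
default_args = {
    'start_date': datetime(2020, 11, 16, tzinfo=local_tz),
    'retries': 3,
    'retry_delay': timedelta(minutes=1),
    'execution_timeout': timedelta(seconds=3600),
}

dag = DAG(
    'batch_job',
    default_args=default_args,
    description='Batch job',
    schedule_interval=timedelta(days=1),
)

# {{ next_ds }} is rendered by Airflow at run time; %s is filled in when the file is parsed
task_0 = BashOperator(
    task_id='task_0',
    bash_command='cd %s && ./run.sh -p %s -e {{ next_ds }}' % (dir_path, phase),
    dag=dag,
)

task_0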

I intended to run my task every day at midnight and to backfill the runs for past days. But when I toggle the DAG on in the Airflow web UI, the task is scheduled and runs immediately. Also, the scheduler doesn't backfill any past runs at all.

In the web UI, the task attributes show the start_date I intended, but the task instance attributes ignore it and show the time I toggled the DAG on instead.

I'm currently using Airflow 1.10.12 with MySQL and RabbitMQ.

How can I solve this problem?

1 Answer

I found the solution myself.

Catchup (backfilling) was disabled because the AIRFLOW__SCHEDULER__CATCHUP_BY_DEFAULT environment variable was set to False.
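
To illustrate why that setting produced a single immediate run and no backfill, here is a rough plain-Python model of the behavior. It is not Airflow's scheduler code; `scheduled_runs` is a hypothetical helper, and the "now" timestamp is a made-up moment at which the DAG is toggled on. With catchup off, only the most recent completed interval gets a run; with catchup on, every interval since start_date does.

```python
from datetime import datetime, timedelta


def scheduled_runs(start_date, interval, now, catchup):
    """Simplified model: execution dates for which a run would be created."""
    runs = []
    execution_date = start_date
    # A run is only created once its schedule interval has fully elapsed.
    while execution_date + interval <= now:
        runs.append(execution_date)
        execution_date += interval
    # Without catchup, only the most recent completed interval is kept.
    return runs if catchup else runs[-1:]


start = datetime(2020, 11, 16)
now = datetime(2020, 11, 20, 9, 30)  # hypothetical toggle-on time

print(scheduled_runs(start, timedelta(days=1), now, catchup=True))
print(scheduled_runs(start, timedelta(days=1), now, catchup=False))
```

Under these assumed dates, the first call lists the execution dates from the 16th through the 19th, while the second lists only the 19th, which matches the "one run right away, nothing backfilled" symptom in the question.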

I added catchup=True to the parameters of the DAG instance, and it worked as I intended.
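
For reference, a minimal sketch of the changed DAG definition, based on the code in the question and assuming Airflow 1.10.x. The timezone name passed to `pendulum.timezone` is a placeholder, since the question does not show how `local_tz` was defined.

```python
from datetime import datetime, timedelta

import pendulum
from airflow import DAG

# Placeholder timezone (assumption); substitute the zone actually used for local_tz.
local_tz = pendulum.timezone("UTC")

default_args = {
    'start_date': datetime(2020, 11, 16, tzinfo=local_tz),
    'retries': 3,
    'retry_delay': timedelta(minutes=1),
    'execution_timeout': timedelta(seconds=3600),
}

dag = DAG(
    'batch_job',
    default_args=default_args,
    description='Batch job',
    schedule_interval=timedelta(days=1),
    catchup=True,  # overrides catchup_by_default for this DAG only
)
```

Setting catchup on the DAG itself keeps the fix local to this DAG. Changing AIRFLOW__SCHEDULER__CATCHUP_BY_DEFAULT to True instead would affect every DAG that does not set catchup explicitly.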

ref: https://issues.apache.org/jira/browse/AIRFLOW-1156