I have looked at answers to similar questions on Stack Overflow, but my case is slightly different.

I am executing backfill jobs via the Airflow CLI, and the backfilled DAG runs get stuck in a running state, with the first task in the DAG in a queued (grey) state.

The scheduler doesn't seem to ever kick off the first task.

I do not have depends_on_past=True set in dag_defaults:

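from datetime import datetime, timedelta

dag_defaults = {
    # Rolling start date: re-evaluated each time the DAG file is parsed
    "start_date": datetime.today() - timedelta(days=2),
    "on_failure_callback": on_failure_callback,  # defined elsewhere in the DAG file
    "provide_context": True,
}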

I am forced to run every task manually :( rather than just letting the scheduler take its course and run them automatically.

Note: I am executing the backfill CLI commands from Airflow worker pods on a Kubernetes (K8s) cluster.

Has anyone else faced a similar issue using the backfill cli commands?

UPDATE: I realised my backfill runs fall outside the DAG's overall interval, i.e. before the DAG's start_date, which causes a blocking scheduling dependency.

[Screenshot: Task Instance Details]

You can still create the run, but its tasks will not be scheduled automatically; you have to run each task manually.
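To illustrate the mismatch, here is a minimal sketch in plain Python (no Airflow imports) that compares a backfill date against the rolling start_date from dag_defaults. The helper name is_within_dag_interval and the sample dates are hypothetical; it only approximates the rule described above and is not Airflow's own dependency check.

```python
from datetime import datetime, timedelta


def is_within_dag_interval(execution_date, start_date, end_date=None):
    """Return True if execution_date is on or after start_date
    (and not past end_date, when one is given).

    Hypothetical helper approximating the rule described above;
    this is not Airflow's own code.
    """
    if execution_date < start_date:
        return False
    if end_date is not None and execution_date > end_date:
        return False
    return True


# Fixed sample "today" so the example is reproducible (an assumption).
now = datetime(2020, 6, 10)

# Rolling start date, mirroring dag_defaults: two days before "now".
rolling_start = now - timedelta(days=2)

# Backfill date before start_date: falls outside the interval.
print(is_within_dag_interval(datetime(2020, 6, 1), rolling_start))  # False

# Backfill date after start_date: falls inside the interval.
print(is_within_dag_interval(datetime(2020, 6, 9), rolling_start))  # True
```

With a start_date of "today minus two days", any backfill date older than two days fails this check, which matches what the Dependency Reason section reported for my runs.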

As a workaround, I would need to change the start_date to be on or before my oldest backfill date.

It would be nice if there were a way to override this in the backfill command, for example a --force option that could mock the start_date for that specific dag_run, rather than being bound to the DAG's overall interval.

Can you click on your task, open Task Instance Details, and check the Dependency Reason section? That should lead you in the right direction. - Nadim Younes
@NadimYounes, thanks, that helped validate my issue. The backfill jobs I am running fall before the DAG's start_date (outside its interval). It seems that backfill will only schedule tasks automatically if the run is within the interval, but it still lets you run tasks manually if the run is outside it. From the Airflow docs: "The backfill command will re-run all the instances of the dag_id for all the intervals within the start date and end date." - Rogan88
Glad it helped, good luck @Rogan88 - Nadim Younes
Thanks, I have updated the question. - Rogan88

1 Answer

I realised my backfill runs fall outside the DAG's overall interval, i.e. before the DAG's start_date, which causes a blocking scheduling dependency.

You can still create the run, but its tasks will not be scheduled automatically; you have to run each task manually.

As a workaround, I would need to change the start_date to be on or before my oldest backfill date.
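A minimal sketch of that workaround, assuming a fixed start_date is acceptable for the DAG. OLDEST_BACKFILL_DATE and its value are hypothetical, and the callback body is only a placeholder for the one referenced in the question.

```python
from datetime import datetime

# Hypothetical oldest date I want to backfill; replace with your own.
OLDEST_BACKFILL_DATE = datetime(2020, 1, 1)


def on_failure_callback(context):
    """Placeholder for the failure callback referenced in the question."""
    print("Task failed:", context)


dag_defaults = {
    # Fixed date on or before the oldest backfill date,
    # instead of the rolling "today minus two days".
    "start_date": OLDEST_BACKFILL_DATE,
    "on_failure_callback": on_failure_callback,
    "provide_context": True,
}
```

Because the start_date no longer moves forward every day, backfill dates on or after OLDEST_BACKFILL_DATE stay inside the DAG's interval.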

It would be nice if there were a way to override this in the backfill command, for example a --force option that could mock the start_date for that specific dag_run, rather than being bound to the DAG's overall interval.