10
votes

Many of the Airflow example DAGs that have schedule_interval=None set a dynamic start date like airflow.utils.dates.days_ago(2) or datetime.utcnow(). However, the docs recommend against a dynamic start date:

We recommend against using dynamic values as start_date, especially datetime.now() as it can be quite confusing. The task is triggered once the period closes, and in theory an @hourly DAG would never get to an hour after now as now() moves along.
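To make the quoted reasoning concrete, here is a toy model of it. This is only a sketch and not Airflow's scheduler code: `first_run_due`, `simulate`, and the 30-minute re-evaluation cadence are hypothetical names and assumptions chosen for illustration. It assumes the start date is re-evaluated every time the DAG file is parsed and that a run becomes due only once its first period has closed.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(hours=1)  # stands in for an @hourly schedule


def first_run_due(start_date, now):
    """A run for the first period is due only once that period has closed."""
    return start_date + INTERVAL <= now


def simulate(get_start_date, parse_times):
    """Re-evaluate start_date at each (hypothetical) parse; was a run ever due?"""
    return any(first_run_due(get_start_date(now), now) for now in parse_times)


t0 = datetime(2019, 1, 1)
# Assumed parse times: every 30 minutes over a few hours.
parse_times = [t0 + timedelta(minutes=30 * i) for i in range(10)]

static_due = simulate(lambda now: t0, parse_times)    # fixed start_date
dynamic_due = simulate(lambda now: now, parse_times)  # start_date = now() at each parse

print(static_due, dynamic_due)  # True False
```

With a fixed start date the first hour eventually closes. With a start date of "now", the end of the first period is always one hour ahead, which is the situation the docs describe for a scheduled DAG.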

Is start_date irrelevant for manually triggered DAGs? What is the best practice here?

3
I'm not sure I understand the problem you're encountering. Can you add more context to your question about what you're trying to achieve, specifically whether a dynamic start_date is not working for you? The approach you described seems fine to me, as start_date isn't too important for a DAG that's only externally triggered. This is a good question because I don't think the current docs make this use case explicitly clear. - Taylor Edmiston
@TaylorEdmiston no observable problem, just a conflict between docs and examples that makes me feel unsure as the user. The tutorial talks about start_date a lot, so I wasn't confident that it was really irrelevant. - rcorre

3 Answers

7
votes

I always try to set the start date for manually triggered DAGs to the day I first ran them, so that I have a record for future reference of when each DAG was first run.

0
votes

If you have schedule_interval=None, I believe the start_date is irrelevant, as Airflow will not attempt to perform any backfilling. Just set it to anything; even a dynamic one shouldn't cause any hassle.

0
votes

I ended up just setting start_date to January 1, 1970 (absurdly far in the past) so that Airflow never complains that the execution date is before the start date.
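A rough illustration of why a date far in the past avoids that complaint. This is a simplified model and not Airflow's actual validation logic: `is_runnable` is a hypothetical helper, and the five-second delay is an assumed gap between a manual trigger and the next time a dynamic start date happens to be evaluated.

```python
from datetime import datetime, timedelta

EPOCH_START = datetime(1970, 1, 1)  # static start_date, far in the past


def is_runnable(execution_date, start_date):
    """Simplified check: a run is accepted only if it is not before start_date."""
    return execution_date >= start_date


# A manual trigger at some arbitrary moment.
trigger_time = datetime(2019, 6, 1, 12, 0)

# Assumption: a dynamic start_date is evaluated slightly after the trigger,
# so it ends up later than the run's execution date.
dynamic_start = trigger_time + timedelta(seconds=5)

static_ok = is_runnable(trigger_time, EPOCH_START)
dynamic_ok = is_runnable(trigger_time, dynamic_start)

print(static_ok, dynamic_ok)  # True False
```

In an actual DAG definition this would correspond to passing start_date=datetime(1970, 1, 1) together with schedule_interval=None.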