0
votes

# define the DAG instance for processing the training data

dag = DAG(
    dag_id,
    start_date = datetime(2019, 11, 14),
    description = 'Reading training logs from the corresponding location',
    default_args = default_args,
    schedule_interval = timedelta(hours=1),
)

I have code like this. As I understand it, this DAG should execute once every hour. But in the Airflow web UI, I see many run dates in the Schedule part, and the DAG is executing all the time. In particular, in the Tree View, I can see that all the blocks were filled in within one hour! I am confused about how schedule_interval works. Any ideas on how to fix this?

2 Answers

1
votes

On the first DAG run, it will start from the date you define in start_date. From that point on, the scheduler creates new DagRuns based on your schedule_interval, and the corresponding task instances run as their dependencies are met. You can read more about it here.

0
votes

I figured it out: the problem comes from the mismatch between the current time and start_date. If start_date is earlier than the current time, the scheduler backfills a run for every interval that has already passed.
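
To make the backfill concrete, here is a rough sketch in plain Python (no Airflow import) of how many hourly runs pile up when start_date is in the past. The function `pending_runs` and the `now` value are hypothetical and only illustrate the counting; they are not Airflow's actual scheduler code. The `catchup` flag mirrors the `catchup` argument that `DAG(...)` accepts: as far as I know, passing `catchup=False` tells the scheduler to create a run only for the most recent interval instead of all the missed ones.

```python
from datetime import datetime, timedelta


def pending_runs(start_date, now, interval, catchup=True):
    """Hypothetical helper: list the interval start times a scheduler
    would create runs for at time `now`.

    A run for [t, t + interval) is only created once that interval has
    fully elapsed, which is why nothing runs exactly at start_date.
    """
    runs = []
    t = start_date
    while t + interval <= now:
        runs.append(t)
        t += interval
    if not catchup:
        # Keep only the most recent completed interval, skip the backlog.
        runs = runs[-1:]
    return runs


start = datetime(2019, 11, 14)        # same start_date as in the question
now = datetime(2019, 11, 15, 0, 30)   # assumed "current time" for the example
hourly = timedelta(hours=1)

backfilled = pending_runs(start, now, hourly)
latest_only = pending_runs(start, now, hourly, catchup=False)

print(len(backfilled))   # number of runs created with catchup enabled
print(latest_only)       # the single run created with catchup disabled
```

So with the DAG from the question, either move start_date close to the current time or add `catchup=False` to the `DAG(...)` call if the past intervals do not need to be processed.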