0
votes

I am very new to Airflow, so please excuse the newbie question.

Like the title says, I have an Airflow DAG that has been marked as failed but still gets run by the Airflow scheduler. I can see it in my logs. I can kill the process, but the scheduler keeps respawning it in another process. How can I stop this?

1
Can you post the default_args for your DAG, and what you're trying to achieve with it? - raphael
@raphael I mostly followed the sample for default_args:

```python
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(1),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 2,
    'retry_delay': timedelta(minutes=10),
}

dag = DAG('etl_1',
          default_args=default_args,
          schedule_interval='15 11 * * *',
          )
```

- BugCatcherJoe
Can you edit that into your question so it's easier to read? According to crontab.guru/#15_11___* your task is scheduled every day at 11:15. Are you finding tasks spawning more regularly than that? - raphael
I was in the initial testing phase of my DAG, so I mostly ran off of manually triggered runs. One odd thing I may have done is that in my PythonOperators I used both op_args and op_kwargs, which I haven't seen many examples of anywhere online. - BugCatcherJoe
How were you killing the process? I will also note that you have retries: 2, which means Airflow will retry the same task if it fails. - raphael

1 Answer

1
votes

When you kill a process, you are only killing one scheduled instance of a task/DAG run; the scheduler will continue to create new instances based on the schedule you have provided.
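If you just want the scheduler to stop spawning new runs while you investigate, pausing the DAG is usually enough. A minimal sketch, assuming an Airflow 2.x CLI and the `etl_1` DAG id from the comments above:

```shell
# Pause the DAG so the scheduler stops creating new runs of it
airflow dags pause etl_1

# ...investigate and fix the DAG...

# Resume scheduling when you are ready
airflow dags unpause etl_1
```

On Airflow 1.x the equivalent commands are `airflow pause etl_1` and `airflow unpause etl_1`; you can also toggle the same on/off switch next to the DAG in the web UI.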

For example, if you never want the scheduler to create runs automatically, define the DAG with schedule_interval=None, so it only runs when triggered manually.

You may also have catchup=True (the default in older Airflow versions) together with a start_date in the past, in which case the scheduler backfills runs for all the past dates.
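Putting both knobs together, here is a hedged sketch of a DAG definition that the scheduler will neither trigger on a schedule nor backfill (assuming Airflow 2.x imports; the `etl_1` id and default_args are taken from the comments above):

```python
from datetime import timedelta

from airflow import DAG
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
    'retries': 2,
    'retry_delay': timedelta(minutes=10),
}

dag = DAG(
    'etl_1',
    default_args=default_args,
    start_date=days_ago(1),
    schedule_interval=None,  # no automatic runs; only manual triggers
    catchup=False,           # even with a past start_date, do not backfill
)
```

With schedule_interval set back to a cron expression like '15 11 * * *', keeping catchup=False still prevents the scheduler from creating runs for dates between start_date and now.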