5 votes

I have an Airflow environment running on Cloud Composer (3 n1-standard-1 nodes; image version: composer-1.4.0-airflow-1.10.0; config override: core catchup_by_default=False; PyPI packages: kubernetes==8.0.1).

During a DAG run, a few tasks (all GKEPodOperators) failed due to airflow worker pod eviction. All of these tasks were set to retries=0. One of them was requeued and retried. Why would this happen when the task is set to 0 retries? And why would it only happen to one of the tasks?

1 – Please add your DAG. Can't help without looking at the code. – kaxil

1 Answer

0 votes

"Airflow worker pod eviction" means the node ran out of resources (typically memory), so Kubernetes evicted pods, including the Airflow worker pod, killing any tasks running on it. Note also that a re-queue after a worker eviction happens at the executor level: Celery re-delivers tasks whose worker died before acknowledging them, so the re-run does not count against the task's retries setting. That can explain why a task with retries=0 ran again, and why only the task that happened to be unacknowledged at eviction time was affected.

To fix this, you can use larger machine types for your nodes or reduce the memory consumption of your DAGs on the workers.
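One way to reduce per-worker memory pressure is to lower Celery worker concurrency via a Composer config override, so fewer task processes run on each worker pod at once. A minimal sketch, assuming an environment named "my-env" in "us-central1" (both placeholders; substitute your own, and tune the concurrency value to your workload):

```shell
# Lower the number of concurrent task processes per Airflow worker.
# Fewer simultaneous tasks means a lower peak memory footprint per
# worker pod, making eviction under node memory pressure less likely.
gcloud composer environments update my-env \
    --location us-central1 \
    --update-airflow-configs=celery-worker_concurrency=6
```

The trade-off is throughput: with 3 n1-standard-1 nodes and a concurrency of 6, at most 18 tasks run at once across the environment.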

Review this document for a better overview.