I have an Airflow environment running on Cloud Composer (3 n1-standard-1
nodes; image version: composer-1.4.0-airflow-1.10.0
; config override: core catchup_by_default=False
; PyPI packages: kubernetes==8.0.1
).
During a DAG run, a few tasks (all GKEPodOperators) failed due to airflow worker pod eviction. All of these tasks were set to retries=0
. One of them was requeued and retried. Why would this happen when the task is set to 0 retries? And why would it only happen to one of the tasks?