
I am using Hadoop and Spark in a multi-node environment, and I have installed Airflow to automate multiple Spark tasks. To run these DAGs across multiple nodes, which is the better option in Airflow: the Celery executor or the Kubernetes executor?

All things equal, I'd recommend the KubernetesExecutor, and would recommend this blog post as reading to help justify why: medium.com/bluecore-engineering/… However, it may make sense to use the CeleryExecutor depending on your deployment environment. - chris.mclennon

1 Answer


The CeleryExecutor is built for horizontal scaling: the scheduler adds a message to a queue, and the Celery broker delivers it to a Celery worker. However, the pool of Celery workers is fixed. If many tasks are processing at the same time, you can run short of resources, and when no tasks are running you are still paying for idle workers.
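As a rough sketch, enabling the CeleryExecutor is a configuration change in `airflow.cfg`. The broker and result-backend URLs below are placeholders; point them at whatever Redis/PostgreSQL (or RabbitMQ/MySQL) endpoints you actually run:

```ini
[core]
executor = CeleryExecutor

[celery]
# Placeholder endpoints -- substitute your own broker and metadata database
broker_url = redis://redis-host:6379/0
result_backend = db+postgresql://airflow:airflow@postgres-host:5432/airflow
# Number of task processes each worker runs concurrently
worker_concurrency = 16
```

You then start a worker process on each node; depending on your Airflow version the command is `airflow worker` (1.x) or `airflow celery worker` (2.x). The fixed-capacity issue described above follows directly from this model: capacity is whatever workers you keep running.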

With the KubernetesExecutor, for each task that needs to run, the executor talks to the Kubernetes API to dynamically launch a dedicated pod. Because Kubernetes scales up and down on demand, you only consume resources (and spend money) while tasks are actually running.
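For comparison, a minimal KubernetesExecutor setup is also mostly configuration (shown here with Airflow 1.10-era `[kubernetes]` keys; the namespace and image tag are placeholder values for your own cluster):

```ini
[core]
executor = KubernetesExecutor

[kubernetes]
# Placeholder namespace -- use the one your Airflow deployment lives in
namespace = airflow
# Image used for the per-task worker pods
worker_container_repository = apache/airflow
worker_container_tag = 1.10.12
# Clean up pods after tasks finish so they don't accumulate
delete_worker_pods = True
```

Each task then becomes a short-lived pod created and torn down via the Kubernetes API, which is what gives you the scale-to-zero behavior described above.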