0 votes

We are using Airflow to schedule our jobs on EMR, and we now want to use Apache Livy to submit Spark jobs via Airflow. I need more guidance on the following: which Airflow-Livy operator should we use for Python 3+ PySpark and Scala jobs? I have seen these two: https://github.com/rssanders3/airflow-spark-operator-plugin and https://github.com/panovvv/airflow-livy-operators

I would also like to know which Airflow-Livy operator is stable and being used in production, ideally on an AWS stack.

A step-by-step installation guide for the integration would also be helpful.


1 Answer

2 votes

I would recommend using LivyOperator from https://github.com/apache/airflow/blob/master/airflow/providers/apache/livy/operators/livy.py

Currently it is only available on master, but you can copy the code and use it as a custom operator until all the new operators are backported to the Airflow 1.10.* series.
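
For reference, here is a minimal sketch of how that operator could be wired into a DAG for both a PySpark and a Scala job. The connection ID livy_default, the S3 paths, and the class name com.example.EtlJob are hypothetical placeholders; the Livy connection has to be created in Airflow and pointed at the EMR master node where the Livy server runs.

```python
from datetime import datetime

from airflow import DAG
# If you copied the operator code for Airflow 1.10.*, import it from your own
# module instead, e.g.:  from livy_operator import LivyOperator
from airflow.providers.apache.livy.operators.livy import LivyOperator

with DAG(
    dag_id="emr_livy_example",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:

    # PySpark job: Livy submits the .py file as a batch to Spark on EMR.
    pyspark_job = LivyOperator(
        task_id="pyspark_job",
        livy_conn_id="livy_default",            # HTTP connection to the Livy server
        file="s3://my-bucket/jobs/etl_job.py",  # hypothetical path to the script
        args=["--date", "{{ ds }}"],
        polling_interval=30,                    # poll Livy every 30s until the batch finishes
    )

    # Scala job: the same operator submits a jar with a main class.
    scala_job = LivyOperator(
        task_id="scala_job",
        livy_conn_id="livy_default",
        file="s3://my-bucket/jobs/etl-assembly.jar",  # hypothetical jar location
        class_name="com.example.EtlJob",              # hypothetical main class
        num_executors=4,
        executor_memory="4g",
        polling_interval=30,
    )

    pyspark_job >> scala_job
```

Setting polling_interval to a value greater than 0 makes the task wait and poll Livy until the batch reaches a terminal state, so downstream tasks only run after the Spark job has actually finished.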