I am trying to run a Python module with PySpark in yarn-client mode. The default Python on my cluster is 2.6.6, and I would like to use Python 3, which is installed at /apps/anaconda/4.3.1/3/bin/python3.6 on my cluster. When I run the Spark module via the spark2-submit command below, it fails, stating that the wrong version of Python is being used. When I run the same code in cluster mode with yarn-cluster, it succeeds. My question: how do I fix this? How can I make it work in yarn-client mode with Python 3.6?
spark2-submit --master yarn --deploy-mode client --conf 'spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6' --conf 'spark.yarn.appMasterEnv.PYSPARK_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6' --queue=queue-name --py-files custom-python-code.zip file.py
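For context: in yarn-client mode the driver runs on the submitting machine, so `spark.yarn.appMasterEnv.*` settings only reach the YARN Application Master, not the driver. A commonly suggested variant (a sketch, reusing the interpreter path and queue name from the question) is to point Spark at the interpreter via local environment variables or the `spark.pyspark.*` properties instead:

```shell
# In client mode the driver runs locally, so export the interpreter
# in the submitting shell rather than via appMasterEnv:
export PYSPARK_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6
export PYSPARK_DRIVER_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6

# Alternatively (Spark 2.1+), set the equivalent Spark properties:
spark2-submit --master yarn --deploy-mode client \
  --conf spark.pyspark.python=/apps/anaconda/4.3.1/3/bin/python3.6 \
  --conf spark.pyspark.driver.python=/apps/anaconda/4.3.1/3/bin/python3.6 \
  --queue queue-name \
  --py-files custom-python-code.zip file.py
```

This assumes /apps/anaconda/4.3.1/3/bin/python3.6 exists at the same path on the executor nodes as on the submitting machine.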