3
votes

I am trying to run a Python module with PySpark in yarn-client mode. The default Python on my cluster is 2.6.6, and I would like to use Python 3, which is installed at /apps/anaconda/4.3.1/3/bin/python3.6 on my cluster. When I run the module via the spark2-submit command below, it fails, stating that the wrong version of Python is being used. When I run the same code in cluster mode (yarn-cluster), it succeeds. My question: how do I fix this? How can I make it work in yarn-client mode with Python 3.6?

spark2-submit --master yarn --deploy-mode client --conf 'spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6' --conf 'spark.yarn.appMasterEnv.PYSPARK_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6'  --queue=queue-name --py-files custom-python-code.zip file.py
How do you check which version of Python your spark-submit is using? - Ricardo M S
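
One way to check is to print sys.version and sys.executable from inside the job, on both the driver and an executor. A minimal sketch, assuming an active SparkSession named spark (the session name is an assumption; in a submitted script you would create it first):

import sys

# Interpreter running the driver process
print("driver:", sys.executable, sys.version)

# Interpreter running an executor process (the lambda is evaluated remotely)
print("executor:", spark.sparkContext
      .parallelize([0], 1)
      .map(lambda _: (__import__("sys").executable, __import__("sys").version))
      .first())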

1 Answer

2
votes

Instead of

spark.yarn.appMasterEnv.PYSPARK_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6

use

spark.executorEnv.PYSPARK_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6

The former is only applied by the YARN application master, which hosts the driver only in cluster mode. In client mode the driver runs on the machine you submit from, so spark.yarn.appMasterEnv.* settings never reach it; point the driver at Python 3.6 through your local environment (PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON) and use spark.executorEnv.PYSPARK_PYTHON for the executors.
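
Putting it together for client mode, the submit could look like this (a sketch adapting the command from the question; exporting PYSPARK_PYTHON in the local shell covers the driver, which runs on the submitting machine, while spark.executorEnv.PYSPARK_PYTHON covers the executors):

export PYSPARK_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6
export PYSPARK_DRIVER_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6
spark2-submit --master yarn --deploy-mode client --conf 'spark.executorEnv.PYSPARK_PYTHON=/apps/anaconda/4.3.1/3/bin/python3.6' --queue=queue-name --py-files custom-python-code.zip file.py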