1 vote

Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

How do I change the Python version on my worker? (I'm using Spark in standalone mode.)


3 Answers

0 votes

Install the correct Python version (Python 3) on the worker node, add python3 to the PATH on the worker, and set the PYSPARK_PYTHON environment variable to "python3". Then check whether PySpark is running Python 2 or 3 by running "pyspark" in a terminal. This opens a Python shell, and the Python version is shown at the top of it.
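
A minimal sketch of that setup, assuming the worker reads conf/spark-env.sh and that the interpreter lives at /usr/bin/python3 (adjust the path to your install):

# in conf/spark-env.sh on the worker (and on the driver, to keep versions matched)
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3

With this in place, launching pyspark should show Python 3.x in the banner at the top of the shell.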

0 votes

If the environment variables are set but the error persists, edit or create conf/spark-defaults.conf (copy it from spark-defaults.conf.template) on both the master and the worker, and add the following line:

spark.pyspark.python /usr/bin/python3

Then restart master and worker.
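
For a standalone cluster, restarting can be done with the standard scripts under sbin/ of the Spark installation (assuming the bundled standalone launch scripts, run from the Spark home directory):

# stop and start the master and all workers
sbin/stop-all.sh
sbin/start-all.sh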

0 votes

This is most likely because your system Python points to 3.5. Ideally, set your PATH variable before running the script so it points to the Python you want PySpark to use, e.g. PATH=<your Anaconda or Cloudera bin path>:$PATH, and the driver and executors will stay in sync automatically. Don't use the system Python to run PySpark jobs, as the driver and executor versions can end up inconsistent.
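
A minimal sketch of that, assuming an Anaconda install at /opt/anaconda3 and a hypothetical job script my_job.py (substitute your own paths and script):

# prepend the desired Python distribution to PATH before submitting the job
export PATH=/opt/anaconda3/bin:$PATH
spark-submit my_job.py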