
Running Spark 1.4.1 on CentOS 6.7, with both Python 2.7 and Python 3.5.1 installed via Anaconda.

Made sure the PYSPARK_PYTHON env var is set to python3.5, but when I open the pyspark shell and run a simple RDD transformation, it errors out with the exception below:

Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions

Just wondering what other places I need to change the path in.
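For reference, this is one way to point both the driver and the workers at the same interpreter before launching the shell. The Anaconda path here is a hypothetical example; substitute your actual install location:

```shell
# Hypothetical Anaconda path -- adjust to your installation.
# PYSPARK_PYTHON controls the worker interpreter,
# PYSPARK_DRIVER_PYTHON the driver (pyspark shell) interpreter.
export PYSPARK_PYTHON=/opt/anaconda3/bin/python3.5
export PYSPARK_DRIVER_PYTHON=/opt/anaconda3/bin/python3.5
```

Setting these only in the shell that launches the driver is not sufficient if the workers were started earlier with a different environment.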


1 Answer


Did you restart the Spark workers with the new setting? Changing the environment variable just for your driver process is not enough: tasks created by the driver are serialized and shipped across process, and sometimes machine, boundaries to be executed by worker Python processes. Those serialized tasks must be deserialized by a compatible interpreter, which is why the driver and worker Python versions need to match.
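A sketch of how this could look for a standalone cluster: put the setting in conf/spark-env.sh, which the startup scripts source, then restart the workers so they pick it up. The interpreter path is an assumption; SPARK_HOME must point at your Spark install:

```shell
# Assumes standalone mode and SPARK_HOME set to the Spark install directory.
# spark-env.sh is sourced by the start/stop scripts, so workers inherit the
# variable after a restart. The Anaconda path below is hypothetical.
echo 'export PYSPARK_PYTHON=/opt/anaconda3/bin/python3.5' >> "$SPARK_HOME/conf/spark-env.sh"

"$SPARK_HOME/sbin/stop-all.sh"   # stop master and workers
"$SPARK_HOME/sbin/start-all.sh"  # restart so workers see the new setting
```

For other deployment modes (YARN, Mesos), the variable has to reach the executor environments through that resource manager's own configuration instead.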