
I have followed this simple tutorial https://ysinjab.com/2015/03/28/hello-spark/ but I am trying to do it on Windows. When I finally run the code

file = sc.textFile(r"C:\war_and_peace.txt")  # raw string so backslashes aren't treated as escapes
warsCount = file.filter(lambda line: "war" in line)
peaceCount = file.filter(lambda line: "peace" in line)
warsCount.count()

I get an error:

File "C:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 64, in main
Exception: Python in worker has different version 3.4 than that in driver 3.5, PySpark cannot run with different minor versions
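The mismatch means the worker processes are launched with a different interpreter than the driver. A minimal sketch for checking which version the driver itself runs (the worker side would need a `map` over an active SparkContext, which is omitted here):

```python
import sys

# major.minor of the interpreter running this script (the driver);
# every worker must report the same major.minor pair
driver_version = "{}.{}".format(sys.version_info.major, sys.version_info.minor)
print(driver_version)
```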

I tried editing my config file C:\Spark\conf\spark-env.sh by adding

PYSPARK_PYTHON=python3
PYSPARK_DRIVER_PYTHON=ipython C:\Spark\bin

But this did not improve things. Does anyone have a solution?


1 Answer


Try using the absolute path to the Python executable. I encountered this problem a lot on clusters. If you work in standalone mode, try using a virtualenv, Anaconda, or something similar.
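For example, on Windows you could point both variables at the same interpreter before starting PySpark (a sketch; C:\Python35\python.exe is an assumed install path, so substitute your actual Python 3.5 location):

```shell
:: Windows cmd: make the driver and the workers use the exact same interpreter
:: (C:\Python35\python.exe is an example path, not a guaranteed location)
set PYSPARK_PYTHON=C:\Python35\python.exe
set PYSPARK_DRIVER_PYTHON=C:\Python35\python.exe
```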

Do you get the same problem when executing one of the bundled Spark examples?

./bin/spark-submit examples/src/main/python/pi.py