I'm running Spark in standalone mode on Windows 8, using Anaconda (Python 3.5) and an IPython notebook.
This is how I'm trying to set up the environment:
import os
import sys

# Forward slashes avoid backslash-escape pitfalls in Windows paths
spark_path = "D:/spark"

os.environ['SPARK_HOME'] = spark_path
os.environ['HADOOP_HOME'] = spark_path

sys.path.append(spark_path + "/bin")
sys.path.append(spark_path + "/python")
sys.path.append(spark_path + "/python/pyspark/")
sys.path.append(spark_path + "/python/lib")
sys.path.append(spark_path + "/python/lib/pyspark.zip")
sys.path.append(spark_path + "/python/lib/py4j-0.10.4-src.zip")

from pyspark import SparkContext
from pyspark import SparkConf

sc = SparkContext("local", "test")
When I try to run the following code:
rdd = sc.parallelize([1,2,3])
rdd.count()
it gives me this error:
Python in worker has different version 3.4 than that in driver 3.5, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
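For anyone hitting the same mismatch: the error means the driver (this notebook) and the workers are launching different interpreters. A quick sketch to see which Python the driver is actually using (the workers use whatever `python` resolves to on their PATH unless PYSPARK_PYTHON overrides it):

```python
import sys

# The interpreter running this notebook (the Spark driver).
# Workers must run the same minor version, e.g. 3.5 here,
# or Spark raises the "different minor versions" error.
print("driver executable:", sys.executable)
print("driver version:", sys.version_info[:3])
```

If this prints a 3.5 interpreter but the worker error says 3.4, some other Python on the PATH is being picked up by the workers.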
I tried this:
import os
os.environ["SPARK_HOME"] = "/usr/local/Cellar/apache-spark/2.1.0/"  ## exact path from another answer
I tried other suggestions as well, but none of them solved my problem. Can someone please help me resolve this issue? I'm a bit non-technical when it comes to system configuration.
Thanks a lot!
Comments:
- os.environ['PYSPARK_PYTHON'] = '/wherever/is/your/anaconda/python3.5' (yes, the full path to the same executable you are using to run that script... which is not the default python in your PATH, clearly) – Samson Scharfrichter
- Or edit $SPARK_HOME/conf/spark-env.sh with an export PYSPARK_PYTHON=/wherever/is/your/anaconda/python3.5 – Samson Scharfrichter