1
votes

I am trying to initialize a SparkContext in Python.

from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("test").setMaster("local")
sc = SparkContext(conf=conf)

But I am getting the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.internal.config.package$
    at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:546)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:373)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:236)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

I have looked around for a solution but did not find one that matches this error exactly. Please help.

3 Answers

5
votes

Setting the SPARK_LOCAL_IP environment variable to localhost solved my error.
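
If you would rather not change your shell profile, one way to try this is to set the variable from Python before the SparkContext is created. This is only a sketch: the helper names set_spark_local_ip and create_context are made up for illustration, and it assumes no Spark JVM has been started yet in the current process, since the variable is only picked up when the JVM is launched.

```python
import os


def set_spark_local_ip(address="localhost"):
    """Set SPARK_LOCAL_IP for this process and for any JVM it launches later.

    Hypothetical helper, not part of pyspark.
    """
    os.environ["SPARK_LOCAL_IP"] = address
    return os.environ["SPARK_LOCAL_IP"]


def create_context(app_name="test", master="local"):
    """Create a SparkContext after SPARK_LOCAL_IP has been set.

    Hypothetical helper. pyspark is imported lazily so that the function
    above can be used even where pyspark is not installed.
    """
    from pyspark import SparkConf, SparkContext

    set_spark_local_ip()
    conf = SparkConf().setAppName(app_name).setMaster(master)
    return SparkContext(conf=conf)
```

Calling sc = create_context() then replaces the three lines from the question. If the error persists, the cause is probably something other than hostname resolution.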

0
votes

Please try setting the master to "local[*]" instead of just "local".

0
votes

Do you have PYTHONPATH set like this?

export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH

Also, just to be sure, add the path to the py4j zip in the Spark directory (mine is py4j-0.10.1-src.zip):

export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.1-src.zip:$PYTHONPATH
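
The same two paths can also be added from inside Python, which avoids hard-coding the py4j version. A minimal sketch, assuming SPARK_HOME points at a Spark installation with the usual python/ and python/lib/ layout; add_pyspark_to_path is a name I made up, not a pyspark function.

```python
import glob
import os
import sys


def add_pyspark_to_path(spark_home):
    """Put Spark's Python sources and its bundled py4j zip on sys.path.

    Hypothetical helper. Returns the entries that were newly added.
    """
    python_dir = os.path.join(spark_home, "python")
    # Match whatever py4j version ships with this Spark build.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))

    added = []
    for entry in [python_dir] + py4j_zips:
        if entry not in sys.path:
            sys.path.insert(0, entry)
            added.append(entry)
    return added
```

Call add_pyspark_to_path(os.environ["SPARK_HOME"]) before the line "from pyspark import SparkContext, SparkConf". If the returned list contains no py4j zip, the lib directory is probably not where this sketch expects it.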