6
votes

I am a newbie in Apache Zeppelin and I try to run it locally. I try to run just a simple sanity check to see that sc exists and get the error below.

I compiled it for pyspark and spark 1.5 (I use spark 1.5). I increased the memory to 5 GB and changed the port to 8091.

I am not sure what I did wrong so I get the following error and how should I solve it.

Thanks in advance

java.lang.ClassNotFoundException: org.apache.spark.repl.SparkCommandLine at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:401) at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68) at org.apache.zeppelin.spark.PySparkInterpreter.getSparkInterpreter(PySparkInterpreter.java:485) at org.apache.zeppelin.spark.PySparkInterpreter.createGatewayServerAndStartScript(PySparkInterpreter.java:174) at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:152) at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68) at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92) at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:302) at org.apache.zeppelin.scheduler.Job.run(Job.java:171) at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

Update The solution for me was to degrade my scala version from 2.11.* to 2.10.*, build Apache Spark again and run Zeppelin.

2
You say "locally" but you don't say if you if your Spark config is local[] or laying Zeppelin over an existing cluster? Also you don't say what steps you have taken to troubleshoot this. What is the setting in your Interpreter menu for master? Post some of your zeppelin-env.sh or zeppelin-site.xml files? If sc is doing this I'd assume it's the underlying cluster config somehow, I assume pyspark works OK there? - JimLohse
Having installed Zeppelin last night, thanks for asking this, very cool product, just for a "double-sanity check" what happens when you run the underlying spark interactive shell for Scala that automatically creates a SparkContext? If you get this solved please post as an answer :) - JimLohse
Spark runs locally too. Scala as well does not run properly. Regarding the zeppelin-site.xml the only property I changed is the port to 8091. In zeppelin-env.sh I added two lines export ZEPPELIN_MEM=-Xmx5g and export SPARK_HOME=/opt/spark-1.5.2/. pyspark runs ok. - Tom Ron
recommend you put scala and python in the tags to attract a little wider crowd with more experience in Spark and Zeppelin, I have some ideas and will post an answer in an hour or two, busy right now. Just to be sure, you have the Master on the Zeppelin Interpreter menu set to local[]? or local[n] or local[*], not naming a specific server? And same ownership on /opt/spark-1.5.2/* as Zeppelin install? - JimLohse
local[*], same ownership for both. Thanks for you efforts! - Tom Ron

2 Answers

3
votes

I am making certain assumptions based on what you have answered in comments. It sounds like the Zeppelin setup is good, when I looked at the class SparkCommandLine it's part of Spark's core.

Now Zeppelin has its own minimal embedded Spark classes, which are activated if you don't set SPARK_HOME. So first, per this github page, try not setting SPARK_HOME (which you are setting) and HADOOP_HOME (which I don't think you are setting), to see if eliminating your underlying Spark install "fixes" it:

Without SPARK_HOME and HADOOP_HOME, Zeppelin uses embedded Spark and Hadoop binaries that you have specified with mvn build option. If you want to use system provided Spark and Hadoop, export SPARK_HOME and HADOOP_HOME in zeppelin-env.sh You can use any supported version of spark without rebuilding Zeppelin.

If that works, then you know we are looking at a Java classpath issue. To try to fix this, there's one more setting that goes in the zeppelin-env.sh file,

ZEPPELIN_JAVA_OPTS

mentioned here on the Zeppelin mailing list, make sure you set that to point to the actual Spark jars so the JVM picks it up with a -classpath

Here's what my zeppelin process looks like for comparison, I think the important part is the -cp argument, do the ps on your system and look through your JVM options to see if it's similarly pointing to

/usr/lib/jvm/java-8-oracle/bin/java -cp /usr/local/zeppelin/interpreter/spark/zeppelin-spark-0.5.5-incubating.jar:/usr/local/spark/conf/:/usr/local/spark/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/spark/lib/datanucleus-core-3.2.10.jar:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar
-Xms1g -Xmx1g -Dfile.encoding=UTF-8 -Xmx1024m -XX:MaxPermSize=512m -Dfile.encoding=UTF-8 -Xmx1024m -XX:MaxPermSize=512m -Dzeppelin.log.file=/usr/local/zeppelin/logs/zeppelin-interpreter-spark-jim-jim.log org.apache.spark.deploy.SparkSubmit --conf spark.driver.extraClassPath=:/usr/local/zeppelin/interpreter/spark/zeppelin-spark-0.5.5-incubating.jar
--conf spark.driver.extraJavaOptions=  -Dfile.encoding=UTF-8 -Xmx1024m -XX:MaxPermSize=512m  -Dfile.encoding=UTF-8 -Xmx1024m -XX:MaxPermSize=512m -Dzeppelin.log.file=/usr/local/zeppelin/logs/zeppelin-interpreter-spark-jim-jim.log
--class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer /usr/local/zeppelin/interpreter/spark/zeppelin-spark-0.5.5-incubating.jar 50309

Hope that helps if that doesn't work please edit your question to show your existing classpath.

0
votes

Zeppelin recently released version 0.6.1 which supports Scala 2.11 and Spark 2.0. I too was puzzled by this error message, since I could clearly see my Spark home directory in the classpath. The new version of Zeppelin works great; I'm currently running it with Spark 2.0/Scala 2.11.