
I'm trying to add Spark dependencies via Maven by specifying groupId:artifactId:version in the Dependencies section of Zeppelin's Interpreter console. Once I saved the settings and executed a Spark paragraph, a java.lang.NullPointerException was thrown. Full log below:

java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.spark.SparkSqlInterpreter.getSparkInterpreter(SparkSqlInterpreter.java:76)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:92)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

I then removed those Maven dependencies, but the exception didn't go away.

According to the stack trace, Zeppelin seems unable to start the Spark interpreter, so I looked at the Spark interpreter log (zeppelin-interpreter-spark-zeppelin-development-cluster-m.log), but nothing was logged there when I ran a Spark paragraph. Below is the Zeppelin log from zeppelin-zeppelin-development-cluster-m.log:

INFO [2018-08-10 13:24:53,036] ({qtp2110245805-15} VFSNotebookRepo.java[save]:196) - Saving note:2DM9MXZGM
INFO [2018-08-10 13:24:53,053] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:109) - Job 20180810-111509_373986425 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:shared_process-2DM9MXZGM
INFO [2018-08-10 13:24:53,054] ({pool-2-thread-2} Paragraph.java[jobRun]:380) - Run paragraph [paragraph_id: 20180810-111509_373986425, interpreter: sql, note_id: 2DM9MXZGM, user: anonymous]
WARN [2018-08-10 13:24:58,614] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2303) - Job 20180810-111509_373986425 is finished, status: ERROR, exception: null, result: %text java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
at org.apache.zeppelin.spark.SparkSqlInterpreter.getSparkInterpreter(SparkSqlInterpreter.java:76)
at org.apache.zeppelin.spark.SparkSqlInterpreter.interpret(SparkSqlInterpreter.java:92)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:103)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:633)
at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

I found similar posts describing the same exception, but in those cases Zeppelin was unable to connect to Hive, which I don't think is my issue. I also ran spark-shell and it worked fine.

I'm on Google Dataproc image version 1.3.1.

thanks

Usually this is caused by a failure to create the SparkContext; please check the logs for other useful stack traces. – zjffdu
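Following that suggestion, here is a minimal Java sketch for pulling exception headers and `Caused by:` lines out of a log file, so an underlying SparkContext failure (if it was logged at all) is easier to spot than in a long trace. The class name `RootCauseFinder` and its matching rules are illustrative assumptions, not a Zeppelin tool; the default file name is the interpreter log mentioned in the question, and any other log path can be passed as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints exception headers and "Caused by:" lines from a log.
class RootCauseFinder {

    // Keeps "Caused by:" lines and lines mentioning an Exception/Error,
    // but skips stack frames ("at ...") so only the headers remain.
    static List<String> findExceptionLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("at ")) {
                continue; // a stack frame, not an exception header
            }
            if (t.startsWith("Caused by:") || t.contains("Exception") || t.contains("Error")) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the interpreter log named in the question,
        // resolved relative to the current directory.
        Path log = Paths.get(args.length > 0 ? args[0]
                : "zeppelin-interpreter-spark-zeppelin-development-cluster-m.log");
        for (String hit : findExceptionLines(Files.readAllLines(log))) {
            System.out.println(hit);
        }
    }
}
```

The innermost `Caused by:` line in a chained Java exception is usually the most informative one, which is why those lines are kept even though they don't always contain the word "Exception".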

1 Answer


I've managed to fix it. The problem was that I had added the artifact https://mvnrepository.com/artifact/spotify/spark-bigquery/0.2.2-s_2.11 as an external dependency, and it may have conflicted with Spark or Zeppelin libraries. Removing the dependency from the interpreter settings alone wasn't enough: after I deleted everything in zeppelin.dep.localrepo (/usr/lib/zeppelin/local-repo in my case) and restarted Zeppelin, everything was back to normal.
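The cleanup step can be sketched in Java as below. It deletes everything under the given directory but keeps the directory itself, mirroring "deleted everything in zeppelin.dep.localrepo". The class name `ClearLocalRepo` is hypothetical, and the default path is the one from this answer; it will differ if zeppelin.dep.localrepo is configured differently on your install. This permanently deletes files, so double-check the path before running it, and restart Zeppelin afterwards as in the answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: empties Zeppelin's local dependency repository.
class ClearLocalRepo {

    // Deletes every file and subdirectory under repoDir, keeping repoDir itself.
    static void clearContents(Path repoDir) throws IOException {
        if (!Files.isDirectory(repoDir)) {
            return; // nothing to clear
        }
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(repoDir)) {
            toDelete = walk
                    .filter(p -> !p.equals(repoDir))
                    .sorted(Comparator.reverseOrder()) // children sort after parents, so reversed they come first
                    .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the answer above; pass another path as the first argument.
        Path repo = Paths.get(args.length > 0 ? args[0] : "/usr/lib/zeppelin/local-repo");
        clearContents(repo);
        System.out.println("Cleared " + repo + "; now restart Zeppelin.");
    }
}
```

`Files.walk` is wrapped in try-with-resources because the returned stream holds open directory handles, and sorting in reverse order guarantees each directory is already empty when it is deleted.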

thanks