17
votes

I have installed Zeppelin 0.7.1. When I tried to execute the example Spark program (available in the Zeppelin Tutorial notebook), I got the following error:

java.lang.NullPointerException
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
    at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:391)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:380)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:828)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:483)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I have also set up the config file (zeppelin-env.sh) to point to my Spark installation and Hadoop configuration directory:

export SPARK_HOME="/${homedir}/sk"
export HADOOP_CONF_DIR="/${homedir}/hp/etc/hadoop"

The Spark version I am using is 2.1.0 and Hadoop is 2.7.3.

I am also using the default Spark interpreter configuration, so Spark is set to run in local mode.
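For reference, the interpreter property that controls this is master; the stock value, shown here only as an illustration (not something I changed), is:

master = local[*]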

Am I missing something here?

PS: I am able to connect to Spark from the terminal using spark-shell.
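For completeness, the version checks from the terminal (assuming the binaries are on the PATH) look like this:

$SPARK_HOME/bin/spark-submit --version
hadoop version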


9 Answers

12
votes

I just found the solution to this issue for Zeppelin 0.7.2:

Root cause: Spark tries to set up a Hive context, but the HDFS service is not running; that is why the HiveContext becomes null and throws a NullPointerException.

Solution:

1. Set up SPARK_HOME [optional] and HDFS.
2. Run the HDFS service (see the sketch after this list).
3. Restart the Zeppelin server.

OR

1. Go to Zeppelin's interpreter settings.
2. Select the Spark interpreter.
3. Set zeppelin.spark.useHiveContext = false.
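A minimal sketch of the first option, assuming a stock Hadoop 2.x and Zeppelin layout (the paths are illustrative, not from the original setup):

# start HDFS (NameNode + DataNodes)
$HADOOP_HOME/sbin/start-dfs.sh

# restart Zeppelin so the Spark interpreter can recreate the HiveContext
$ZEPPELIN_HOME/bin/zeppelin-daemon.sh restart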

9
votes

Finally, I was able to find out the reason. When I checked the logs in the ZL_HOME/logs directory, it turned out to be a Spark driver binding error. I added the following property in the Spark interpreter binding and it works fine now...

[screenshot of the added Spark interpreter property]

PS: it looks like this issue comes up mainly when connecting over a VPN... and I do connect to a VPN.
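Since the screenshot is not preserved, here is one hedged example of a binding property that addresses this symptom; whether it matches the original screenshot is an assumption. Setting Spark's standard bind-address property in the interpreter settings pins the driver to the loopback interface, so a VPN-assigned address no longer breaks it:

spark.driver.bindAddress = 127.0.0.1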

2
votes

Did you set SPARK_HOME correctly? I am just wondering what sk is in your export SPARK_HOME="/${homedir}/sk".

(I just wanted to comment below your question but couldn't, due to my lack of reputation 😭)

0
votes

I solved it by adding a line at the top of the file common.sh in the zeppelin-0.6.1/bin directory.

Open common.sh and add this command at the top of the file:

unset CLASSPATH
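A minimal sketch of the top of bin/common.sh after the edit (the shebang is illustrative; the rest of the file stays as shipped):

#!/usr/bin/env bash
# clear any CLASSPATH inherited from the shell so Zeppelin
# builds its own classpath from scratch
unset CLASSPATH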

0
votes

Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    ... 74 more
)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:466)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:236)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    ... 71 more
 INFO [2017-11-20 17:51:55,288] ({pool-2-thread-4} SparkInterpreter.java[createSparkSession]:369) - Created Spark session with Hive support
ERROR [2017-11-20 17:51:55,290] ({pool-2-thread-4} Job.java[run]:181) - Job failed

It looks like the Hive Metastore service has not been started. You can start the Metastore service and try again:

hive --service metastore
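
To verify the metastore actually came up, you can check its default port (9083, assuming an unchanged hive-site.xml):

netstat -an | grep 9083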
0
votes

I was getting the exact same exception with Zeppelin version 0.7.2 on Windows 7. I had to make multiple changes to the configuration to get it working.

First, rename zeppelin-env.cmd.template to zeppelin-env.cmd and add the environment variable PYTHONPATH. The file is located in the %ZEPPELIN_HOME%/conf folder.

set PYTHONPATH=%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.4-src.zip;%SPARK_HOME%\python\lib\pyspark.zip

Open zeppelin.cmd from %ZEPPELIN_HOME%/bin to add %SPARK_HOME% and %ZEPPELIN_HOME%. These will be the first lines in the script. The value for %SPARK_HOME% is left blank because I was using the embedded Spark library. I added %ZEPPELIN_HOME% to make sure this variable is set at the earliest stage of startup.

set SPARK_HOME=
set ZEPPELIN_HOME=<PATH to zeppelin installed folder>

Next, copy all the jars and the pyspark sources from %SPARK_HOME% into the Zeppelin folder:

rem copy Spark's jars into Zeppelin's Spark interpreter folder
xcopy "%SPARK_HOME%\jars\*.jar" "%ZEPPELIN_HOME%\interpreter\spark\" /Y
rem copy the pyspark sources alongside them
xcopy "%SPARK_HOME%\python\pyspark" "%ZEPPELIN_HOME%\interpreter\spark\pyspark\" /E /I /Y

I wasn't starting interpreter.cmd while accessing the notebook, and this was causing the NullPointerException. I opened two command prompts: in one I started zeppelin.cmd, and in the other interpreter.cmd.

We have to specify two additional inputs on the command line: a port and the path to Zeppelin's local_repo. You can get the local_repo path from the Spark interpreter page in Zeppelin. Use exactly the same path to start interpreter.cmd:

interpreter.cmd  -d %ZEPPELIN_HOME%\interpreter\spark\ -p 5050  -l %ZEPPELIN_HOME%\local-repo\2D64VMYZE

The host and port need to be specified on the Spark interpreter page in the Zeppelin UI. Select "Connect to existing process":

HOST : localhost
PORT : 5050

Once all of this configuration is in place, the next step is to save and restart the Spark interpreter. Create a new notebook and type sc.version; it will print the Spark version. Note that Zeppelin 0.7.2 does not support Spark 2.2.1.
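As a quick smoke test after the restart (standard Zeppelin paragraph syntax; the expected output is simply the version string):

%spark
sc.version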

0
votes

Check whether your NameNode has gone into safe mode.

Check with the command below:

sudo -u hdfs hdfs dfsadmin -safemode get

To leave safe mode, use the command below:

sudo -u hdfs hdfs dfsadmin -safemode leave
0
votes

On AWS EMR the issue was memory. I had to manually set a lower value for spark.executor.memory in the Spark interpreter using the Zeppelin UI.

The value varies based on your instance size. It is best to check the logs located in the /mnt/var/log/zeppelin/ folder.

In my case the underlying error was:

Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (6144+614 MB) is above the max threshold (6144 MB) of this cluster! Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.

That helped me understand why it was failing and what I could do to fix it.
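For illustration, a value that fits under the 6144 MB cap from that message is below (5g plus the default ~10% YARN overhead comes to 5632 MB; the exact number is an assumption, so size it to your instance):

spark.executor.memory = 5g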

Note:

This happened because I was starting an instance with HBase, which limits the available memory. See the defaults for each instance size here.

-2
votes

This seems to be a bug in Zeppelin 0.7.1; it works fine in 0.7.2.