
I want to install Zeppelin and use it with my existing Spark cluster. My setup is the following:

  • Spark Master (Spark 1.5.0 for Hadoop 2.4):
    • Zeppelin 0.5.5
  • Spark Slave

I downloaded Zeppelin v0.5.5 and built it with:

mvn clean package -Pspark-1.5 -Dspark.version=1.5.0 -Dhadoop.version=2.4.0 -Phadoop-2.4 -DskipTests

I noticed that the local[*] master setting also works without my Spark cluster (the notebook is still runnable when the Spark cluster is shut down).

My problem: when I want to use my Spark cluster for a streaming application, it does not seem to work correctly. My SQL table stays empty when I use spark://my_server:7077 as the master; in local mode everything works fine!

See also my other question, which describes the problem in more detail: Apache Zeppelin & Spark Streaming: Twitter Example only works local
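For context, my notebook follows the two-paragraph pattern from the Zeppelin streaming tutorial: one Scala paragraph that opens the Twitter stream and registers a temp table, and a second %sql paragraph that queries it. Roughly like this (a simplified sketch, not my exact notebook code; the Tweet fields, batch interval, and table name are placeholders):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// Simplified record type; the real example extracts more fields
case class Tweet(user: String, text: String)

val ssc = new StreamingContext(sc, Seconds(2))    // sc is provided by Zeppelin
val stream = TwitterUtils.createStream(ssc, None) // None = read credentials from system properties

stream.map(s => Tweet(s.getUser.getScreenName, s.getText))
  .foreachRDD { rdd =>
    // Re-register each micro-batch so the second %sql paragraph can query it
    sqlContext.createDataFrame(rdd).registerTempTable("tweets")
  }

ssc.start()

// Second paragraph:
// %sql select * from tweets limit 10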

Did I do something wrong

  • with the installation via "mvn clean package"?
  • when setting the master URL?
  • with the Spark and/or Hadoop versions (are there any limitations)?
  • Do I have to set something special in the zeppelin-env.sh file (it is currently back at the defaults)?
Just as additional information: have you ever tried IBM's spark-kernel? – Alberto Bonsanto
No, I didn't. What is the difference between the "Apache" and the "IBM" version? – D. Müller
Well, I wrote that comment because I think you are using Zeppelin to run Scala notebooks, and the spark-kernel, together with Jupyter, lets you do that. – Alberto Bonsanto
Just to be sure: you have a working Spark cluster that you can connect to with spark-shell --master spark://my_server:7077 and your code runs fine there, but when you set the Zeppelin master property it does not run inside Zeppelin? – Eran Witkon
Well, I can run my Java applications (jars) via the spark-submit script. The Zeppelin paragraph with the streaming logic also seems to work in every case (locally and on the "external" Spark cluster). The only problem is the %sql part in a second paragraph: its table is only filled when I run in local mode (master set to local[*]), not on spark://master:7077. – D. Müller

1 Answer


The problem was caused by a missing library dependency! So before searching around too long, first check the dependencies and whether one is missing. In my case, loading the Twitter streaming artifact via %dep fixed it:

%dep
z.reset // clear any previously loaded artifacts
z.load("org.apache.spark:spark-streaming-twitter_2.10:1.5.1") // Twitter streaming receiver used by the notebook
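Note that %dep only takes effect if the paragraph is executed before the Spark interpreter starts, so restart the Spark interpreter and run the %dep paragraph first. As far as I understand, this also explains the local/cluster difference: in local mode everything runs in one JVM, so a dependency that happens to be on the local classpath is enough, while against spark://master:7077 it also has to be distributed to the executors, which z.load takes care of.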