Dear community! Before I describe the problem, here is a short overview of the software in use (Spark and Hadoop run on a small cluster of three nodes, each running Ubuntu 14.04):
- Zeppelin 0.6.1
- Spark 2.0.0 along with Scala 2.11.8
- Hadoop 2.7.3
The situation is as follows: to use the TwitterUtils class in a Spark Streaming application written in a Zeppelin note, I need to import org.apache.spark.streaming.twitter._, which comes from the Maven artifact org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview. From what I have learned so far, there are several ways to make external dependencies available in Zeppelin:
- Export the SPARK_SUBMIT_OPTIONS variable in conf/zeppelin-env.sh and set --jars (in my case --jars hdfs://admdsmaster:54310/global/jars/spark-streaming-twitter_2.11-2.0.0-preview.jar; I also tested a path pointing to the local file system).
- Export SPARK_SUBMIT_OPTIONS and set --packages (in my case --packages org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview).
- Set spark.jars or spark.jars.packages in conf/spark-defaults.conf with the values mentioned above.
- Use the %dep interpreter in Zeppelin itself like so: z.load("org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview"). This is deprecated, though.
- Use sc.addJar() in the Zeppelin note to manually add a .jar file.
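For reference, the first two options look roughly like this in conf/zeppelin-env.sh. This is only a sketch of what I tried: the values are the ones given above, and only one of the two variants is meant to be active at a time (Zeppelin has to be restarted to pick up the change).

```shell
# conf/zeppelin-env.sh (sketch; enable only one of the two variants)

# Variant 1: ship a specific jar with --jars
# export SPARK_SUBMIT_OPTIONS="--jars hdfs://admdsmaster:54310/global/jars/spark-streaming-twitter_2.11-2.0.0-preview.jar"

# Variant 2: let Spark resolve the Maven coordinates with --packages
export SPARK_SUBMIT_OPTIONS="--packages org.apache.bahir:spark-streaming-twitter_2.11:2.0.0-preview"
```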
After having tried all of the above -- and almost arbitrary combinations and variations thereof -- the problem is that I still can't import the TwitterUtils class from within a Zeppelin note:
[Screenshot: class import failing in Zeppelin note]
The screenshot also shows the output of sc.listJars(), which confirms that the .jar file was actually included. Nonetheless, the class import fails.
My first thought was that the problem occurs because Spark is running in yarn-client mode, so I started the Spark shell in yarn-client mode as well and tried to import the TwitterUtils class from there -- which worked:
[Screenshot: class import working from Spark shell]
To find out what is going on, I searched the log files of Zeppelin, Spark and YARN, but couldn't find any error messages pointing to the cause of the problem.
Long story short: Although the jar file was included in Zeppelin (as proven by sc.listJars()) and although the class import works from the spark-shell in yarn-client mode, I just can't get the import to work from within my Zeppelin note.
Long story even shorter: I'd really appreciate your ideas on how to solve this problem!
Thanks in advance for your time and effort.
P.S.: I'm sorry for the fact that I could not upload the images to this post directly -- it says that I need at least 10 reputation points which I do not have as this is my first ever post here.
