
I have a DataStax Enterprise 4.8 cluster (Cassandra + Spark) with authentication enabled. I would like to use the Zeppelin notebook on my cluster with the Spark master and my Cassandra database.

I downloaded the Zeppelin 0.5.6 binary package and put it on my server. If I start it (./bin/zeppelin-daemon.sh start) with the default configuration, it works fine at http://ServerName:8080/#/.

But when I try to use my DSE Spark master, the output is:

java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:344)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:51)
    at org.apache.zeppelin.interpreter.remote.ClientFactory.create(ClientFactory.java:37)
    at org.apache.commons.pool2.BasePooledObjectFactory.makeObject(BasePooledObjectFactory.java:60)
    at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:861)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.getClient(RemoteInterpreterProcess.java:139)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.init(RemoteInterpreter.java:129)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:257)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:104)
    at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:198)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:169)
    at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:322)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

It seems the notebook can't connect to the Spark master that I usually connect to with:

$> dse -u "username" -p "password" spark

I'm not sure that's the problem, but I can't figure out where to set those credentials.

For reference, I set the following in /zeppelin-0.5.6-incubating-bin-all/conf/zeppelin-env.sh:

  • export MASTER=spark://ip_of_my_server:7077

  • export ZEPPELIN_MEM=-Xmx5g, as suggested in the pull request for ZEPPELIN-305 mentioned in "Hello world in zeppelin failed" (though I don't think that's the problem, since that issue was closed before 0.5.6)

  • export SPARK_HOME=/usr/share/dse/spark, a directory containing:

bin
data
lib
python
RELEASE
sbin
spark-jobserver
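
Taken together, the relevant part of my zeppelin-env.sh looks like this (ip_of_my_server stands in for the actual address):

```shell
# conf/zeppelin-env.sh -- settings added for DSE Spark
export MASTER=spark://ip_of_my_server:7077   # DSE Spark master URL
export ZEPPELIN_MEM=-Xmx5g                   # per ZEPPELIN-305
export SPARK_HOME=/usr/share/dse/spark       # DSE's bundled Spark
```

After editing this file, Zeppelin has to be restarted (./bin/zeppelin-daemon.sh restart) for the changes to take effect.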

I also put "spark://ip_of_my_server:7077" in the "master" field of the interpreter interface.

So, do you have any idea how to solve my problem and connect DSE Spark and Zeppelin?

I found this in zeppelin-root-labgsd2t.out: "Failed to find Spark assembly in /usr/share/dse/spark/lib. You need to build Spark before running this program." So do you know where the Spark assembly of DSE is? – Nongi

1 Answer


After some exchanges with the DataStax expert Duy Hai Doan, I got a solution. I advise you to read his blog post: http://www.doanduyhai.com/blog/?p=2325

For the authentication details, go to the interpreter settings and add:

  • For Cassandra

cassandra.hosts : "YourNodeIP"

cassandra.credentials.username : "YourUserName"

cassandra.credentials.password : "YourPassword"


  • For Spark

spark.cassandra.auth.password : "YourPassword"

spark.cassandra.auth.username : "YourUserName"

spark.cassandra.connection.host : "YourSparkMasterIP"
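
Before wiring these values into Zeppelin, it can help to confirm that the same credentials work outside of it. A quick check from the server's shell (YourNodeIP, YourUserName, and YourPassword are the placeholders above):

```shell
# Verify the Cassandra credentials directly with cqlsh
cqlsh YourNodeIP -u YourUserName -p YourPassword \
    -e "SELECT release_version FROM system.local;"

# Verify that the Spark REPL accepts the same account
dse -u YourUserName -p YourPassword spark
```

If both commands succeed, the same values should work in the cassandra.* and spark.cassandra.* interpreter properties.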

A big thank you to DataStax and Duy Hai.