
We are trying to set up Spark HA with ZooKeeper. We have 2 machines for the Spark master process and another 3 for Spark slaves. The Spark HA configuration on the master machines is done as below in spark-env.sh:

# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=DEV-SMP-Manager01:2181DEV-SMP-Worker01:2181,DEV-SMP-Worker05:2181"

where DEV-SMP-Manager01:2181DEV-SMP-Worker01:2181,DEV-SMP-Worker05:2181 is the ZK quorum, and the ZooKeeper nodes are up and running. We also added the curator jars in the Spark config file mentioned below. When we start the masters using sbin/start-master.sh,

both of them come up as "STANDBY", with no errors in the Spark logs. Here we are stuck; any idea what is going wrong? My spark-env.sh is as below:

export SPARK_DIST_CLASSPATH=$(/home/hduser/smp/hadoop-2.5.1/bin/hadoop classpath)
export SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH:/home/hduser/smp/spark-1.6.1-bin-without-hadoop/curator-client-2.0.0-incubating.jar:/home/hduser/smp/spark-1.6.1-bin-without-hadoop/curator-framework-2.2.0-incubating.jar

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=DEV-SMP-Manager01:2181DEV-SMP-Worker01:2181,DEV-SMP-Worker05:2181"

When I open machine:8080 for both web UIs, each master shows status = STANDBY and no workers are displayed in either of them, even though all the workers are up and running. Any clue would be helpful. Ideally one master should be ALIVE and display all the workers, but here both masters are in standby and neither shows any workers.

Versions used: Spark 1.6.1, ZooKeeper 3.4.6

Is the same configuration copied to both the nodes where you are running the master? Also, have you tried first running only the masters on both nodes and then starting the workers against the active master node? Please attach screenshots of both master UIs. - Rakesh Rakshit
Also check whether these nodes are accessible from both masters: DEV-SMP-Manager01:2181DEV-SMP-Worker01:2181,DEV-SMP-Worker05:2181 - Rakesh Rakshit

1 Answer


After some continuous work, a couple of changes resolved the issue completely. Both masters now work as expected: one is ALIVE and the other is STANDBY, and after a failover the standby becomes ALIVE and all the workers register under it.

Changes in the spark-env.sh file:

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=DEV-SMP-Manager01:2181,DEV-SMP-Worker01:2181,DEV-SMP-Worker05:2181 -Dspark.deploy.zookeeper.dir=/sparkha"

where spark.deploy.zookeeper.dir is the directory in the ZooKeeper data store where Spark keeps its HA recovery state; by default it is /spark in ZK, but we wanted it to be configurable.
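As a quick sanity check (just a sketch; the zkCli.sh path below is a placeholder for wherever ZooKeeper 3.4.6 is installed on your machines), you can connect to any quorum member and confirm that the master has written its recovery state under the configured directory:

# connect with the ZooKeeper CLI to any quorum member (path to zkCli.sh is an assumption)
/path/to/zookeeper-3.4.6/bin/zkCli.sh -server DEV-SMP-Manager01:2181

# inside the CLI: the configured HA directory should exist once a master has been elected
ls /sparkha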

The second thing is the curator jars required by the ZooKeeper leader election in Spark. We created an environment variable for them and appended it to SPARK_DIST_CLASSPATH, which was already set from the explicit path to the 'hadoop' binary; we just added our newly created entries on top of it.
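One way to verify this took effect (a sketch only, assuming a standard Linux shell; this check is not part of the original setup) is to look at the running standalone Master's command line and confirm the curator jars appear on its classpath:

# list any curator entries on the running Master JVM's classpath
ps -ef | grep org.apache.spark.deploy.master.Master | tr ':' '\n' | grep curator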

Then the quorum has to be properly formatted, with every host:port pair separated by a comma and nothing missing, like url=a1:2181,a2:2181 above (our original setting was missing the comma after the first host).
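As a related check (a sketch, assuming nc is installed on the master machines; ZooKeeper 3.4.x answers the four-letter command ruok with imok), you can confirm every quorum member is reachable from both masters:

# run from each master machine; every node should reply "imok"
for zk in DEV-SMP-Manager01 DEV-SMP-Worker01 DEV-SMP-Worker05; do
  echo ruok | nc $zk 2181; echo " <- $zk"
done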

After that we started the masters one by one and then the slaves with start-slaves.sh, and everything fell into place. Thanks to all who looked into this issue. Hope this helps others; we are now in a good position to help others with the HA setup.
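For completeness, the startup order we followed looked roughly like this (the scripts are the standard ones shipped under sbin/ in Spark 1.6; conf/slaves is assumed to list the three worker hosts):

# on the first master machine
sbin/start-master.sh
# on the second master machine
sbin/start-master.sh
# then start all workers (reads conf/slaves); we ran this from the active master machine
sbin/start-slaves.sh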