7
votes

Hi, I have a Spark cluster in standalone mode, i.e., one Spark master process and three Spark slave (worker) processes running on my laptop (the whole cluster on a single machine).

Starting the master and the slaves is just a matter of running the scripts Spark_Folder/sbin/start-master.sh and Spark_Folder/sbin/start-slave.sh.

However, when I run Spark_Folder/sbin/stop-all.sh, it only stops one master and one slave; since I have three slaves running, after running stop-all.sh I still have two slaves left running.

I dug into the script "stop-slaves.sh" and found the following:

if [ "$SPARK_WORKER_INSTANCES" = "" ]; then
  "$sbin"/spark-daemons.sh stop org.apache.spark.deploy.worker.Worker 1
else
  for ((i=0; i<$SPARK_WORKER_INSTANCES; i++)); do
    "$sbin"/spark-daemons.sh stop org.apache.spark.deploy.worker.Worker $(( $i + 1 ))
  done
fi

It seems that this script stops workers based on the "SPARK_WORKER_INSTANCES" number. But what if I started a slave using a non-numeric name?

And is there any way to shut down the whole Spark cluster in one go? (I know that running "pkill -f spark*" works, though.)

Thanks a lot.


4 Answers

6
votes

I just figured out the solution:

in "/usr/lib/spark/conf/spark-env.sh", add an extra parameter "SPARK_WORKER_INSTANCES=3" (or the number of your slave instances), then run "/usr/lib/spark/sbin/stop-all.sh" and all instances stopped.

However, "stop-all.sh" works only for slaves you started using numbers, eg:

/usr/lib/spark/sbin/start-slave.sh 1 spark://master-address:7077
/usr/lib/spark/sbin/start-slave.sh 2 spark://master-address:7077
/usr/lib/spark/sbin/start-slave.sh 3 spark://master-address:7077

If you start slaves using arbitrary names, then "stop-all.sh" does not work, e.g.:

/usr/lib/spark/sbin/start-slave.sh myWorker1 spark://master-address:7077
/usr/lib/spark/sbin/start-slave.sh myWorker2 spark://master-address:7077
/usr/lib/spark/sbin/start-slave.sh myWorker3 spark://master-address:7077
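(Untested, but since "spark-daemon.sh" names its pid file after whatever instance string the worker was started with, you can probably stop such a worker by passing the same name back directly:)

# assumption: the worker was started with the name myWorker1 as above
/usr/lib/spark/sbin/spark-daemon.sh stop org.apache.spark.deploy.worker.Worker myWorker1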

4
votes

Use the jps command in a terminal.

The output will look something like this:

5417 NameNode
8480 Jps
13311 Elasticsearch
5602 DataNode
5134 Worker
5849 SecondaryNameNode
4905 Master

Kill the Master and Worker processes, like this:

kill 5134
kill 4905

Both the master and the slaves will be stopped.

If they come back after being killed, it means you previously shut down your system without stopping the master and slaves first; in that case you need to reboot your system.

1
votes

# jps -l lists running JVMs with their full main class names; pick the ones
# containing "spark" and force-kill them by pid
kill -9 $(jps -l | grep spark | awk -F ' ' '{print $1}')
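If kill -9 feels too blunt, the same one-liner with a plain kill (SIGTERM) first gives the Spark daemons a chance to shut down cleanly:

kill $(jps -l | grep spark | awk '{print $1}')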

0
votes

I had a similar issue. In the end I just had to ssh to each of the 8 machines and use kill -9 on all the relevant processes. I used ps -ef | grep spark to find the process IDs. Tedious, but it worked.
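A small loop can automate that (a sketch; host1 ... host8 are placeholders for your own machine names):

for host in host1 host2 host3 host4 host5 host6 host7 host8; do
  # on each host, kill anything whose full command line mentions org.apache.spark
  ssh "$host" "pkill -9 -f org.apache.spark"
done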