23
votes

Having read this question, I would like to ask additional questions:

  1. The Cluster Manager is a long-running service; on which node does it run?
  2. Is it possible for the Master and the Driver to run on the same machine? I presume there is a rule somewhere stating that these two nodes should be different?
  3. If the Driver node fails, who is responsible for re-launching the application, and what exactly happens? That is, how do the Master node, Cluster Manager, and Worker nodes get involved (if they do), and in which order?
  4. Similarly, if the Master node fails, what exactly happens, and who is responsible for recovering from the failure?

2 Answers

28
votes

1. The Cluster Manager is a long-running service; on which node does it run?

In Spark standalone mode, the Cluster Manager is the Master process. It can be started on any node by running ./sbin/start-master.sh. On YARN, the Cluster Manager is the ResourceManager.

2. Is it possible for the Master and the Driver to run on the same machine? I presume there is a rule somewhere stating that these two nodes should be different?

The Master is per cluster, and the Driver is per application. For standalone and YARN clusters, Spark currently supports two deploy modes:

  1. In client mode, the driver is launched in the same process as the client that submits the application.
  2. In cluster mode, the driver is launched on one of the Workers (standalone) or inside the Application Master (YARN), and the client process exits as soon as it has submitted the application, without waiting for the application to finish.

If an application is submitted with --deploy-mode client from the Master node, both the Master and the Driver will be on the same node. See also: deployment of a Spark application over YARN.
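The two deploy modes differ in a single spark-submit flag. The sketch below prints the two command lines rather than running them, so it works without a Spark installation. The master URL, class name, and jar path are placeholders for illustration, not values from the question.

```shell
# Placeholder values (assumptions for illustration); substitute your own.
MASTER_URL="spark://master-host:7077"
APP_CLASS="com.example.MyApp"
APP_JAR="/path/to/my-app.jar"

# Print the spark-submit command line for the deploy mode given as $1.
submit_cmd() {
  echo "./bin/spark-submit --master $MASTER_URL --deploy-mode $1 --class $APP_CLASS $APP_JAR"
}

submit_cmd client    # driver runs inside the submitting client process
submit_cmd cluster   # driver runs on a Worker (standalone) or in the Application Master (YARN)
```

On YARN, the master would be `yarn` instead of a spark:// URL. Running the client-mode command on the Master machine is what puts the Master and the Driver on the same node.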

3. If the Driver node fails, who is responsible for re-launching the application, and what exactly happens? That is, how do the Master node, Cluster Manager, and Worker nodes get involved (if they do), and in which order?

If the driver fails, all executor tasks for that Spark application are killed.

4. In the case where the Master node fails, what will happen exactly and who is responsible for recovering from the failure?

Master node failures are handled in two ways.

  1. Standby Masters with ZooKeeper:

    Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected “leader” and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master’s state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling new applications – applications that were already running during Master failover are unaffected. See the High Availability section of the Spark standalone mode documentation for the configuration.

  2. Single-Node Recovery with Local File System:

    ZooKeeper is the best way to go for production-level high availability, but if you just want to be able to restart the Master if it goes down, FILESYSTEM mode can take care of it. When applications and Workers register, they have enough state written to the provided directory so that they can be recovered upon a restart of the Master process. See the same section of the Spark standalone mode documentation for the configuration and more details.
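Both recovery modes are selected through spark.deploy.* properties, which can be passed to the Master through SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh. A minimal sketch, assuming a hypothetical three-node ZooKeeper ensemble (zk1, zk2, zk3) and hypothetical directories:

```shell
# conf/spark-env.sh (sketch; host names and paths below are hypothetical)

# Option 1: standby Masters coordinated through ZooKeeper.
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER"
SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181"
SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.deploy.zookeeper.dir=/spark"

# Option 2: single-node recovery from the local file system.
# Use this instead of option 1, not together with it.
# SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/var/spark/recovery"

export SPARK_DAEMON_JAVA_OPTS
```

With several Masters, applications and Workers can be given the full list of Masters, e.g. spark://host1:port1,host2:port2, so that they can find whichever one is currently the leader.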

6
votes

The Cluster Manager is a long-running service; on which node does it run?

A cluster manager is just a manager of resources, i.e. CPUs and RAM, that SchedulerBackends use to launch tasks. A cluster manager does nothing more for Apache Spark than offer resources; once Spark executors launch, they communicate directly with the driver to run tasks.

You can start a standalone master server by executing:

./sbin/start-master.sh

The master can be started on any machine.

To run an application on the Spark cluster:

./bin/spark-shell --master spark://IP:PORT

Is it possible for the Master and the Driver to run on the same machine? I presume there is a rule somewhere stating that these two nodes should be different?

In standalone mode, starting the cluster starts certain JVMs: the Spark Master starts up, and a Worker JVM starts on each machine and registers with the Spark Master. Together they act as the resource manager. When you submit your application in client mode, the Driver starts on the machine from which you submit it (wherever you ssh in to start the application); in cluster mode it starts on one of the Workers. The Driver JVM contacts the Spark Master to request executors, and in standalone mode the Workers start the executors. So the Spark Master is per cluster, and the Driver JVM is per application.

If the Driver node fails, who is responsible for re-launching the application, and what exactly happens? That is, how do the Master node, Cluster Manager, and Worker nodes get involved (if they do), and in which order?

If an executor JVM crashes, the Worker JVM will start it again, and if a Worker JVM crashes, the Spark Master will start the executors again. With a Spark standalone cluster in cluster deploy mode, you can also specify --supervise to make sure that the driver is automatically restarted if it exits with a non-zero exit code; in that case the Spark Master restarts the Driver JVM.
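As a sketch of the --supervise option mentioned above, the command below is printed rather than executed, so it runs without a Spark installation. The master URL, class name, and jar path are placeholders for illustration.

```shell
# Placeholder values (assumptions for illustration); substitute your own.
MASTER_URL="spark://master-host:7077"
APP_JAR="/path/to/my-app.jar"

# --supervise applies to standalone cluster deploy mode: the driver is
# restarted automatically if it exits with a non-zero exit code.
SUBMIT_CMD="./bin/spark-submit --master $MASTER_URL --deploy-mode cluster --supervise --class com.example.MyApp $APP_JAR"

# Printed for inspection; run the printed command to actually submit.
echo "$SUBMIT_CMD"
```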

Similarly, if the Master node fails, what exactly happens, and who is responsible for recovering from the failure?

If the master fails, executors can no longer communicate with it, so they stop working. The driver can no longer communicate with it for job status either, so your application fails. Running applications will notice the loss of the master, but should otherwise continue to work more or less as if nothing had happened, with two important exceptions:

1. The application won't be able to finish gracefully.

2. If the Spark Master is down, the Worker will try to reregisterWithMaster. If this fails multiple times, the workers will simply give up.

reregisterWithMaster(): Re-register with the active master this worker has been communicating with. If there is none, then it means this worker is still bootstrapping and hasn't established a connection with a master yet, in which case we should re-register with all masters. It is important to re-register only with the active master during failures: if the worker unconditionally attempted to re-register with all masters, a race condition could arise, as detailed in SPARK-4592.

At this point, long-running applications won't be able to continue processing, but this still shouldn't result in immediate failure. Instead, the application will wait for a master to come back online (file system recovery) or for contact from a new leader (ZooKeeper mode), and if that happens it will continue processing.
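The retry-then-give-up behaviour of the Worker mentioned above can be pictured with a small loop. This is only an illustration of the pattern, not Spark's implementation; the function names and the retry limit are made up for the example.

```shell
# Illustration only: this is not Spark's Worker code.
MAX_ATTEMPTS=3   # hypothetical retry limit

# Hypothetical stand-in for one registration attempt.
# It always fails here, as if the Master were down.
try_register() {
  return 1
}

# Retry registration up to MAX_ATTEMPTS times, then give up.
reregister() {
  attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if try_register; then
      echo "registered on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt failed"
    attempt=$((attempt + 1))
  done
  echo "giving up after $MAX_ATTEMPTS attempts"
  return 1
}

reregister || echo "worker stops trying"
```

If a Master came back (try_register succeeding) before the limit was reached, the loop would return early and work would resume.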