3
votes

I understand the major differences between client and cluster mode for Spark applications on YARN.

Major differences include

  1. Where the driver runs - locally in client mode, in the Application Master in cluster mode
  2. Client running duration - in client mode, the client needs to run for the entire duration; in cluster mode, the client need not keep running, as the AM takes care of it
  3. Interactive usage - spark-shell and pyspark. Cluster mode is not well suited, as these require the driver to run on the client
  4. Scheduling work - in client mode, the client schedules the work by communicating directly with the containers; in cluster mode, the AM schedules the work by communicating directly with the containers
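The two modes are selected with the --deploy-mode flag of spark-submit. A minimal sketch of the difference (the class name and jar are placeholders, not from any real application):

```shell
# Client mode: the driver runs inside this spark-submit process,
# so this process must stay alive for the whole job.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  myapp.jar

# Cluster mode: the driver runs inside the YARN Application Master;
# spark-submit can return once the application has been accepted.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  myapp.jar
```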

Similarities in both modes:

  1. Who handles the executor requests from YARN - the Application Master
  2. Who starts the executor processes - the YARN Node Manager

My question is - in real-world scenarios (production environments), where we do not need interactive mode and the client does not need to run for a long duration, is cluster mode the obvious choice?

Are there any benefits to client mode, such as:

  • running the driver on the client machine rather than in the AM
  • allowing the client to schedule work, rather than the AM

2 Answers

2
votes

From the documentation:

A common deployment strategy is to submit your application from a gateway machine that is physically co-located with your worker machines (e.g. Master node in a standalone EC2 cluster). In this setup, client mode is appropriate. In client mode, the driver is launched directly within the client spark-submit process, with the input and output of the application attached to the console. Thus, this mode is especially suitable for applications that involve the REPL (e.g. Spark shell).

Alternatively, if your application is submitted from a machine far from the worker machines (e.g. locally on your laptop), it is common to use cluster mode to minimize network latency between the drivers and the executors. Note that cluster mode is currently not supported for standalone clusters, Mesos clusters, or python applications.

It looks like the main reason is that when we run spark-submit from a remote machine, cluster mode is preferred to reduce the latency between the driver and the executors.

1
votes

From my experience, in a production environment the only reasonable mode is cluster mode, with 2 exceptions:

  • when the Hadoop nodes do not have resources needed by the application, for example: at the end of execution the Spark job performs ssh to a server that is not accessible from the Hadoop nodes
  • when you use Spark Streaming and you want to shut it down gracefully (killing a cluster-mode application forces the streaming to close, whereas in client mode you can call ssc.stop(stopGracefully = true))
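For the streaming case there is also a configuration-based alternative worth checking: Spark Streaming exposes spark.streaming.stopGracefullyOnShutdown, which (if it works for your version and your shutdown path delivers a normal JVM shutdown signal, e.g. SIGTERM) asks the context to stop gracefully even in cluster mode. A sketch, with placeholder class and jar names:

```shell
# Cluster-mode streaming job that requests a graceful stop
# when the driver JVM receives a shutdown signal.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.streaming.stopGracefullyOnShutdown=true \
  --class com.example.StreamingApp \
  streaming-app.jar
```

Whether a kill actually arrives as SIGTERM (rather than SIGKILL) depends on how YARN terminates the container, so test this before relying on it.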