I understand the major differences between client and cluster mode for Spark applications on YARN.
Major differences include:
- Where the driver runs - locally on the client machine in client mode, inside the Application Master (AM) in cluster mode
- Client lifetime - in client mode, the client must keep running for the entire duration of the application; in cluster mode, the client need not keep running, as the AM takes care of it
- Interactive usage - spark-shell and pyspark require the driver to run on the client, so cluster mode is not well suited to them
- Scheduling work - in client mode, the client schedules the work by communicating directly with the containers; in cluster mode, the AM schedules the work by communicating directly with the containers
Similarities in both modes:
- Who requests executor containers from YARN - the Application Master
- Who starts the executor processes - the YARN NodeManager
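
To make the comparison concrete: as far as I understand, the mode is selected at submit time through the `--deploy-mode` option of `spark-submit`, with `--master yarn` in both cases. Below is a minimal sketch that only builds the two command lines so they can be compared side by side. The helper `spark_submit_command` and the application file `my_job.py` are hypothetical names used for illustration, not part of Spark.

```python
import shlex


def spark_submit_command(app, deploy_mode, app_args=()):
    """Build a spark-submit argument list for YARN in the given deploy mode.

    `app` is a hypothetical application file; nothing is executed here.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", "yarn",            # same resource manager in both modes
        "--deploy-mode", deploy_mode,  # the only flag that differs
        app,
        *app_args,
    ]


if __name__ == "__main__":
    # Print both variants; only the --deploy-mode value changes.
    for mode in ("client", "cluster"):
        print(shlex.join(spark_submit_command("my_job.py", mode)))
```

The only difference between the two commands is the value passed to `--deploy-mode`; everything listed above (where the driver runs, whether the client must stay alive) follows from that single choice.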
My question is: in real-world scenarios (production environments), where we do not need interactive mode and the client does not need to run for a long duration, is cluster mode the obvious choice?
Are there any benefits to client mode, such as:
- running the driver on the client machine rather than in the AM
- letting the client schedule the work, rather than the AM