4 votes

I'm running Spark on YARN in cluster mode.

  • 3 data nodes with YARN
  • each YARN node: 32 vCores, 32 GB RAM

I am submitting the Spark job like this:

spark-submit \
    --class com.blablacar.insights.etl.SparkETL \
    --name ${JOB_NAME} \
    --master yarn \
    --num-executors 1 \
    --deploy-mode cluster \
    --driver-memory 512m \
    --driver-cores 1 \
    --executor-memory 2g \
    --executor-cores 20 \
    toto.jar json

I can see 2 jobs running fine on 2 nodes. But I can also see 2 other jobs with just a driver container!

[Screenshot: YARN web UI]

Is it possible to not run the driver if there are no resources for the workers?


1 Answer

3 votes

Actually, there is a setting that limits the resources available to the "Application Master" (in the case of Spark, the Application Master is the driver):

yarn.scheduler.capacity.maximum-am-resource-percent

From the MapR Capacity Scheduler documentation (http://maprdocs.mapr.com/home/AdministratorGuide/Hadoop2.xCapacityScheduler-RunningPendingApps.html):

Maximum percent of resources in the cluster that can be used to run application masters - controls the number of concurrent active applications.

This way, YARN will not hand the full cluster resources to Spark drivers, and will keep resources available for the executors. Hooray!
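
As a minimal sketch, assuming the Capacity Scheduler is in use, the setting goes in capacity-scheduler.xml. The 0.1 value here (10% of cluster resources reserved for Application Masters) is just an illustrative choice, not a recommendation:

    <!-- capacity-scheduler.xml (sketch; the 0.1 value is an assumption) -->
    <property>
      <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
      <value>0.1</value>
      <description>
        Maximum percent of cluster resources that can be used to run
        Application Masters (Spark drivers in cluster mode). Applications
        whose AM cannot be allocated stay pending instead of starting.
      </description>
    </property>

With 3 nodes x 32 GB = 96 GB of cluster memory, a value of 0.1 caps Application Masters at roughly 9.6 GB in total, so extra drivers wait in the ACCEPTED state instead of occupying resources the executors need. After editing the file, the scheduler configuration can be reloaded with yarn rmadmin -refreshQueues, without restarting the ResourceManager.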