I am running Spark over YARN on a 4-node cluster. Each node has 128 GB of memory and a 24-core CPU. I launch the Spark shell with this command:
spark-shell --master yarn --num-executors 19 --executor-memory 18g --executor-cores 4 --driver-memory 4g
But Spark launches at most 16 executors. The maximum vcore allocation in YARN is set to 80 (out of the 94 cores I have), so I was under the impression that this would launch 19 executors, but it tops out at 16. Also, I don't think even these executors fully use the vcores allocated to them.
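For reference, this is the rough math behind the 19-executor expectation (it ignores whatever per-executor memory overhead YARN adds on top of --executor-memory, which I have not accounted for):

19 executors * 4 cores = 76 vcores  <= 80 vcores available (4 nodes * 20 vcores)
19 executors * 18 GB   = 342 GB     <= 416 GB available (4 nodes * 106496 MB ≈ 104 GB per node)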
These are my questions:
- Why isn't Spark creating 19 executors? Is there a computation behind the scenes that's limiting it?
- What is the optimal spark-shell configuration for this cluster if I want the best possible Spark performance?
- driver-cores is set to 1 by default. Will increasing it improve performance?
Here is my YARN config:
- yarn.nodemanager.resource.memory-mb: 106496
- yarn.scheduler.minimum-allocation-mb: 3584
- yarn.scheduler.maximum-allocation-mb: 106496
- yarn.scheduler.minimum-allocation-vcores: 1
- yarn.scheduler.maximum-allocation-vcores: 20
- yarn.nodemanager.resource.cpu-vcores: 20
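If it helps, this is the check I run from inside spark-shell to see how many executors actually registered and which settings the shell picked up. It only uses the stock SparkContext API (getExecutorMemoryStatus includes the driver in its count, and the three conf lookups assume the flags were passed on the command line as above):

// Run inside spark-shell: list the endpoints that registered with the driver.
// getExecutorMemoryStatus maps "host:port" -> (max storage memory, remaining storage memory)
// and includes the driver itself, so executor count = size - 1.
val status = sc.getExecutorMemoryStatus
println(s"Registered endpoints (driver + executors): ${status.size}")
status.foreach { case (endpoint, (maxMem, _)) =>
  println(f"$endpoint%-30s maxStorageMem=${maxMem / (1024 * 1024)} MB")
}
// Settings the shell actually picked up from the command line:
println(sc.getConf.get("spark.executor.instances"))
println(sc.getConf.get("spark.executor.memory"))
println(sc.getConf.get("spark.executor.cores"))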