I'm using Spark + Standalone cluster manager. I have 5 worker nodes, each worker node has 2 cores and 14 GB of RAM.

How can I figure out how many JVMs Spark will start on the worker nodes?

Use case 1

  1. I start an application/session with these configs:

    spark.executor.cores=2 spark.executor.memory=10GB

    At this moment Spark starts one executor JVM on each worker node, right?

  2. Then I start another Spark application/session while the first session is still in progress, with these configs:

    spark.executor.cores=2 spark.executor.memory=4GB

    At this moment there are two JVMs on each worker node, right?

Use case 2

I start an application/session with these configs:

spark.shuffle.service.enabled=true
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.maxExecutors=35
spark.executor.cores=2
spark.executor.memory=2GB

That means each worker node will launch 7 executors (JVMs) with 2 GB of RAM each, right?

P.S.

How big is the overhead of a JVM? I mean, how much RAM won't be used for computation in Use case 2, where each node's RAM is divided among 7 JVMs?

1 Answer

As you mentioned, every worker node has 2 cores, so setting spark.executor.cores=2 means there can be only 1 executor per worker node.

Use case 1

1) 5 worker nodes = 5 executors

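2) 5 worker nodes = 5 executors, but only when cores are available. If the first application still holds both cores on every worker, the second application's executors have to wait until those cores are released.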
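
To make the "based on availability" point concrete, here is a minimal sketch of Use case 1. It is a simplified model, not the actual standalone scheduler: the `Worker` class and `launch_executors` function are hypothetical names, each worker is assumed to offer all 2 cores and 14 GB to Spark, and the first application is assumed not to cap its cores (no spark.cores.max), so it takes every core it can.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Resources still unallocated on this worker (hypothetical model).
    free_cores: int
    free_memory_gb: int


def launch_executors(workers, executor_cores, executor_memory_gb):
    """Place as many executors as fit on each worker; return how many started."""
    launched = 0
    for w in workers:
        # An executor needs BOTH enough free cores and enough free memory.
        while (w.free_cores >= executor_cores
               and w.free_memory_gb >= executor_memory_gb):
            w.free_cores -= executor_cores
            w.free_memory_gb -= executor_memory_gb
            launched += 1
    return launched


# 5 workers, each with 2 cores and 14 GB, as in the question.
workers = [Worker(free_cores=2, free_memory_gb=14) for _ in range(5)]

# First application: 2 cores and 10 GB per executor.
first_app = launch_executors(workers, executor_cores=2, executor_memory_gb=10)

# Second application, started while the first still holds its executors.
second_app = launch_executors(workers, executor_cores=2, executor_memory_gb=4)

print(first_app, second_app)  # 5 0
```

Under these assumptions each worker still has 4 GB free after the first application starts, but no free cores, so the second application gets no executors until the first one releases some.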

Use case 2

Since you have spark.executor.cores=2, the maximum number of executors is 5 (1 executor per worker node), even though spark.dynamicAllocation.maxExecutors is 35.
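
The arithmetic behind that limit can be sketched as follows. This is a back-of-the-envelope model rather than Spark's own code: `max_executors_per_worker` is a hypothetical helper, and it assumes each worker offers all 2 cores and 14 GB to Spark.

```python
def max_executors_per_worker(worker_cores, worker_memory_gb,
                             executor_cores, executor_memory_gb):
    """Executors that fit on one worker: the tighter of the two limits wins."""
    by_cores = worker_cores // executor_cores
    by_memory = worker_memory_gb // executor_memory_gb
    return min(by_cores, by_memory)


# Use case 2: 2-core / 14 GB workers, 2-core / 2 GB executors.
by_memory_alone = 14 // 2  # 7 executors would fit if only memory mattered
per_worker = max_executors_per_worker(2, 14, 2, 2)

# 5 workers, capped by spark.dynamicAllocation.maxExecutors=35.
total = min(5 * per_worker, 35)

print(by_memory_alone, per_worker, total)  # 7 1 5
```

Memory alone would allow 7 executors per node, which is where the number in the question comes from, but the core count is the binding constraint here.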

To launch multiple executors on one machine, you can start multiple standalone workers, each with its own JVM. These extra JVM processes add overhead, which is unnecessary when a single worker already has enough cores.

If you are running Spark in standalone mode on memory-rich nodes it can be beneficial to have multiple worker instances on the same node as a very large heap size has two disadvantages:

Mesos and YARN can, out of the box, support packing multiple, smaller executors onto the same physical host, so requesting smaller executors doesn’t mean your application will have fewer overall resources.