Spark how many JVMs are run on worker with multiple applications

Question

I'm using Spark + Standalone cluster manager. I have 5 worker nodes, each worker node has 2 cores and 14 GB of RAM.

How can I figure out how many JVMs Spark will start on the worker nodes?

Use case 1

I start an application/session with configs:

spark.executor.cores=2 spark.executor.memory=10GB

At this moment Spark starts one executor JVM on each worker node, right?
Then I start another Spark application/session while the first session is still in progress, with configs:

spark.executor.cores=2 spark.executor.memory=4GB

At this moment there are two JVMs on each worker node, right?

Use case 2

I start an application/session with configs:

sstsp.spark.shuffle.service.enabled=true
sstsp.spark.dynamicAllocation.enabled=true
sstsp.spark.dynamicAllocation.maxExecutors=35
sstsp.spark.executor.cores=2
sstsp.spark.executor.memory=2GB

That means each worker node will launch 7 executors (JVMs), each with 2 GB of RAM, right?

P.S.

How big is the overhead of a JVM? I mean, how much RAM won't be available for computation in Use case 2, where each node's RAM is divided among 7 JVMs?

FaigB · Accepted Answer · 2017-05-17T11:31:49

As you mentioned, every worker node has 2 cores, so with executor cores=2 there will be only 1 executor per worker node.

Use case 1

1) 5 worker nodes = 5 executors

2) 5 worker nodes = 5 executors (based on availability)
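The "based on availability" caveat can be illustrated with a small model. This is only a sketch, assuming that a standalone worker never hands out more cores or memory than it advertises; the `Worker` class and `launch_app` helper are hypothetical names for illustration, not Spark APIs, and JVM memory overhead is ignored.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Free resources on one standalone worker (hypothetical model, not a Spark class)."""
    free_cores: int
    free_mem_gb: int

    def try_launch(self, executor_cores: int, executor_mem_gb: int) -> bool:
        """Reserve resources for one executor if both cores and memory fit."""
        if self.free_cores >= executor_cores and self.free_mem_gb >= executor_mem_gb:
            self.free_cores -= executor_cores
            self.free_mem_gb -= executor_mem_gb
            return True
        return False

    def release(self, executor_cores: int, executor_mem_gb: int) -> None:
        """Give back the resources of one finished executor."""
        self.free_cores += executor_cores
        self.free_mem_gb += executor_mem_gb


def launch_app(workers, executor_cores, executor_mem_gb):
    """Place as many executors as fit on the workers; return the count."""
    launched = 0
    for w in workers:
        while w.try_launch(executor_cores, executor_mem_gb):
            launched += 1
    return launched


# Cluster from the question: 5 workers, 2 cores and 14 GB each.
workers = [Worker(free_cores=2, free_mem_gb=14) for _ in range(5)]

first_app = launch_app(workers, executor_cores=2, executor_mem_gb=10)
# The first app holds both cores on every worker, so nothing fits yet.
second_app_while_busy = launch_app(workers, executor_cores=2, executor_mem_gb=4)

# Once the first app finishes and releases its executors, the second app fits.
for w in workers:
    w.release(executor_cores=2, executor_mem_gb=10)
second_app_after = launch_app(workers, executor_cores=2, executor_mem_gb=4)

print(first_app, second_app_while_busy, second_app_after)
```

Under these assumptions the second application is limited by free cores, not by the 4 GB of memory still free on each node.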

Use case 2

Since you have executor cores=2, the maximum number of executors would be 5: 1 executor per worker node.
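The arithmetic behind that limit can be sketched as follows. This is a rough estimate only, assuming one standalone worker per node and ignoring JVM memory overhead; the function name `executor_limit` and its parameters are made up for illustration and are not part of Spark.

```python
def executor_limit(num_workers, worker_cores, worker_mem_gb,
                   executor_cores, executor_mem_gb, max_executors=None):
    """Estimate how many executors fit, given that cores AND memory must both fit."""
    by_cores = worker_cores // executor_cores      # executors per worker allowed by cores
    by_memory = worker_mem_gb // executor_mem_gb   # executors per worker allowed by memory
    per_worker = min(by_cores, by_memory)          # the tighter constraint wins
    total = num_workers * per_worker
    if max_executors is not None:
        total = min(total, max_executors)          # e.g. dynamicAllocation.maxExecutors
    return by_cores, by_memory, total


# Use case 2: 5 workers with 2 cores / 14 GB, executors with 2 cores / 2 GB, cap of 35.
by_cores, by_memory, total = executor_limit(5, 2, 14, 2, 2, max_executors=35)
print(by_cores, by_memory, total)
```

Memory alone would allow 7 executors per node (35 in total, matching the configured cap), but the core count allows only 1 per node, so the estimate is 5.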

To launch multiple executors on one machine, you can start multiple standalone workers on it, each with its own JVM, provided that the machine has enough cores. This introduces extra overhead from the additional JVM processes.

If you are running Spark in standalone mode on memory-rich nodes, it can be beneficial to have multiple worker instances on the same node, because a very large heap size has two disadvantages:

Garbage collector pauses can hurt the throughput of Spark jobs.
Heap sizes of >32 GB can’t use CompressedOops.

Mesos and YARN can, out of the box, support packing multiple, smaller executors onto the same physical host, so requesting smaller executors doesn’t mean your application will have fewer overall resources.
