0 votes

I have some doubts regarding the values for Spark executors, executor cores, and executor memory.

  1. If there are no applications running on the cluster and you submit a job, what default values will be taken for the number of Spark executors, executor cores, and executor memory?
  2. If we want to calculate the number of Spark executors, executor cores, and executor memory needed for a job we want to submit, how do we do that?

2 Answers

1 vote

Avishek's answer covers the default values. I will shed some light on calculating the optimum values. Let's take an example.

Example: 6 nodes, each with 16 cores and 64 GB RAM

Each executor is a JVM instance, so multiple executors can run on a single node.

Let's start by choosing the number of cores per executor:

Number of cores = number of concurrent tasks an executor can run

One may think that higher concurrency always means better performance. However, experiments have shown that Spark jobs perform well when the number of cores per executor is about 5.

Going beyond 5 cores per executor tends to degrade performance.

Note that 1 core and 1 GB per node are needed for the OS and Hadoop daemons.

Now, calculate the number of executors:

As discussed above, 15 cores are available on each node and we are planning for 5 cores per executor.

Thus, executors per node = 15 / 5 = 3
Total executors = 3 * 6 = 18

Out of these, 1 executor's worth of resources is needed for the YARN Application Master (AM).
Thus, final executor count = 18 - 1 = 17 executors.
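As a quick sanity check, the arithmetic above can be written out in a few lines of plain Python (the cluster numbers are just the ones assumed in this example):

    # Executor-count arithmetic from the example above (6 nodes, 16 cores each).
    nodes = 6
    cores_per_node = 16
    cores_per_executor = 5                                            # the ~5-cores-per-executor rule of thumb

    usable_cores_per_node = cores_per_node - 1                        # 1 core reserved for OS/Hadoop daemons
    executors_per_node = usable_cores_per_node // cores_per_executor  # 15 // 5 = 3
    total_executors = executors_per_node * nodes - 1                  # minus 1 for the YARN Application Master
    print(total_executors)                                            # 17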

Memory Per Executor:

Executors per node = 3
RAM available per node = 63 GB (1 GB is reserved for the OS and Hadoop daemons)
Memory per executor = 63 / 3 = 21 GB

Spark requires some memory overhead per executor, which is max(384 MB, 7% of executor memory).
Thus, 7% of 21 GB = 1.47 GB.
Since 1.47 GB > 384 MB, subtract 1.47 GB from 21 GB.
Hence, 21 - 1.47 ≈ 19 GB (rounding down).
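Continuing the same sketch, the memory arithmetic looks like this (using the 7% overhead figure quoted above; the exact overhead factor depends on your Spark version and settings):

    # Memory-per-executor arithmetic from the example above.
    ram_per_node_gb = 64 - 1                                 # 1 GB reserved for OS/Hadoop daemons
    mem_per_executor_gb = ram_per_node_gb / 3                # 3 executors per node -> 21 GB
    overhead_gb = max(0.384, 0.07 * mem_per_executor_gb)     # max(384 MB, 7% of executor memory)
    usable_mem_gb = int(mem_per_executor_gb - overhead_gb)   # 21 - 1.47 ~= 19.5, round down
    print(usable_mem_gb)                                     # 19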

Final numbers:

Executors: 17, cores per executor: 5, executor memory: 19 GB
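If you settled on these numbers, you would pass them at submission time. A minimal PySpark sketch (the application name and the exact values are just the ones from this example):

    from pyspark.sql import SparkSession

    # Applying the numbers worked out above; adjust for your own cluster.
    spark = (
        SparkSession.builder
        .appName("example-job")
        .config("spark.executor.instances", "17")
        .config("spark.executor.cores", "5")
        .config("spark.executor.memory", "19g")
        .getOrCreate()
    )

On YARN, the equivalent spark-submit flags are --num-executors, --executor-cores and --executor-memory.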

Note:

  1. Sometimes you may want to allocate less memory than 19 GB per executor. As memory per executor decreases, the number of executors increases and the cores per executor decrease. As discussed earlier, 5 cores per executor is the sweet spot; using fewer cores will still give good results, just don't go beyond 5.

  2. Memory per executor should stay below about 40 GB, otherwise there will be considerable GC overhead.
0 votes

If there are no applications running on the cluster and you submit a job, what default values will be taken for the number of Spark executors, executor cores, and executor memory?

Default values are stored in spark-defaults.conf on the cluster where Spark is installed, so you can verify them there. For properties not set anywhere, Spark falls back to its built-in defaults (for example, spark.executor.memory = 1g, spark.driver.memory = 1g, and spark.executor.cores = 1 on YARN).

To check the default values, please refer to this document.
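If you want to see which values your session actually picked up, one option is to inspect the runtime configuration; a small sketch (note that properties left at Spark's built-in defaults may not be listed):

    from pyspark.sql import SparkSession

    # Start a session without overriding anything, then print the executor/driver
    # settings picked up from spark-defaults.conf and the submit command.
    spark = SparkSession.builder.appName("check-defaults").getOrCreate()
    for key, value in spark.sparkContext.getConf().getAll():
        if key.startswith("spark.executor") or key.startswith("spark.driver"):
            print(key, "=", value)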

If we want to calculate the number of Spark executors, executor cores, and executor memory needed for a job we want to submit, how do we do that?

It depends on the following things:

  1. What type of job you have, i.e. whether it is shuffle-intensive or map-only. If it involves shuffles, you probably need more memory.

  2. Data size: the bigger the data, the higher the memory usage.

  3. Cluster constraints: how much memory you can afford.

Based on these factors, you need to start with some numbers and then, looking at the Spark UI, identify the bottleneck and increase or decrease the memory footprint accordingly (a rough starting-point sketch follows below).
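As an illustration of that workflow, you might start from conservative placeholder numbers like the ones below, run the job, check the Stages and Executors tabs in the Spark UI, and then adjust one knob at a time (none of these values is a recommendation):

    from pyspark.sql import SparkSession

    # One iteration of the tuning loop: pick starting numbers, run the job,
    # inspect the Spark UI, then adjust. All numbers here are placeholders.
    starting_point = {
        "spark.executor.instances": "4",
        "spark.executor.cores": "4",
        "spark.executor.memory": "8g",   # raise this if the UI shows heavy spill or OOM errors
    }

    builder = SparkSession.builder.appName("tuning-run")
    for key, value in starting_point.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()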

One caution: keeping executor memory above 40 GB can be counter-productive because JVM GC becomes slower. Also, having too many cores per executor might slow down the process.