0 votes

I have some doubts regarding the values for Spark executors, executor cores, and executor memory.

  1. If there are no applications running on the cluster and you submit a job, what default values will be taken for the number of Spark executors, executor cores, and executor memory?
  2. If we want to calculate the number of Spark executors, executor cores, and executor memory needed for a job we want to submit, how do we do that?

2 Answers

1 vote

Avishek's answer covers the default values. I will shed some light on calculating the optimum values. Let's take an example.

Example: 6 nodes, each with 16 cores and 64 GB RAM

Each executor is a JVM instance, so multiple executors can run on a single node.

Let's start by choosing the number of cores per executor:

Number of cores = number of concurrent tasks an executor can run

One may think that higher concurrency always means better performance. However, experiments have shown that Spark jobs perform well when the number of cores per executor is about 5.

Going beyond 5 cores per executor tends to degrade performance.

Note that 1 core and 1 GB per node are needed for the OS and Hadoop daemons.

Now, calculate the number of executors:

As discussed above, 15 cores are available on each node and we are planning for 5 cores per executor.

Thus, executors per node = 15 / 5 = 3
Total executors = 3 * 6 = 18

Out of these, 1 executor's worth of resources is needed for the YARN Application Master (AM).
Thus, final executor count = 18 - 1 = 17 executors.
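As a quick sanity check, the arithmetic above can be written out in a few lines of plain Python (the cluster numbers are just the ones assumed in this example):

    # Executor-count arithmetic from the example above (6 nodes, 16 cores each).
    nodes = 6
    cores_per_node = 16
    cores_per_executor = 5                                            # the ~5-cores-per-executor rule of thumb

    usable_cores_per_node = cores_per_node - 1                        # 1 core reserved for OS/Hadoop daemons
    executors_per_node = usable_cores_per_node // cores_per_executor  # 15 // 5 = 3
    total_executors = executors_per_node * nodes - 1                  # minus 1 for the YARN Application Master
    print(total_executors)                                            # 17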

Memory Per Executor:

Executors per node = 3
RAM available per node = 63 GB (1 GB is reserved for the OS and Hadoop daemons)
Memory per executor = 63 / 3 = 21 GB

Spark requires some memory overhead per executor, which is max(384 MB, 7% of executor memory).
Thus, 7% of 21 GB = 1.47 GB.
Since 1.47 GB > 384 MB, subtract 1.47 GB from 21 GB.
Hence, 21 - 1.47 ≈ 19 GB (rounding down).
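Continuing the same sketch, the memory arithmetic looks like this (using the 7% overhead figure quoted above; the exact overhead factor depends on your Spark version and settings):

    # Memory-per-executor arithmetic from the example above.
    ram_per_node_gb = 64 - 1                                 # 1 GB reserved for OS/Hadoop daemons
    mem_per_executor_gb = ram_per_node_gb / 3                # 3 executors per node -> 21 GB
    overhead_gb = max(0.384, 0.07 * mem_per_executor_gb)     # max(384 MB, 7% of executor memory)
    usable_mem_gb = int(mem_per_executor_gb - overhead_gb)   # 21 - 1.47 ~= 19.5, round down
    print(usable_mem_gb)                                     # 19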

Final numbers:

Executors: 17, cores per executor: 5, executor memory: 19 GB
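If you settled on these numbers, you would pass them at submission time. A minimal PySpark sketch (the application name and the exact values are just the ones from this example):

    from pyspark.sql import SparkSession

    # Applying the numbers worked out above; adjust for your own cluster.
    spark = (
        SparkSession.builder
        .appName("example-job")
        .config("spark.executor.instances", "17")
        .config("spark.executor.cores", "5")
        .config("spark.executor.memory", "19g")
        .getOrCreate()
    )

On YARN, the equivalent spark-submit flags are --num-executors, --executor-cores and --executor-memory.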

Note:

  1. Sometimes you may want to allocate less memory than 19 GB per executor. As memory per executor decreases, the number of executors increases and the cores per executor decrease. As discussed earlier, 5 cores per executor is the sweet spot; using fewer cores will still give good results, just don't go beyond 5.

  2. Memory per executor should stay below about 40 GB, otherwise there will be considerable GC overhead.
0 votes

If there are no applications running on the cluster and you submit a job, what default values will be taken for the number of Spark executors, executor cores, and executor memory?

Default values are stored in spark-defaults.conf on the cluster where Spark is installed, so you can verify them there. For properties not set anywhere, Spark falls back to its built-in defaults (for example, spark.executor.memory = 1g, spark.driver.memory = 1g, and spark.executor.cores = 1 on YARN).

To check the default values, please refer to this document.
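If you want to see which values your session actually picked up, one option is to inspect the runtime configuration; a small sketch (note that properties left at Spark's built-in defaults may not be listed):

    from pyspark.sql import SparkSession

    # Start a session without overriding anything, then print the executor/driver
    # settings picked up from spark-defaults.conf and the submit command.
    spark = SparkSession.builder.appName("check-defaults").getOrCreate()
    for key, value in spark.sparkContext.getConf().getAll():
        if key.startswith("spark.executor") or key.startswith("spark.driver"):
            print(key, "=", value)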

If we want to calculate the number of Spark executors, executor cores, and executor memory needed for a job we want to submit, how do we do that?

It depends on the following things:

  1. What type of job you have, i.e. whether it is shuffle-intensive or map-only. If it involves shuffles, you probably need more memory.

  2. Data size: the bigger the data, the higher the memory usage.

  3. Cluster constraints: how much memory you can afford.

Based on these factors, you need to start with some numbers and then, looking at the Spark UI, identify the bottleneck and increase or decrease the memory footprint accordingly (a rough starting-point sketch follows below).
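As an illustration of that workflow, you might start from conservative placeholder numbers like the ones below, run the job, check the Stages and Executors tabs in the Spark UI, and then adjust one knob at a time (none of these values is a recommendation):

    from pyspark.sql import SparkSession

    # One iteration of the tuning loop: pick starting numbers, run the job,
    # inspect the Spark UI, then adjust. All numbers here are placeholders.
    starting_point = {
        "spark.executor.instances": "4",
        "spark.executor.cores": "4",
        "spark.executor.memory": "8g",   # raise this if the UI shows heavy spill or OOM errors
    }

    builder = SparkSession.builder.appName("tuning-run")
    for key, value in starting_point.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()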

One caution: keeping executor memory above 40 GB can be counter-productive because JVM GC becomes slower. Also, having too many cores per executor might slow down the process.