1
votes

Im so confusing on which approach i should used to configure the spark application parameters.

Let consider the following cluster configuration : 10 Nodes, 16 cores per Node and 64GB RAM per Node (example from https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application.html0

  1. Based on the recommendations mentioned above, Let’s assign 5 core per executors => --executor-cores = 5 (for good HDFS throughput)
  2. Leave 1 core per node for Hadoop/Yarn daemons => Num cores available per node = 16-1 = 15. So, Total available of cores in cluster = 15 x 10 = 150
  3. Number of available executors = (total cores/num-cores-per-executor) = 150/5 = 30
  4. Leaving 1 executor for ApplicationManager => --num-executors = 29
  5. Number of executors per node = 30/10 = 3
  6. Memory per executor = 64GB/3 = 21GB
  7. Counting off heap overhead = 7% of 21GB = 3GB. So, actual --executor-memory = 21 - 3 = 18GB

As a result the recommended configuration will be: 29 executors, 18GB memory each and 5 cores each

Consider another example where the cluster confirguration is: 6 nodes, 25 cores nodes and 125 GB memory per node (example from https://researchcomputing.princeton.edu/computational-hardware/hadoop/spark-memory)

the recommended configuration is; --num-executors 30 --executor-cores 4 --executor-memory 24G.

Question: if we use the rules applied in example 1 and compute the recommended configuration for example 2 the result will be very different. In fact, if we --executor-cores = 4, and then we substract 1 core from each node 6 *24=144 thus we get 144/4= 26 executors. after leaving out 1 core for AM ==> --num-executors = 25. Now if we want to compute the number of executors per node = 25/6 = WHATTTT? Where is my error?

1

1 Answers

1
votes

with 6 nodes, 25 cores nodes and 125 GB , if the break down is '4' core per executor leaving '1' core per node. then 6 * 24 = 144 (total cores considered) . 144/4 = "36" executor not "26". if you leaving out "1" executor out then it would be "35". so for each node (Node1 to Node5 = "6" executor + Node6 = "5" executor + 1 (we left it out) or anyother node carries 5 executor remaining with 6) like that..