I have a long-standing question about Spark executor memory. I set these parameters in my code for a Spark job.
Case 1:
object Pickup {
val conf = new SparkConf().setAppName("SPLINTER").set("spark.executor.heartbeatInterval", "120s")
.set("spark.network.timeout", "12000s")
.set("spark.sql.orc.filterPushdown", "true")
.set("spark.serializer", classOf[org.apache.spark.serializer.KryoSerializer].getName)
.set("spark.kryoserializer.buffer.max", "512m")
.set("spark.streaming.stopGracefullyOnShutdown", "true")
.set("spark.yarn.driver.memoryOverhead", "8192")
.set("spark.yarn.executor.memoryOverhead", "8192")
.set("spark.shuffle.service.enabled", "true")
.set("spark.sql.tungsten.enabled", "true")
.set("spark.executor.instances", "4")
.set("spark.executor.memory", "2g")
.set("spark.executor.cores", "5")
.set("spark.files.maxPartitionBytes", "268435468")
.set("spark.sql.shuffle.partitions","20")
...
...
...
}
The executor and driver memory overheads are both set to 8 GB. Below is how the job looks on YARN:
This is how the number of running executors and their memory allocation looks:
Why is the allocated memory 53248 MB (52 GB)? Does it include the overhead memory values as well? Even counting those: 4 executors * 2 GB per executor => 8 GB, plus 8 GB driver overhead and 8 GB executor overhead, is still only 24 GB.
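To make my arithmetic explicit, here is the accounting I am doing for Case 1, written out as a small Scala sketch. The assumption that each overhead is counted once (rather than per container) is mine, and may be exactly what is wrong:

```scala
// Case 1 accounting as I understand it: 4 executors at 2 GB each,
// plus the driver and executor overheads counted once apiece.
// (Whether YARN charges overhead once or per container is exactly
// what I am unsure about.)
object Case1Accounting extends App {
  val executorInstances  = 4
  val executorMemoryGb   = 2
  val driverOverheadGb   = 8 // spark.yarn.driver.memoryOverhead
  val executorOverheadGb = 8 // spark.yarn.executor.memoryOverhead

  val myExpectedGb =
    executorInstances * executorMemoryGb + driverOverheadGb + executorOverheadGb

  println(s"Expected: $myExpectedGb GB, YARN reports: 52 GB")
}
```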
So I changed the memory parameters in the job and ran it again:
Case 2: This time I set the driver and executor memory overheads to 2 GB each, keeping the rest the same.
Executors in the background:
Adding up all the memory numbers: 4 executors * 2 GB per executor => 8 GB, plus 2 GB driver overhead and 2 GB executor overhead, gives 12 GB, which is still less than the 20 GB shown as allocated memory on the job card.
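The same back-of-the-envelope sum for Case 2, as a sketch (again assuming each overhead is charged once, which may be the wrong model):

```scala
// Case 2 accounting: same executors, overheads dropped to 2 GB each.
// Even under my (possibly wrong) once-per-job overhead assumption,
// the total falls well short of what YARN reports.
object Case2Accounting extends App {
  val executorInstances  = 4
  val executorMemoryGb   = 2
  val driverOverheadGb   = 2 // spark.yarn.driver.memoryOverhead
  val executorOverheadGb = 2 // spark.yarn.executor.memoryOverhead

  val myExpectedGb =
    executorInstances * executorMemoryGb + driverOverheadGb + executorOverheadGb

  println(s"Expected: $myExpectedGb GB, YARN reports: 20 GB")
}
```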
Spark submit command:
SPARK_MAJOR_VERSION=2 spark-submit \
  --class com.partition.source.Pickup \
  --master=yarn \
  --conf spark.ui.port=4090 \
  --driver-class-path /home/username/jars/greenplum.jar:/home/username/jars/postgresql-42.1.4.jar:/home/username/ReconTest/inputdir/myjars/hive-jdbc-2.3.5.jar \
  --conf spark.jars=/home/username/jars/greenplum.jar,/home/username/jars/postgresql-42.1.4.jar,/home/username/ReconTest/inputdir/myjars/hive-jdbc-2.3.5.jar \
  --executor-cores 4 \
  --executor-memory 2G \
  --keytab /home/username/username.keytab \
  --principal [email protected] \
  --files /$SPARK_HOME/conf/hive-site.xml,testconnection.properties \
  --name Splinter \
  --conf spark.executor.extraClassPath=/home/username/jars/greenplum.jar \
  splinter_2.11-0.1.jar SSS
I searched online to see how executor memory is distributed across a job. Most of the information is about how to tune it rather than explaining how it is distributed. The most confusing part is the allocated-memory figure on the job cards, marked in the red box. I don't understand how 53248 MB (52 GB) is allocated for a job with 4 executors of 2 GB each.
Am I missing something? Could anyone explain why that number is so big, and how executor memory is distributed in the background?