4 votes

My cluster configuration is as below: 7 nodes, each with 32 cores and 252 GB of memory.

The YARN configurations are as below:

yarn.scheduler.maximum-allocation-mb - 10GB
yarn.scheduler.minimum-allocation-mb - 2GB
yarn.nodemanager.vmem-pmem-ratio - 2.1
yarn.nodemanager.resource.memory-mb - 22GB
yarn.scheduler.maximum-allocation-vcores - 25
yarn.scheduler.minimum-allocation-vcores - 1
yarn.nodemanager.resource.cpu-vcores - 25
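As a side note on how I read two of these settings together: with yarn.nodemanager.vmem-pmem-ratio at 2.1, the NodeManager should allow each container roughly 2.1 times its physical allocation in virtual memory. This is just my own back-of-the-envelope sketch of that rule, not output from the cluster:

// Sketch only (my understanding of the vmem check, not YARN source code):
// the NodeManager kills a container whose virtual memory usage exceeds its
// physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio.
object VmemLimitSketch extends App {
  val vmemPmemRatio = 2.1                // yarn.nodemanager.vmem-pmem-ratio
  val containerMemoryMb = 10 * 1024L     // a container at the 10 GB scheduler maximum

  val vmemLimitMb = (containerMemoryMb * vmemPmemRatio).toLong
  println(s"physical limit: $containerMemoryMb MB, virtual limit: $vmemLimitMb MB")
}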

The MapReduce configurations are as below:

mapreduce.map.java.opts - -Xmx1638m
mapreduce.map.memory.mb - 2GB
mapreduce.reduce.java.opts - -Xmx3276m
mapreduce.reduce.memory.mb - 4GB

The Spark configurations are as below:

spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384

Now I tried running spark-shell with --master yarn and different values for executor-memory, num-executors, and executor-cores.

  1. spark-shell --master yarn --executor-memory 9856M --num-executors 175 --executor-cores 1

In this case the executor memory + 384 MB overhead cannot exceed the 10 GB maximum allowed by the YARN scheduler. Here 9856 MB + 384 MB = 10240 MB, i.e. exactly 10 GB, so it works fine. Once the spark-shell was up, the total number of executors was 124 instead of the requested 175. The storage memory shown in the spark-shell start logs and in the Spark UI for each executor is 6.7 GB (i.e. roughly 67% of 10 GB).
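Just to spell out the arithmetic I am doing above (my own sketch, not Spark or YARN code): the container request is the executor memory plus spark.yarn.executor.memoryOverhead, and it has to fit under yarn.scheduler.maximum-allocation-mb.

// Sketch of the container-size check described above (my arithmetic, not framework code).
object ContainerRequestSketch extends App {
  val executorMemoryMb = 9856L             // --executor-memory 9856M
  val memoryOverheadMb = 384L              // spark.yarn.executor.memoryOverhead
  val yarnMaxAllocationMb = 10 * 1024L     // yarn.scheduler.maximum-allocation-mb = 10 GB

  val requestedMb = executorMemoryMb + memoryOverheadMb   // 10240 MB, i.e. exactly 10 GB
  require(requestedMb <= yarnMaxAllocationMb,
    s"requested $requestedMb MB exceeds the YARN maximum of $yarnMaxAllocationMb MB")
  println(s"container request of $requestedMb MB fits the $yarnMaxAllocationMb MB limit")
}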

The top command output for the spark-shell process is as below:

PID     USER      PR    NI  VIRT  RES   SHR S  %CPU %MEM  TIME+  
8478    hdp66-ss  20    0   13.5g 1.1g  25m S  1.9  0.4   2:11.28

So the virtual memory is 13.5 GB and the physical memory is 1.1 GB.

  2. spark-shell --master yarn --executor-memory 9856M --num-executors 35 --executor-cores 5

In this case the executor memory + 384 MB overhead cannot exceed the 10 GB maximum allowed by the YARN scheduler. Here 9856 MB + 384 MB = 10240 MB, i.e. exactly 10 GB, so it works fine. Once the spark-shell was up, the total number of executors was 35, as requested. The storage memory shown in the spark-shell start logs and in the Spark UI for each executor is 6.7 GB (i.e. roughly 67% of 10 GB).

The top command output for the spark-shell process is as below:

PID     USER      PR    NI  VIRT  RES   SHR S  %CPU %MEM  TIME+  
5256    hdp66-ss  20    0   13.2g 1.1g  25m S  2.6  0.4   1:25.25

So the virtual memory is 13.2 GB and the physical memory is 1.1 GB.

  3. spark-shell --master yarn --executor-memory 4096M --num-executors 200 --executor-cores 1

In this case the executor memory + 384 MB overhead cannot exceed the 10 GB maximum allowed by the YARN scheduler. Here 4096 MB + 384 MB = 4480 MB, well under that limit, so it works fine. Once the spark-shell was up, the total number of executors was 200, as requested. The storage memory shown in the spark-shell start logs and in the Spark UI for each executor is 2.7 GB (i.e. roughly 67% of 4 GB).
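As far as I understand YARN (please correct me if this is wrong), the ResourceManager also normalizes each request up to a multiple of yarn.scheduler.minimum-allocation-mb, so a 4480 MB request may actually be granted a larger container. A sketch of that rounding under my 2 GB minimum / 10 GB maximum settings:

// Sketch only (my understanding of request normalization, not YARN source code):
// requests are rounded up to the next multiple of yarn.scheduler.minimum-allocation-mb
// and capped at yarn.scheduler.maximum-allocation-mb.
object RequestNormalizationSketch extends App {
  val minAllocationMb = 2 * 1024L     // yarn.scheduler.minimum-allocation-mb = 2 GB
  val maxAllocationMb = 10 * 1024L    // yarn.scheduler.maximum-allocation-mb = 10 GB

  def normalize(requestMb: Long): Long = {
    val roundedUp = ((requestMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb
    math.min(roundedUp, maxAllocationMb)
  }

  println(normalize(4096 + 384))      // 4480 MB request -> 6144 MB container
  println(normalize(9856 + 384))      // 10240 MB request -> 10240 MB container
}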

The top command output for the spark-shell process is as below:

PID     USER      PR    NI  VIRT  RES   SHR S  %CPU %MEM  TIME+  
21518   hdp66-ss  20    0   19.2g 1.4g  25m S  3.9  0.6   2:24.46

So the virtual memory is 19.2 GB and the physical memory is 1.4 GB.

So can someone please explain how these memory values and executor counts are decided? Why is the memory seen in the Spark UI 67% of the requested executor memory? And how are the virtual and physical memory decided for each executor?


1 Answer

2 votes

Spark almost always allocates 65% to 70% of the memory that a user requests for the executors. This behavior of Spark is tracked in the Spark JIRA ticket SPARK-12579.

The code below is from the Scala file in the Apache Spark repository that is used to calculate the executor memory, among other things.

// Snippet from the method that computes the usable executor memory. Earlier in the
// same method, systemMemory is the executor JVM's Runtime.getRuntime.maxMemory,
// reservedMemory defaults to a fixed 300 MB, and minSystemMemory is 1.5x reservedMemory.
if (conf.contains("spark.executor.memory")) {
  val executorMemory = conf.getSizeAsBytes("spark.executor.memory")
  if (executorMemory < minSystemMemory) {
    throw new IllegalArgumentException(s"Executor memory $executorMemory must be at least " +
      s"$minSystemMemory. Please increase executor memory using the " +
      s"--executor-memory option or spark.executor.memory in Spark configuration.")
  }
}
val usableMemory = systemMemory - reservedMemory
val memoryFraction = conf.getDouble("spark.memory.fraction", 0.6)
(usableMemory * memoryFraction).toLong

The above code is responsible for the behavior you are seeing. It acts as a safeguard for scenarios where the cluster may not have as much memory as the user requested.
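To make the numbers concrete, here is a rough sketch (my own arithmetic, not part of Spark) of that last calculation applied to the 9856M case. Two assumptions to flag: reservedMemory is Spark's fixed 300 MB, and systemMemory is approximated by the requested heap, although Runtime.getRuntime.maxMemory actually reports somewhat less than -Xmx. Also note that older releases (Spark 1.6.x) defaulted spark.memory.fraction to 0.75 rather than 0.6, which is why the value shown in the UI tends to land around 65% to 70% of the requested executor memory.

// Rough sketch of the formula above applied to --executor-memory 9856M.
// Assumptions: reservedMemory = 300 MB and systemMemory taken as the requested heap;
// the real Runtime.getRuntime.maxMemory is a bit lower, which pushes the result
// toward the ~6.7 GB reported in the Spark UI.
object StorageMemorySketch extends App {
  val systemMemoryMb = 9856L
  val reservedMemoryMb = 300L
  val usableMemoryMb = systemMemoryMb - reservedMemoryMb

  // 0.6 is the current default for spark.memory.fraction; 0.75 was the Spark 1.6 default
  for (fraction <- Seq(0.6, 0.75)) {
    val unifiedMemoryMb = (usableMemoryMb * fraction).toLong
    println(f"fraction $fraction%.2f -> ${unifiedMemoryMb / 1024.0}%.1f GB of unified memory")
  }
}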