
spark-shell was started using:

spark-shell --master yarn --executor-memory 4G --num-executors 100

I'm expecting YARN to assign about 400GB of memory to spark-shell, but when I go to the RM UI, it shows an increase of about 804GB in "Memory Used".

I'm running HDP 2.5 with yarn.scheduler.minimum-allocation-mb set to 4096 in yarn-site.xml.

I'm confused about how this is happening.


It turns out to be a problem with Spark's memory overhead and YARN's memory allocation mechanism; see:

http://www.wdong.org/spark-on-yarn-where-have-all-the-memory-gone.html

Rule 1. Yarn always rounds memory requests up to multiples of yarn.scheduler.minimum-allocation-mb, which defaults to 1024 MB (1GB). That's why the driver's request of 4G+384M showed up as 5G in Yarn. The parameter yarn.scheduler.minimum-allocation-mb is really a "minimum-allocation-unit-mb". This can easily be verified by setting the parameter to a prime number, such as 97, and observing that Yarn allocates in multiples of that number.
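To illustrate the rounding rule, here is a minimal sketch (the function name and the prime 97 MB unit are just for illustration, mirroring the experiment described above):

```python
import math

def round_up_to_unit(request_mb, min_alloc_mb):
    """Round a container request up to the next multiple of
    yarn.scheduler.minimum-allocation-mb, as the RM scheduler does."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb

# Driver request of 4G + 384M with the default 1024 MB unit:
print(round_up_to_unit(4096 + 384, 1024))  # 5120 MB, i.e. the 5G seen in Yarn
# With a prime "unit" of 97 MB, allocations become multiples of 97:
print(round_up_to_unit(4096 + 384, 97))    # 4559 MB = 47 * 97
```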

Rule 2. Spark adds an overhead to SPARK_EXECUTOR_MEMORY/SPARK_DRIVER_MEMORY before asking Yarn for the amount.

That is, in my configuration, even though spark-shell asks for only 384MB of overhead memory (the default for spark.yarn.executor.memoryOverhead), yarn rounds each executor's 4G+384M request up to the next 4GB multiple and allocates 8GB per executor. With 100 executors at 8GB each, plus 4GB for the driver (the default 1g driver memory plus 384MB overhead, rounded up), that comes to the 804GB shown in the RM UI.
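Putting the two rules together reproduces the 804GB figure; a quick sketch, assuming the default 1g driver memory and the default 384MB overhead:

```python
import math

def container_mb(requested_mb, overhead_mb=384, min_alloc_mb=4096):
    """Memory Yarn actually grants: request + overhead,
    rounded up to a multiple of the minimum allocation unit."""
    return math.ceil((requested_mb + overhead_mb) / min_alloc_mb) * min_alloc_mb

executor_mb = container_mb(4 * 1024)  # 4G + 384M -> 8192 MB each
driver_mb = container_mb(1 * 1024)    # default 1g driver + 384M -> 4096 MB
print((100 * executor_mb + driver_mb) / 1024)  # 804.0 GB
```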


2 Answers


My understanding is that --executor-memory gives you the upper bound on the memory an executor could ever use, and becomes the executor JVM's -Xmx.

At any particular point in time, it may use less (which is fine).

I'd go to the YARN nodes, run jps -lm, and look at the properties the executor JVMs use.


I came across the same issue in our environment, where yarn is set to allocate 12 GB per core. Spark users were submitting jobs with --executor-memory 12g, trying to utilize the allocated memory. However, this was causing yarn to allocate another 12 GB for each core. For example, a 200-core Hive query would use ~2.5 TB of memory, but a 200-core Spark job submitted with driver memory and executor memory set to 12 GB would use almost 5 TB of memory.

When an application requests more memory than a single allocation unit provides, yarn allocates another "unit" of memory. So our spark jobs had to be requesting more than 12 GB in total to trigger this additional allocation. As the post you link mentions, Spark's overhead memory is set to the max of 384 MB or 0.07 * executor memory. Therefore, in our specific case each executor was actually requesting 12 + (12 * 0.07) = 12.84 GB. As this is greater than the unit allocated by yarn, each executor then received a second 12 GB unit of memory. The workaround was to decrease the driver and executor memory so that the memory plus overhead total less than 12 GB. In our case, setting --driver-memory and --executor-memory to 11150m did the trick (you could also use 11g for simplicity).
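The same arithmetic for our 12 GB allocation unit can be sketched as follows (the max(384 MB, 0.07 * executor memory) overhead formula is the default behavior described above; the function name is just for illustration):

```python
import math

def yarn_units(requested_mb, min_alloc_mb=12 * 1024):
    """How many minimum-allocation units Yarn grants for one executor request."""
    overhead_mb = max(384, 0.07 * requested_mb)  # default memoryOverhead rule
    return math.ceil((requested_mb + overhead_mb) / min_alloc_mb)

print(yarn_units(12 * 1024))  # 12g + 0.84g overhead -> 2 units (24 GB)
print(yarn_units(11150))      # 11150m + ~780m overhead -> 1 unit (12 GB)
```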