2 votes

I have a 5-worker-node cluster with 6 GB of memory on each node (Spark executor memory is set to 4608 MB).

I have been running out of memory, with Spark telling me that one of my executors was trying to use more than 5.0 GB of memory. If each executor gets 5 GB of memory, then I should have 25 GB of memory in total across my entire cluster.

ExecutorLostFailure (executor 4 exited caused by one of the running tasks) 
Reason: Container killed by YARN for exceeding memory limits. 5.0 GB of 5.0 
GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
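
For reference, the executor memory setting above would typically be applied along these lines (a minimal sketch with a placeholder app name, not my exact setup):

import org.apache.spark.sql.SparkSession

// Minimal sketch of the executor memory configuration described above.
// spark.yarn.executor.memoryOverhead (which the error suggests boosting)
// is not set here, so it falls back to its default.
object MemoryConfigSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("memory-config-sketch")            // placeholder name
      .config("spark.executor.memory", "4608m")   // 4608 MB per executor
      .getOrCreate()

    // ... application logic ...

    spark.stop()
  }
}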

At the beginning of my Spark application, when I look at one of my RDDs in the Storage tab (it is the only RDD in the cache at this point), I see:

RDD Name    Storage Level                      Cached Partitions   Fraction Cached   Size in Memory   Size on Disk
myRDD       Memory Serialized 1x Replicated    20                  100%              3.2 GB           0.0 B

Host    On Heap Memory Usage            Off Heap Memory Usage   Disk Usage
Node 1  643.5 MB (1931.3 MB Remaining)  0.0 B (0.0 B Remaining) 0.0 B
Master  0.0 B (366.3 MB Remaining)      0.0 B (0.0 B Remaining) 0.0 B
Node 2  654.8 MB (1920.0 MB Remaining)  0.0 B (0.0 B Remaining) 0.0 B
Node 3  644.2 MB (1930.6 MB Remaining)  0.0 B (0.0 B Remaining) 0.0 B
Node 4  656.2 MB (1918.6 MB Remaining)  0.0 B (0.0 B Remaining) 0.0 B
Node 5  652.4 MB (1922.4 MB Remaining)  0.0 B (0.0 B Remaining) 0.0 B

This seems to show that each node only has about 2.5 GB of memory available for storage. The Storage tab also never comes close to showing 25 GB of cached RDDs before my Spark application hits an out-of-memory error.

How can I find out how much memory is allocated for cached RDDs?


1 Answer

1 vote

When submitting a job, you can specify the parameter spark.memory.storageFraction. Its default value is 0.5.
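
For example, assuming a Spark 2.x application, the parameter can be set programmatically when the session is built, or passed with --conf spark.memory.storageFraction=... on spark-submit. This is a minimal sketch, and the 0.6 below is only an illustrative value, not a recommendation:

import org.apache.spark.sql.SparkSession

// Minimal sketch: raising the fraction of unified memory reserved
// for storage (cached RDDs).
object StorageFractionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("storage-fraction-example")            // placeholder name
      .config("spark.memory.storageFraction", "0.6")  // default is 0.5
      .getOrCreate()

    // ... job logic ...

    spark.stop()
  }
}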

So, in your case, where you are allocating 5 GB of memory per executor, 2.5 GB will be reserved for caching and the remaining 2.5 GB will be used for execution.

From the Spark documentation on Memory Management:

spark.memory.storageFraction

Amount of storage memory immune to eviction, expressed as a fraction of the size of the region set aside by spark.memory.fraction. The higher this is, the less working memory may be available to execution and tasks may spill to disk more often. Leaving this at the default value is recommended. For more detail, see this description.
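
Putting that together with spark.memory.fraction (default 0.6), a rough back-of-the-envelope estimate of the storage region for a 4608 MB executor looks like the sketch below. The ~300 MB of reserved system memory is an implementation detail of Spark's unified memory manager and an assumption here, not something stated in the quoted description:

// Rough estimate of the unified (execution + storage) region and of the
// storage portion immune to eviction, under the assumptions stated above.
object StorageMemoryEstimate {
  def main(args: Array[String]): Unit = {
    val executorMemoryMb = 4608.0   // spark.executor.memory from the question
    val reservedMemoryMb = 300.0    // assumed reserved system memory
    val memoryFraction   = 0.6      // spark.memory.fraction (default)
    val storageFraction  = 0.5      // spark.memory.storageFraction (default)

    val unifiedRegionMb  = (executorMemoryMb - reservedMemoryMb) * memoryFraction
    val evictionImmuneMb = unifiedRegionMb * storageFraction

    println(f"Unified execution + storage region: $unifiedRegionMb%.0f MB")   // ~2585 MB
    println(f"Storage memory immune to eviction:  $evictionImmuneMb%.0f MB")  // ~1292 MB
  }
}

That ~2.5 GB figure roughly lines up with the per-node "On Heap Memory Usage" totals shown in your Storage tab, which would explain why each node appears to have only about 2.5 GB available for caching.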