I have a 5-worker-node cluster with 6 GB of memory each (Spark executor memory is set to 4608 MB).
I have been running out of memory, with Spark telling me that one of my executors was trying to use more than 5.0 GB of memory. If each executor gets 5 GB of memory, then I should have 25 GB of memory across the entire cluster.
ExecutorLostFailure (executor 4 exited caused by one of the running tasks)
Reason: Container killed by YARN for exceeding memory limits. 5.0 GB of 5.0 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
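If I understand the YARN sizing right, that 5.0 GB limit is roughly spark.executor.memory (4608 MB) plus the default spark.yarn.executor.memoryOverhead (max(384 MB, 10% of executor memory), so about 461 MB here). For reference, this is a minimal sketch of how the relevant settings get applied when I build the session; the app name and the 1024 MB overhead value are just illustrative, not what I actually run:

import org.apache.spark.sql.SparkSession

// Sketch of the session setup; "MyApp" and the 1024 MB overhead are illustrative.
val spark = SparkSession.builder()
  .appName("MyApp")
  // 4608 MB of executor heap, as in my current setup
  .config("spark.executor.memory", "4608m")
  // Extra container memory YARN allows on top of the heap; the error above
  // suggests raising this (default is max(384 MB, 10% of the executor memory))
  .config("spark.yarn.executor.memoryOverhead", "1024")
  .getOrCreate()

val sc = spark.sparkContext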
At the beginning of my Spark application, when I look at one of my RDDs in the Storage tab (it is the only RDD in the cache at this point), I see:
RDD Name | Storage Level                   | Cached Partitions | Fraction Cached | Size in Memory | Size on Disk
myRDD    | Memory Serialized 1x Replicated | 20                | 100%            | 3.2 GB         | 0.0 B
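For context, the "Memory Serialized 1x Replicated" level above is what I would expect from persisting the RDD serialized in memory. A minimal sketch of that (the textFile path is hypothetical; it stands in for however myRDD is actually built):

import org.apache.spark.storage.StorageLevel

// Hypothetical source; stands in for how myRDD is really constructed.
val myRDD = sc.textFile("hdfs:///path/to/input")

// Persist serialized in memory only; this shows up in the Storage tab
// as "Memory Serialized 1x Replicated".
myRDD.persist(StorageLevel.MEMORY_ONLY_SER)

// Force materialization so the RDD actually appears in the Storage tab.
myRDD.count()

// Sanity-check the storage level programmatically.
println(myRDD.getStorageLevel.description)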
Host   | On Heap Memory Usage           | Off Heap Memory Usage   | Disk Usage
Node 1 | 643.5 MB (1931.3 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
Master | 0.0 B (366.3 MB Remaining)     | 0.0 B (0.0 B Remaining) | 0.0 B
Node 2 | 654.8 MB (1920.0 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
Node 3 | 644.2 MB (1930.6 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
Node 4 | 656.2 MB (1918.6 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
Node 5 | 652.4 MB (1922.4 MB Remaining) | 0.0 B (0.0 B Remaining) | 0.0 B
This seems to show that each node only has about 2.5 GB of on-heap memory available for caching (used plus remaining). The Storage tab also never comes close to showing 25 GB of cached RDDs before my Spark application gets an out-of-memory error.
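If I am reading Spark's unified memory model correctly, that ~2.5 GB figure matches (4608 MB - 300 MB reserved) * spark.memory.fraction (default 0.6) ≈ 2585 MB per executor. A back-of-the-envelope sketch, assuming I have left the memory fractions at their defaults (which I have):

// Back-of-the-envelope check against the numbers above, assuming the default
// unified memory settings (I have not overridden them).
val executorHeapMb  = 4608.0  // spark.executor.memory
val reservedMb      = 300.0   // memory Spark reserves off the top
val memoryFraction  = 0.6     // spark.memory.fraction (default)
val storageFraction = 0.5     // spark.memory.storageFraction (default)

val unifiedMb = (executorHeapMb - reservedMb) * memoryFraction   // ~2585 MB, i.e. the ~2.5 GB I see per node
val storageMb = unifiedMb * storageFraction                      // ~1292 MB protected from eviction

println(f"Unified execution + storage memory: $unifiedMb%.1f MB")
println(f"Storage memory protected from eviction: $storageMb%.1f MB")

If that is right, storage can borrow from the execution half when it is idle, but only about 1.3 GB per executor is guaranteed to stay cached.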
How can I find out how much memory is allocated for cached RDDs?
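In case it matters, this is the kind of programmatic check I have been trying (a sketch; sc is my application's SparkContext), but I am not sure whether these numbers are the right way to read the storage pool:

// What each executor reports for caching: "host:port" -> (max memory for
// caching, remaining memory for caching), in bytes.
sc.getExecutorMemoryStatus.foreach { case (executor, (maxMem, remaining)) =>
  println(f"$executor: max ${maxMem / 1024.0 / 1024.0}%.1f MB, free ${remaining / 1024.0 / 1024.0}%.1f MB")
}

// Per-RDD view of what is actually cached right now.
sc.getRDDStorageInfo.foreach { info =>
  println(s"${info.name}: ${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
    s"${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
}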