
I'm using Spark in a YARN cluster (HDP 2.4) with the following settings:

  • 1 master node
    • 64 GB RAM (48 GB usable)
    • 12 cores (8 cores usable)
  • 5 slave nodes
    • 64 GB RAM (48 GB usable) each
    • 12 cores (8 cores usable) each
  • YARN settings
    • total memory for all containers (per host): 48 GB
    • minimum container size = maximum container size = 6 GB
    • vcores in cluster = 40 (5 x 8 cores of workers)
    • minimum #vcores/container = maximum #vcores/container = 1

When I run my Spark application with the command spark-submit --num-executors 10 --executor-cores 1 --executor-memory 5g ..., Spark should give each executor 5 GB of RAM, right? (I set the memory to only 5g because of the ~10% memory overhead.)
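
(For reference, here is the rough container-sizing calculation I had in mind; it assumes the Spark 1.x default for spark.yarn.executor.memoryOverhead, i.e. the larger of 384 MB and 10% of the executor memory.)

    // Rough check that one executor fits into a 6 GB YARN container,
    // assuming the default spark.yarn.executor.memoryOverhead
    // (the larger of 384 MB and 10% of the executor memory).
    object ContainerSizing {
      def main(args: Array[String]): Unit = {
        val executorMemoryMb = 5 * 1024L                              // --executor-memory 5g
        val overheadMb       = math.max(384L, (executorMemoryMb * 0.10).toLong)
        val requestedMb      = executorMemoryMb + overheadMb
        val containerMb      = 6 * 1024L                              // YARN max container size
        println(s"Requested per executor: $requestedMb MB, container limit: $containerMb MB")
        // => 5632 MB requested vs. 6144 MB available, so 5g plus overhead fits.
      }
    }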

But when I looked at the Spark UI, I saw that each executor has only 3.4 GB of memory (see screenshot):

screenshot

Can someone explain why so little memory is allocated?


1 Answer


The Storage Memory column in the UI shows the amount of memory available for execution and RDD storage. By default, this equals (HEAP_SPACE - 300 MB) * 75%. The rest of the heap is used for internal metadata, user data structures, and other overhead.
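
As a rough sanity check of that formula for --executor-memory 5g (a sketch assuming the Spark 1.6 defaults: spark.memory.fraction = 0.75 and 300 MB of reserved memory):

    // Approximates the "Storage Memory" value the UI shows for --executor-memory 5g,
    // assuming Spark 1.6 defaults (spark.memory.fraction = 0.75, 300 MB reserved).
    object UnifiedMemoryEstimate {
      def main(args: Array[String]): Unit = {
        val heapMb         = 5 * 1024L   // nominal heap from --executor-memory 5g
        val reservedMb     = 300L
        val memoryFraction = 0.75
        val unifiedMb      = ((heapMb - reservedMb) * memoryFraction).toLong
        println(s"Execution + storage pool: ~$unifiedMb MB")   // ~3615 MB, roughly 3.5 GB
      }
    }

The real calculation uses Runtime.getRuntime.maxMemory, which reports slightly less than the configured 5g, which is why the UI ends up at about 3.4 GB rather than 3.5 GB.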

You can control this amount by setting spark.memory.fraction (not recommended). See Spark's documentation for more details.
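
If you really do need to change it, it is an ordinary configuration key; a minimal sketch (the 0.6 here is just an example value, not a recommendation):

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch of setting spark.memory.fraction explicitly
    // (0.6 is only an example value, not a recommendation).
    object MemoryFractionExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("memory-fraction-example")
          .set("spark.memory.fraction", "0.6")
        // master and deploy mode are supplied by spark-submit
        val sc = new SparkContext(conf)
        // ... your job ...
        sc.stop()
      }
    }

The same value can also be passed on the command line with --conf spark.memory.fraction=0.6.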