
I have a 10-node cluster: 8 DNs (256 GB, 48 cores each) and 2 NNs. I have a Spark SQL job being submitted to the YARN cluster. Below are the parameters I have used for spark-submit:

--num-executors 8 \
--executor-cores 50 \
--driver-memory 20G \
--executor-memory 60G \

As can be seen above, executor-memory is 60 GB, but when I check the Spark UI it shows 31 GB.

[Spark UI screenshot showing 31 GB of executor memory]

1) Can anyone explain why it is showing 31 GB instead of 60 GB?
2) Also, please help with setting optimal values for the parameters mentioned above.


1 Answer


I think,

The memory allocated gets divided into two parts:
1. Storage (caching DataFrames/tables)
2. Processing (the one you can see in the UI)

The 31 GB is the memory available for processing. Play around with the spark.memory.fraction property to increase/decrease the memory available for processing.
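As a rough illustration of where the smaller number comes from, below is a minimal sketch of the arithmetic, assuming Spark's defaults (about 300 MB of reserved memory and a spark.memory.fraction of 0.6) and an assumed reported heap size, since the JVM typically reports a bit less than the -Xmx value:

    // Rough sketch of how the memory shown on the executors tab is derived.
    // The constants mirror Spark's unified-memory defaults; the reported heap
    // value is an assumption (the JVM reports somewhat less than -Xmx60g).
    object UnifiedMemorySketch extends App {
      val reportedHeapGb = 57.0   // assumed Runtime.getRuntime.maxMemory for a 60 GB heap
      val reservedGb     = 0.3    // ~300 MB reserved by Spark
      val memoryFraction = 0.6    // default spark.memory.fraction

      // The unified region (storage + execution) is what the UI reports,
      // which is why it is well below the 60 GB that was requested.
      val unifiedGb = (reportedHeapGb - reservedGb) * memoryFraction
      println(f"Unified storage + execution memory: $unifiedGb%.1f GB")
    }

With these assumed numbers the result lands in the mid-30s of GB, i.e. in the same ballpark as the 31 GB shown in the UI rather than the full 60 GB.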

I would also suggest reducing the executor cores to about 8-10; you cannot usefully request 50 cores per executor on nodes that only have 48.

My configuration:

spark-shell --executor-memory 40g --executor-cores 8 --num-executors 100 --conf spark.memory.fraction=0.2
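Applied to the cluster in the question, a possible starting point along the same lines might look like the sketch below. The executor count, core count, and memory values are assumptions to experiment with, not tested recommendations, and in practice you would normally pass them as spark-submit/spark-shell flags rather than setting them in code:

    import org.apache.spark.sql.SparkSession

    object TuningSketch {
      def main(args: Array[String]): Unit = {
        // Assumed starting point for the 8-DataNode (256 GB, 48 cores) cluster,
        // following the suggestion of roughly 8-10 cores per executor.
        val spark = SparkSession.builder()
          .appName("spark-sql-job")
          .config("spark.executor.instances", "16") // assumed: 2 executors per DataNode
          .config("spark.executor.cores", "8")      // within the suggested 8-10 range
          .config("spark.executor.memory", "60g")
          .config("spark.memory.fraction", "0.6")   // default; adjust to trade storage vs. processing
          .getOrCreate()

        spark.sql("SELECT 1").show()  // placeholder for the actual Spark SQL job
        spark.stop()
      }
    }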