1 vote

I'm working on a Spark project, and I'm using a Hadoop cluster of 3 nodes with the following configuration:

  • 8 cores and 16 GB of RAM (NameNode, Application Master, NodeManager, and Spark master and worker).
  • 4 cores and 8 GB of RAM (DataNode, NodeManager, and Spark worker).
  • 4 cores and 4 GB of RAM (DataNode, NodeManager, and Spark worker).

So I'm using the following configuration:

    pyspark --master yarn-client --driver-memory 3g --executor-memory 1g --num-executors 3 --executor-cores 1

What's the best number of executors, amount of memory, and number of cores to make use of all my cluster's resources?


2 Answers

1 vote

This essentially boils down to how much data you need to process. If you have the whole cluster available for the job, you can use it completely.

pyspark --master yarn-client --driver-memory 3g --executor-memory 1g --num-executors 3 --executor-cores 1

Here you aren't using the complete cluster. You are using a 3 GB driver and three 1 GB executors, i.e. only 3 GB of executor memory in total, whereas the cluster has 12 GB of memory and 8 cores available. One alternative configuration you could try:

pyspark --master yarn-client --driver-memory 8g --executor-memory 3g --num-executors 4 --executor-cores 3

This uses the complete cluster.
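
To make the arithmetic explicit (a rough back-of-the-envelope check that ignores YARN's per-container memory overhead, which takes a slice out of each executor):

    # Original:    3 executors x 1 GB =  3 GB of executor memory,  3 x 1 core  =  3 cores
    # Alternative: 4 executors x 3 GB = 12 GB of executor memory,  4 x 3 cores = 12 cores

Memory-wise this fills the 12 GB; whether all 12 executor cores can be scheduled at once depends on how many vcores YARN is configured to offer per node.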

However, the executor-memory setting mostly depends on the job's requirements; you'll need to tune it over several runs. You can check this document for tuning.
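
One detail worth keeping in mind while tuning (hedged, since the exact defaults vary by Spark version): on YARN, each executor container needs some off-heap overhead on top of --executor-memory, controlled in Spark 1.x by spark.yarn.executor.memoryOverhead (by default a few hundred MB, or roughly 7-10% of the executor memory). If executor memory plus overhead exceeds what a NodeManager can offer, the container won't be scheduled, so it can help to set the overhead explicitly while experimenting, for example:

    pyspark --master yarn-client --driver-memory 3g \
        --executor-memory 3g --num-executors 4 --executor-cores 3 \
        --conf spark.yarn.executor.memoryOverhead=512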

0 votes

This blog post by Sandy Ryza nicely explains how to allocate resources while accounting for the various overheads, and here is a handy Excel cheat sheet.

However, if you're new to Spark and/or frequently change cluster size or type, might I suggest enabling dynamic allocation?
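
A minimal sketch of what that could look like, assuming the YARN NodeManagers are already running Spark's external shuffle service (which dynamic allocation requires) and with purely illustrative min/max bounds:

    pyspark --master yarn-client --driver-memory 3g \
        --executor-memory 2g --executor-cores 2 \
        --conf spark.dynamicAllocation.enabled=true \
        --conf spark.shuffle.service.enabled=true \
        --conf spark.dynamicAllocation.minExecutors=1 \
        --conf spark.dynamicAllocation.maxExecutors=4

With dynamic allocation you drop --num-executors entirely; Spark requests and releases executors based on the backlog of pending tasks, which is convenient when the cluster size keeps changing.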