1 vote

How can I utilize all cores and memory on the Spark standalone cluster below?

Node 1: 4 cores, 8 GB memory
Node 2: 4 cores, 16 GB memory

Currently I can allocate either:

A) 8 cores and 14 GB of memory, by setting:

.config('spark.executor.memory','7g')
.config('spark.executor.cores', '4')

Cores       |  Memory
----------------------------------
4 (4 Used)  |  15.0 GiB (7.0 GiB Used)  
4 (4 Used)  |  7.0 GiB (7.0 GiB Used)

B) 6 cores and 21 GB of memory, by setting:

.config('spark.executor.memory','7g')
.config('spark.executor.cores', '2')

Cores       |  Memory
----------------------------------
4 (4 Used)  |  15.0 GiB (14.0 GiB Used) 
4 (2 Used)  |  7.0 GiB (7.0 GiB Used)
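
For what it's worth, both allocations above are consistent with standalone mode capping the number of executors per worker by both available cores and available memory. A rough sketch of that arithmetic (my own illustration, not part of the original setup; worker memory values are taken from the UI figures above):

    # Rough sketch: the number of executors a standalone worker can host is
    # limited by both its cores and its memory.
    def executors_per_worker(worker_cores, worker_mem_gb, exec_cores, exec_mem_gb):
        return min(worker_cores // exec_cores, worker_mem_gb // exec_mem_gb)

    # A) 7g / 4-core executors: one executor per node -> 8 cores, 14 GB total
    print(executors_per_worker(4, 7, 4, 7), executors_per_worker(4, 15, 4, 7))   # 1 1

    # B) 7g / 2-core executors: Node 1 is memory-bound (1 executor, 2 cores),
    #    Node 2 fits two executors (4 cores, 14 GB) -> 6 cores, 21 GB total
    print(executors_per_worker(4, 7, 2, 7), executors_per_worker(4, 15, 2, 7))   # 1 2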

Expected output (8 cores and 21 GB of memory):

    Cores       |  Memory
    ----------------------------------
    4 (4 Used)  |  15.0 GiB (14.0 GiB Used) 
    4 (4 Used)  |  7.0 GiB (7.0 GiB Used)

References:

  1. What are workers, executors, cores in Spark Standalone cluster?
  2. Spark Standalone Number Executors/Cores Control
  3. How multiple executors are managed on the worker nodes with a Spark standalone cluster?

2 Answers

1 vote

In the end the answer is very simple. I tried YARN and had the same problem as with standalone mode. The solution for standalone mode turned out to be very simple: set the following in $SPARK_HOME/conf/spark-env.sh on each node:

Node 1 (4 cores, 8 GB memory):

SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=7g

Node 2 (4 cores, 16 GB memory):

SPARK_WORKER_CORES=8
SPARK_WORKER_MEMORY=14g

Then run the Spark app with:

.config('spark.executor.memory','7g')
.config('spark.executor.cores', '2')

As a result, all 8 cores and 21 GB of memory are used:

    Cores       |  Memory
    ----------------------------------
    8 (8 Used)  |  14.0 GiB (14.0 GiB Used) 
    4 (4 Used)  |  7.0 GiB (7.0 GiB Used)
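
For completeness, here is a minimal, self-contained sketch of the application side that these options plug into. The master URL and app name are placeholders, not values from the original post:

    from pyspark.sql import SparkSession

    # Minimal sketch: master URL and app name are placeholders.
    spark = (
        SparkSession.builder
        .master('spark://<master-host>:7077')   # standalone master URL (placeholder)
        .appName('resource-allocation-test')    # placeholder name
        .config('spark.executor.memory', '7g')
        .config('spark.executor.cores', '2')
        .getOrCreate()
    )

    # Quick sanity check of how many cores the application was granted.
    print(spark.sparkContext.defaultParallelism)
    spark.stop()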

Note: On YARN the problem can be solved the same way, by changing the following parameters in $HADOOP_HOME/etc/hadoop/yarn-site.xml:

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>7168</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>7168</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>2</value>
    </property>
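
After restarting YARN with the values above, the same executor settings can be passed from the application. Here is a minimal sketch, assuming HADOOP_CONF_DIR / YARN_CONF_DIR point at this configuration; note that on YARN the executor memory request plus its overhead has to fit inside yarn.scheduler.maximum-allocation-mb, so the value below is kept a bit under the 7168 MB container limit:

    from pyspark.sql import SparkSession

    # Minimal sketch for YARN; assumes HADOOP_CONF_DIR / YARN_CONF_DIR are set so
    # Spark can find the yarn-site.xml shown above. App name is a placeholder.
    spark = (
        SparkSession.builder
        .master('yarn')
        .appName('resource-allocation-test-yarn')
        .config('spark.executor.memory', '6g')   # leaves headroom for memory overhead under 7168 MB
        .config('spark.executor.cores', '2')     # within yarn.scheduler.maximum-allocation-vcores=2
        .getOrCreate()
    )
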
0 votes
  • You have a cluster with two nodes, so you should use a resource manager like YARN (IMHO). Otherwise your Spark job will only be executed on your local machine.
  • Use this: Running Spark on YARN
  • Example B) shows that 4 cores are allocated but only two are currently used to do the work. The other two cores are idle, with nothing to do, or have already finished their work.