1 vote

How can I utilize all cores and memory on the Spark standalone cluster below?

Node 1: 4 cores, 8 GB memory
Node 2: 4 cores, 16 GB memory

Currently I can allocate either:

A) 8 cores and 14 GB of memory, by setting:

.config('spark.executor.memory','7g')
.config('spark.executor.cores', '4')

Cores       |  Memory
----------------------------------
4 (4 Used)  |  15.0 GiB (7.0 GiB Used)  
4 (4 Used)  |  7.0 GiB (7.0 GiB Used)

B) 6 cores and 21 GB of memory, by setting:

.config('spark.executor.memory','7g')
.config('spark.executor.cores', '2')

Cores       |  Memory
----------------------------------
4 (4 Used)  |  15.0 GiB (14.0 GiB Used) 
4 (2 Used)  |  7.0 GiB (7.0 GiB Used)
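
For what it's worth, both allocations above are consistent with standalone mode capping the number of executors per worker by both available cores and available memory. A rough sketch of that arithmetic (my own illustration, not part of the original setup; worker memory values are taken from the UI figures above):

    # Rough sketch: the number of executors a standalone worker can host is
    # limited by both its cores and its memory.
    def executors_per_worker(worker_cores, worker_mem_gb, exec_cores, exec_mem_gb):
        return min(worker_cores // exec_cores, worker_mem_gb // exec_mem_gb)

    # A) 7g / 4-core executors: one executor per node -> 8 cores, 14 GB total
    print(executors_per_worker(4, 7, 4, 7), executors_per_worker(4, 15, 4, 7))   # 1 1

    # B) 7g / 2-core executors: Node 1 is memory-bound (1 executor, 2 cores),
    #    Node 2 fits two executors (4 cores, 14 GB) -> 6 cores, 21 GB total
    print(executors_per_worker(4, 7, 2, 7), executors_per_worker(4, 15, 2, 7))   # 1 2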

Expected output (8 cores and 21 GB of memory):

    Cores       |  Memory
    ----------------------------------
    4 (4 Used)  |  15.0 GiB (14.0 GiB Used) 
    4 (4 Used)  |  7.0 GiB (7.0 GiB Used)

References:

  1. What are workers, executors, cores in Spark Standalone cluster?
  2. Spark Standalone Number Executors/Cores Control
  3. How multiple executors are managed on the worker nodes with a Spark standalone cluster?

2 Answers

1 vote

In the end the answer is very simple. I tried YARN and had the same problem as with standalone mode. The solution for standalone mode turned out to be very simple: set the following in $SPARK_HOME/conf/spark-env.sh on each node:

Node 1 (4 cores, 8 GB memory):

SPARK_WORKER_CORES=4
SPARK_WORKER_MEMORY=7g

Node 2 (4 cores, 16 GB memory):

SPARK_WORKER_CORES=8
SPARK_WORKER_MEMORY=14g

Then run the Spark app with:

.config('spark.executor.memory','7g')
.config('spark.executor.cores', '2')

As a result, all 8 cores and 21 GB of memory are used:

    Cores       |  Memory
    ----------------------------------
    8 (8 Used)  |  14.0 GiB (14.0 GiB Used) 
    4 (4 Used)  |  7.0 GiB (7.0 GiB Used)
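
For completeness, here is a minimal, self-contained sketch of the application side that these options plug into. The master URL and app name are placeholders, not values from the original post:

    from pyspark.sql import SparkSession

    # Minimal sketch: master URL and app name are placeholders.
    spark = (
        SparkSession.builder
        .master('spark://<master-host>:7077')   # standalone master URL (placeholder)
        .appName('resource-allocation-test')    # placeholder name
        .config('spark.executor.memory', '7g')
        .config('spark.executor.cores', '2')
        .getOrCreate()
    )

    # Quick sanity check of how many cores the application was granted.
    print(spark.sparkContext.defaultParallelism)
    spark.stop()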

Note: On YARN the problem can be solved the same way, by changing the following parameters in $HADOOP_HOME/etc/hadoop/yarn-site.xml:

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>7168</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>7168</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>2</value>
    </property>
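
After restarting YARN with the values above, the same executor settings can be passed from the application. Here is a minimal sketch, assuming HADOOP_CONF_DIR / YARN_CONF_DIR point at this configuration; note that on YARN the executor memory request plus its overhead has to fit inside yarn.scheduler.maximum-allocation-mb, so the value below is kept a bit under the 7168 MB container limit:

    from pyspark.sql import SparkSession

    # Minimal sketch for YARN; assumes HADOOP_CONF_DIR / YARN_CONF_DIR are set so
    # Spark can find the yarn-site.xml shown above. App name is a placeholder.
    spark = (
        SparkSession.builder
        .master('yarn')
        .appName('resource-allocation-test-yarn')
        .config('spark.executor.memory', '6g')   # leaves headroom for memory overhead under 7168 MB
        .config('spark.executor.cores', '2')     # within yarn.scheduler.maximum-allocation-vcores=2
        .getOrCreate()
    )
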
0 votes
  • You have a cluster with two nodes, so you should use a resource manager like YARN (IMHO). Otherwise your Spark job will only be executed on your local machine.
  • Use this: Running Spark on YARN
  • Example B) shows that 4 cores are allocated but only two are currently used to do the work. The other two cores are idle, with nothing to do, or have already finished their work.