
I am using an EMR cluster with 1 master and 11 m5.2xlarge core nodes. After doing some calculations based on this node type, I used the following JSON to set my Spark application configuration on EMR:

[
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
        }
    },
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.nodemanager.vmem-check-enabled": "false",
            "yarn.nodemanager.pmem-check-enabled": "false"
        }
    },
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.dynamicAllocation.enabled": "false",
            "spark.worker.instances": "5",
            "spark.driver.memory": "20g",
            "spark.executor.memory": "20g",
            "spark.executor.cores": "5",
            "spark.driver.cores": "5",
            "spark.executor.instances": "14",
            "spark.yarn.executor.memoryOverhead": "4g",
            "spark.default.parallelism": "140"
        }
    },
    {
        "Classification": "spark",
        "Properties": {
            "maximizeResourceAllocation": "false"
        }
    }
]
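
For context, a configuration array like this is normally supplied when the cluster is created, for example via the AWS CLI. The sketch below assumes the JSON above is saved as configurations.json; the cluster name, release label, and key pair are placeholders rather than values from the original setup:

    # Launch an EMR cluster with 1 master and 11 m5.2xlarge core nodes,
    # applying the classification overrides from configurations.json
    aws emr create-cluster \
        --name "spark-cluster" \
        --release-label emr-5.30.0 \
        --applications Name=Spark \
        --instance-groups \
            InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.2xlarge \
            InstanceGroupType=CORE,InstanceCount=11,InstanceType=m5.2xlarge \
        --configurations file://configurations.json \
        --use-default-roles \
        --ec2-attributes KeyName=my-key-pair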

However, the number of running containers on this cluster is not what I expected (usually it matches the number of running cores). There are only 11 running containers; how can I increase this number to 51, the number of vCores in use?
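
As a point of reference, per-node container counts can be checked from the master node roughly as follows (a minimal sketch, assuming SSH access to the master; the YARN ResourceManager web UI on port 8088 shows the same information):

    # List the YARN NodeManagers along with the number of running containers on each
    yarn node -list

    # List the running YARN applications (one per active spark-submit)
    yarn application -list -appStates RUNNING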

Can you clarify what the desired configuration of the cluster is? It would also be helpful to know why you want a particular configuration, instead of dynamic allocation or the EMR optimum. – Dave
I want to increase the number of running containers on the EMR cluster from 1 to 5 per node. I want to use more vCores, since dynamic allocation only allocates 2 vCores. Any idea? @Dave – Yousef
Server Fault might be a better place for this question. – Frank Merrow

1 Answer


The instance type m5.2xlarge has 8 vCPUs and 32G RAM. You could run 4 executors per node at 2 vCPUs and 7G per executor, which across the 11 core nodes gives a total of 44 executors. Each node would then use 4 × 7G = 28G for executors, leaving 4G of the 32G for overhead, which should be plenty.

Your spark-defaults configuration should therefore be:

    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.dynamicAllocation.enabled": "false",
            "spark.executor.instances": "44",
            "spark.executor.cores": "2",
            "spark.executor.memory": "7g"
        }
    },
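
If you would rather set these per job instead of in spark-defaults, the equivalent can be passed directly to spark-submit (a sketch; your_app.py is a placeholder for the actual application):

    spark-submit \
        --conf spark.dynamicAllocation.enabled=false \
        --num-executors 44 \
        --executor-cores 2 \
        --executor-memory 7g \
        your_app.py

With 44 executors at 2 vCPUs each, the job can occupy 88 executor vCores, i.e. all 8 vCPUs on each of the 11 core nodes.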