
I am using an EMR cluster with 1 master and 11 m5.2xlarge core nodes. After doing some calculations based on this node type, I used the following JSON to set my Spark application configuration on EMR:

[
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
        }
    },
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.nodemanager.vmem-check-enabled": "false",
            "yarn.nodemanager.pmem-check-enabled": "false"
        }
    },
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.dynamicAllocation.enabled": "false",
            "spark.worker.instances": "5",
            "spark.driver.memory": "20g",
            "spark.executor.memory": "20g",
            "spark.executor.cores": "5",
            "spark.driver.cores": "5",
            "spark.executor.instances": "14",
            "spark.yarn.executor.memoryOverhead": "4g",
            "spark.default.parallelism": "140"
        }
    },
    {
        "Classification": "spark",
        "Properties": {
            "maximizeResourceAllocation": "false"
        }
    }
]
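
For context, a configuration array like this is normally supplied when the cluster is created, for example via the AWS CLI. The sketch below assumes the JSON above is saved as configurations.json; the cluster name, release label, and key pair are placeholders rather than values from the original setup:

    # Launch an EMR cluster with 1 master and 11 m5.2xlarge core nodes,
    # applying the classification overrides from configurations.json
    aws emr create-cluster \
        --name "spark-cluster" \
        --release-label emr-5.30.0 \
        --applications Name=Spark \
        --instance-groups \
            InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.2xlarge \
            InstanceGroupType=CORE,InstanceCount=11,InstanceType=m5.2xlarge \
        --configurations file://configurations.json \
        --use-default-roles \
        --ec2-attributes KeyName=my-key-pair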

However, the number of running containers on this cluster is not what I expected (usually it matches the number of running cores). There are only 11 running containers; how can I increase this number to 51, the number of vCores in use?
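
As a point of reference, per-node container counts can be checked from the master node roughly as follows (a minimal sketch, assuming SSH access to the master; the YARN ResourceManager web UI on port 8088 shows the same information):

    # List the YARN NodeManagers along with the number of running containers on each
    yarn node -list

    # List the running YARN applications (one per active spark-submit)
    yarn application -list -appStates RUNNING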

Can you clarify what the desired configuration of the cluster is? It would also be helpful to know why you want a particular configuration, instead of dynamic allocation or the EMR optimum. – Dave
I want to increase the number of running containers on the EMR cluster from 1 to 5 per node. I want to use more vCores, since dynamic allocation only allocates 2 vCores. Any idea? @Dave – Yousef
Server Fault might be a better place for this question. – Frank Merrow

1 Answer


The instance type m5.2xlarge has 8 vCPUs and 32G RAM. You could run 4 executors per node at 2 vCPUs and 7G per executor, which across the 11 core nodes gives a total of 44 executors. Each node would then use 4 × 7G = 28G for executors, leaving 4G of the 32G for overhead, which should be plenty.

Your spark-defaults configuration should therefore be:

    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.dynamicAllocation.enabled": "false",
            "spark.executor.instances": "44",
            "spark.executor.cores": "2",
            "spark.executor.memory": "7g"
        }
    },
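
If you would rather set these per job instead of in spark-defaults, the equivalent can be passed directly to spark-submit (a sketch; your_app.py is a placeholder for the actual application):

    spark-submit \
        --conf spark.dynamicAllocation.enabled=false \
        --num-executors 44 \
        --executor-cores 2 \
        --executor-memory 7g \
        your_app.py

With 44 executors at 2 vCPUs each, the job can occupy 88 executor vCores, i.e. all 8 vCPUs on each of the 11 core nodes.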