I am using an EMR cluster with 1 master and 11 m5.2xlarge core nodes. After doing some sizing calculations for this node type, I set my Spark application configuration on EMR with the following JSON:
[
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  },
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.vmem-check-enabled": "false",
      "yarn.nodemanager.pmem-check-enabled": "false"
    }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.dynamicAllocation.enabled": "false",
      "spark.worker.instances": "5",
      "spark.driver.memory": "20g",
      "spark.executor.memory": "20g",
      "spark.executor.cores": "5",
      "spark.driver.cores": "5",
      "spark.executor.instances": "14",
      "spark.yarn.executor.memoryOverhead": "4g",
      "spark.default.parallelism": "140"
    }
  },
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "false"
    }
  }
]
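As a sanity check, here is the back-of-envelope arithmetic for how many executor containers this configuration allows on a single core node. The YARN memory figure is an assumption: EMR's default `yarn.nodemanager.resource.memory-mb` for m5.2xlarge (8 vCPUs, 32 GiB RAM) is, to my understanding, 24576 MB.

```python
# How many executor containers fit on one m5.2xlarge core node
# with the spark-defaults above?
yarn_mem_mb_per_node = 24576   # assumption: EMR default for m5.2xlarge
yarn_vcores_per_node = 8       # m5.2xlarge has 8 vCPUs

executor_mem_mb = 20 * 1024    # spark.executor.memory = 20g
overhead_mb = 4 * 1024         # spark.yarn.executor.memoryOverhead = 4g
executor_cores = 5             # spark.executor.cores

# Total memory YARN must allocate per executor container.
container_mb = executor_mem_mb + overhead_mb   # 24576 MB

# YARN packs containers by both memory and vCores; the tighter limit wins.
by_memory = yarn_mem_mb_per_node // container_mb
by_cores = yarn_vcores_per_node // executor_cores

executors_per_node = min(by_memory, by_cores)
print(executors_per_node)        # 1 executor per node
print(executors_per_node * 11)   # 11 containers across the 11 core nodes
```

If this arithmetic holds, each 24576 MB container fills an entire node's YARN memory, which would explain a count of 11.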
However, the number of running containers on this cluster is not what I expected (it is usually about the same as the number of cores in use). There are only 11 running containers; how can I increase this to 51, matching the number of used vCores?