I was running an application on AWS EMR-Spark. Here is the spark-submit job:
Arguments: spark-submit --deploy-mode cluster --class com.amazon.JavaSparkPi s3://spark-config-test/SWALiveOrderModelSpark-1.0.assembly.jar s3://spark-config-test/2017-08-08
AWS EMR uses YARN for resource management. I was looking at the metrics (screenshot below) and have a question about the YARN 'container' metrics.
Here, the number of containers allocated is shown as 2. However, I was using 4 nodes (3 slaves + 1 master), each with 8 CPU cores. So how were only 2 containers allocated?
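As a quick cross-check (a minimal sketch using standard YARN CLI commands, run on the EMR master node; not part of the original question), the ResourceManager can report how many containers are actually running on each node:

yarn application -list    # lists running applications and their IDs
yarn node -list           # shows each node and its Number-of-Running-Containers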
Have you configured capacity-scheduler.xml with this setting: "yarn.scheduler.capacity.resource-calculator":"org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"? And you need to specify the number of executors you want. YARN will not automatically utilize your entire cluster. – Glennie Helles Sindholt
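For illustration, here is what that comment maps to concretely. This is a sketch: the executor count, cores, and memory below are example values, not taken from the thread. The capacity-scheduler.xml property makes YARN's CapacityScheduler account for vcores as well as memory when sizing and allocating containers (by default it only considers memory):

<property>
  <!-- Use CPU (vcores) in addition to memory when allocating containers -->
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

And specifying executors explicitly on the original job, so Spark requests more YARN containers (one container per executor, plus one for the driver in cluster mode):

spark-submit --deploy-mode cluster \
  --num-executors 6 \       # example value: 2 executors per slave node
  --executor-cores 4 \      # example value: half the 8 cores per node
  --executor-memory 4g \    # example value
  --class com.amazon.JavaSparkPi \
  s3://spark-config-test/SWALiveOrderModelSpark-1.0.assembly.jar \
  s3://spark-config-test/2017-08-08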