1 vote

Here are my Spark cluster details: memory 29.3 GB and 10 cores.

[screenshot of the Spark master UI showing the cluster resources]

Now I run this job:

spark-submit --master spark://hadoop-master:7077 --executor-memory 1g --executor-cores 2 /home/hduser/ratings-counter.py

but when I click on the completed application, I see 5 executors listed.

How does Spark determine that it should run 5 executors?

[screenshot of the completed application page in the Spark UI]


2 Answers

2 votes

From the Spark configuration docs:

spark.executor.cores : The number of cores to use on each executor. In standalone and Mesos coarse-grained modes, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application will run on each worker.

Since your worker has 10 cores and you set --executor-cores to 2, Spark spawns 10 / 2 = 5 executors for the application.
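If you want fewer executors on a standalone master, you can cap the total number of cores the application claims with --total-executor-cores; the executor count then follows from that cap. This is only a sketch of your command, and the cap of 4 cores is an arbitrary illustration (4 / 2 = at most 2 executors):

# Same job as in the question, but capped at 4 cores in total,
# so the master can start at most 4 / 2 = 2 executors.
spark-submit --master spark://hadoop-master:7077 \
  --executor-memory 1g \
  --executor-cores 2 \
  --total-executor-cores 4 \
  /home/hduser/ratings-counter.py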

0 votes

The issue explained here has to do with fine tuning. More information can be found at: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

To set the number of executors explicitly with --num-executors, you will need YARN to be turned on. The number of cores is the number of concurrent tasks an executor can run (when using HDFS it is advisable to keep this below 5). So for your example you could set --executor-cores to 3, not to 2 as in the comment above by @user1050619; the number of executors would then be 10 / 3 ≈ 3. To make sure this stays controlled, you can use --num-executors, as @user1050619 said in the comment. In the UI shown in your question the limit of executors is 5, so Spark will try to reach that if there is enough memory. One way to solve this is to use dynamic allocation, which allows for finer-grained control: the maximum number of executors can be set with spark.dynamicAllocation.maxExecutors, and the initial number of executors can likewise be set to 3 with spark.dynamicAllocation.initialExecutors, as sketched below.
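A minimal sketch of that dynamic-allocation setup, assuming the external shuffle service is enabled on the workers; the executor counts of 3 are just the illustrative values from above, not something your cluster requires:

# Illustrative values only; spark.shuffle.service.enabled must be true
# on every worker for dynamic allocation to release and request executors.
spark-submit --master spark://hadoop-master:7077 \
  --executor-memory 1g \
  --executor-cores 3 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=3 \
  --conf spark.dynamicAllocation.maxExecutors=3 \
  /home/hduser/ratings-counter.py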