I have set up a 10-node HDP platform on AWS. My configuration is 2 servers (Name Node and Standby Name Node) and 7 Data Nodes; each node has 40 vCPUs and 160 GB of memory.
I am trying to calculate the number of executors to request when submitting Spark applications, and after going through different blogs I am confused about what this parameter actually means.
According to the blog below, num-executors appears to be the total number of executors across all nodes: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
But according to the blog below, num-executors appears to be per node (server): https://blogs.aws.amazon.com/bigdata/post/Tx578UTQUV7LRP/Submitting-User-Applications-with-spark-submit
Can anyone please clarify and review the following:
Is the num-executors value per node, or is it the total number of executors across all the data nodes?
I am using the calculation below to come up with the core count, executor count, and memory per executor:
Number of cores per executor <= 5 (assuming 5)
Num executors per node = (40 - 1) / 5 = 7 (rounded down)
Memory per executor = (160 - 1) / 7 = 22 GB (rounded down)
With the above calculation, which of the following would be the correct way?
--master yarn-client --driver-memory 10G --executor-memory 22G --num-executors 7 --executor-cores 5
OR
--master yarn-client --driver-memory 10G --executor-memory 22G --num-executors 49 --executor-cores 5
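For reference, here is a minimal sketch of the arithmetic behind the two candidate values (7 per node versus 49 across the 7 data nodes). The function name `size_executors` and its parameters are hypothetical, chosen only for illustration, and the reservation of 1 vCPU and 1 GB per node is the assumption from my calculation, not an official rule.

```python
def size_executors(vcpus_per_node, mem_gb_per_node, data_nodes, cores_per_executor=5):
    """Return (executors per node, memory per executor in GB, executors cluster-wide)."""
    # Assumption: leave 1 vCPU and 1 GB per node for the OS and Hadoop daemons.
    executors_per_node = (vcpus_per_node - 1) // cores_per_executor
    mem_per_executor_gb = (mem_gb_per_node - 1) // executors_per_node
    # Cluster-wide count if the per-node figure is multiplied by the data nodes.
    total_executors = executors_per_node * data_nodes
    return executors_per_node, mem_per_executor_gb, total_executors


per_node, mem_gb, total = size_executors(vcpus_per_node=40, mem_gb_per_node=160, data_nodes=7)
print(per_node, mem_gb, total)  # prints: 7 22 49
```

The first option passes the per-node figure (7) and the second passes the cluster-wide figure (49); the executor memory and core count are the same in both.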
Thanks, Jayadeep
num-executors has been removed in 2.0+. – zero323