
A Spark cluster has 2 worker nodes. Node 1: 64 GB RAM, 8 cores. Node 2: 64 GB RAM, 8 cores.

Now suppose I submit a Spark job using spark-submit in cluster mode with 2 executors, 32 GB of memory per executor, and 4 cores per executor. My question is: since this configuration could be accommodated on a single node, will Spark run the job across both worker nodes or on just one node?
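For reference, a spark-submit invocation matching this configuration might look roughly like the following (the class name and JAR are placeholders for the actual application):

# com.example.MyApp and my-app.jar are placeholders for the real application
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-memory 32g \
  --executor-cores 4 \
  --class com.example.MyApp \
  my-app.jar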

Also, if the total number of cores is not an exact multiple of the cores per executor, how many cores are allocated to each executor? Example: the number of cores available in a node, after excluding one core for the YARN daemon, is 7. With 2 nodes that gives 2*7 = 14 total cores available, and HDFS gives good throughput when each executor has 5 cores. Now 14/5 gives the number of executors: should I treat 14/5 as 2 or as 3 executors? And how would the cores then be distributed evenly? (A worked calculation is sketched below.)
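For the arithmetic itself, using the figures above (14 usable cores, 5 cores per executor): 14 / 5 = 2.8. Rounding down gives 2 executors using 2 * 5 = 10 cores, leaving 4 cores unallocated; rounding up to 3 executors would require 3 * 5 = 15 cores, which is more than the 14 available unless the cores per executor are reduced.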


1 Answer


It is more of a resource manager question than a Spark question, but in your case the 2 executors can't run on a single machine, because the OS has an overhead that uses at least 1 core and 1 GB of RAM, even if you set the memory to 30 GB and 3 cores per executor. They will run on different nodes, because Spark tries to get the best data locality it can, so it won't place both executors on the same node.
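To illustrate the downsized setting mentioned above (30 GB and 3 cores per executor, leaving headroom on each node for the OS), only the two resource flags change; the class name and JAR are still placeholders:

# per-executor resources reduced to leave room for OS overhead on each node
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-memory 30g \
  --executor-cores 3 \
  --class com.example.MyApp \
  my-app.jar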