7
votes

Imagine two scenarios in EMR:

  1. Running a Spark job in local mode on a single node.

  2. Running the same job on a small two-node cluster (master and slave) in cluster mode.

My question is: are these two jobs going to take a similar amount of time to finish?

From what I understand, the master node doesn't execute any tasks itself. Is that true? And is it possible to "enable" tasks to run on the master node for small clusters?

1 Answer

0
votes

To answer your first question: in the given scenarios, performance depends on the number of executors you run on the single node versus on the two-node cluster.

If the number of executors stays the same in both cases, you will get almost the same performance. There will be slight differences because the two-node cluster adds some network and scheduler overhead, but that overhead is minimal.

In a single-node cluster, the driver, the cluster manager, and your executors all run on the same node. That one machine acts as both master node and worker node: it hosts the driver and the executors and executes the tasks.
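
To make the comparison concrete, here is a minimal sketch of how the two runs could be launched with the same task parallelism. It assumes YARN as the cluster manager (the usual setup on EMR), a hypothetical application file `job.py`, and a hypothetical helper `spark_submit_args`; the core count is a placeholder to adjust to your instance type. Note that in local mode the nearest equivalent of "number of executors" is the thread count `N` in `local[N]`, since everything runs in one JVM.

```python
import shlex


def spark_submit_args(app, total_cores, single_node):
    """Build a spark-submit argument list with equal task parallelism in both setups.

    `app` and `total_cores` are illustrative placeholders, not values from the question.
    """
    if single_node:
        # local[N]: the driver and N worker threads share one JVM on one machine.
        return ["spark-submit", "--master", f"local[{total_cores}]", app]
    # YARN cluster mode: the driver runs inside the cluster, and a single
    # executor on the worker node gets the same number of cores.
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--num-executors", "1",
        "--executor-cores", str(total_cores),
        app,
    ]


if __name__ == "__main__":
    # Print both command lines so they can be compared side by side.
    for single_node in (True, False):
        print(shlex.join(spark_submit_args("job.py", 4, single_node)))
```

With parallelism held equal like this, any remaining difference in run time should come mostly from the network and scheduling overhead mentioned above.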