0
votes

Is there any advantage to starting more than one spark instance (master or worker) on a particular machine/node?

The spark standalone documentation doesn't explicitly say anything about starting a cluster or multiple workers on the same node. It does seem to implicitly conflate that one worker equals one node

Their hardware provisioning page says:

Finally, note that the Java VM does not always behave well with more than 200 GB of RAM. If you purchase machines with more RAM than this, you can run multiple worker JVMs per node. In Spark’s standalone mode, you can set the number of workers per node with the SPARK_WORKER_INSTANCES variable in conf/spark-env.sh, and the number of cores per worker with SPARK_WORKER_CORES.

So aside from working with large amounts of memory or testing cluster configuration, is there any benefit to running more than one worker per node?

1

1 Answers

0
votes

I think the obvious benefit is to improve the resource utilization of the hardware per box without losing performance. In terms of parallelism, one big executor with multiple cores seems to be same with multiple executors with less cores.