I'm a bit confused about how the master and worker nodes get assigned to the respective connected machines (VMs) on the network when Spark runs in cluster mode.
I have two nodes. On one of them (which I consider the principal node) I do the Hadoop configuration: mapred-site.xml, core-site.xml, hdfs-site.xml, hadoop-env.sh, the workers file, and then the YARN-related configuration files (I chose YARN as the resource manager in my case). In the workers file under the main Hadoop folder I list the worker IPs. I then replicate the whole Hadoop folder onto the second node and set the Hadoop path there.
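For reference, this is roughly what I mean (the hostnames/IPs below are placeholders, not my real ones):

    # $HADOOP_HOME/etc/hadoop/workers -- one worker host per line
    192.168.1.11
    192.168.1.12

    <!-- yarn-site.xml: tells every node where the ResourceManager runs -->
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>192.168.1.10</value>
    </property>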
My question is: when I launch a Spark job (using spark-submit), what is the workflow that is responsible for assigning a master node and a worker node?
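Concretely, I mean a submission along these lines (the class and jar names are just placeholders):

    # submit to YARN in cluster mode; I don't specify any master/worker machines here
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyApp \
      my-app.jar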
In a basic example without Hadoop I would explicitly designate the master and the workers by running start-master.sh or start-slave.sh on each machine, but how do Spark / Hadoop assign the Master and Worker roles when everything is driven mainly by the Hadoop configuration files?
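By the basic example I mean standalone mode, something like this (the master hostname is a placeholder):

    # on the machine I want to be the master
    $SPARK_HOME/sbin/start-master.sh

    # on each machine I want to be a worker, pointing it at the master's URL
    $SPARK_HOME/sbin/start-slave.sh spark://master-host:7077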
Thanks!