I'm a bit confused about how the master and worker nodes get assigned to the respective connected machines (VMs) on the network when Spark runs in cluster mode.
I have two nodes. On one of them (which I consider the principal node) I do the Hadoop configuration: mapred-site.xml, core-site.xml, hdfs-site.xml, hadoop-env.sh, the workers file, and then the YARN-related configuration files (I chose YARN as the resource manager in my case). In the workers file under the main Hadoop folder I list the worker IPs. I then replicate the whole Hadoop folder onto the second node and set the Hadoop path there.
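For reference, this is roughly what I mean (the hostnames/IPs below are placeholders, not my real ones):

    # $HADOOP_HOME/etc/hadoop/workers -- one worker host per line
    192.168.1.11
    192.168.1.12

    <!-- yarn-site.xml: tells every node where the ResourceManager runs -->
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>192.168.1.10</value>
    </property>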
My question is: when I launch a Spark job (using spark-submit), what is the workflow that is responsible for assigning a master node and a worker node?
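Concretely, I mean a submission along these lines (the class and jar names are just placeholders):

    # submit to YARN in cluster mode; I don't specify any master/worker machines here
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyApp \
      my-app.jar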
In a basic example without Hadoop I would explicitly designate the master and the workers by running start-master.sh or start-slave.sh on each machine, but how do Spark / Hadoop assign the Master and Worker roles when everything is driven mainly by the Hadoop configuration files?
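By the basic example I mean standalone mode, something like this (the master hostname is a placeholder):

    # on the machine I want to be the master
    $SPARK_HOME/sbin/start-master.sh

    # on each machine I want to be a worker, pointing it at the master's URL
    $SPARK_HOME/sbin/start-slave.sh spark://master-host:7077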
Thanks!