
According to the Apache documentation on HDFS Federation, the system scales by federating multiple NameNodes that run in isolation:

Multiple Namenodes/Namespaces

In order to scale the name service horizontally, federation uses multiple independent Namenodes/namespaces. The Namenodes are federated; the Namenodes are independent and do not require coordination with each other. The Datanodes are used as common storage for blocks by all the Namenodes.
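To make the quoted description concrete, here is a simplified in-memory model (not Hadoop code). Each NameNode owns its own namespace and its own block pool, and all NameNodes place blocks on the same shared set of DataNodes. The class names, pool IDs, and the round-robin placement with no replication are illustrative assumptions.

```python
"""Simplified, in-memory model of HDFS Federation (illustration only).

Each NameNode manages its own namespace and its own block pool; DataNodes
store blocks for every block pool, keyed by pool ID. NameNodes never talk
to each other. All class and method names here are hypothetical.
"""
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    # block pool ID -> set of block IDs stored for that pool
    blocks: dict[str, set[int]] = field(default_factory=dict)

    def store(self, pool_id: str, block_id: int) -> None:
        self.blocks.setdefault(pool_id, set()).add(block_id)


@dataclass
class NameNode:
    pool_id: str
    datanodes: list[DataNode]  # the shared pool of DataNodes
    # path -> [(block ID, DataNode name)], covering this NameNode's namespace only
    namespace: dict[str, list[tuple[int, str]]] = field(default_factory=dict)
    next_block: int = 0

    def create(self, path: str, num_blocks: int) -> None:
        placements = []
        for _ in range(num_blocks):
            # Round-robin placement across the shared DataNodes (no replication)
            dn = self.datanodes[self.next_block % len(self.datanodes)]
            dn.store(self.pool_id, self.next_block)
            placements.append((self.next_block, dn.name))
            self.next_block += 1
        self.namespace[path] = placements

    def block_locations(self, path: str) -> list[tuple[int, str]]:
        return self.namespace[path]


# Example: two independent NameNodes sharing three DataNodes
dns = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn1 = NameNode("BP-1", dns)
nn2 = NameNode("BP-2", dns)
nn1.create("/user/a.txt", 2)
nn2.create("/logs/b.txt", 2)
```

Because DataNodes key blocks by block pool, block 0 from `nn1` and block 0 from `nn2` coexist on `dn1` without colliding, and neither NameNode ever consults the other's namespace.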

[Figure: Federation]

My only doubt:

I don't see any central coordinator among the NameNodes, since they all run in isolation, so I'm confused about how jobs get submitted and processed.

1) If I submit a MapReduce job, which NameNode will process it? Or

2) Does the client need to know which NameNode the job should be submitted to?

If the client does not know which NameNode to use, there should be some "master NameNode" that assigns the job to a particular NameNode.

How does it work?

Thanks in advance.

Answer

HDFS Federation is a feature of HDFS, the storage layer. Running MapReduce programs is handled by YARN, not by HDFS.

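YARN's ResourceManager accepts the job and schedules its containers. The NameNodes come into the picture only when the job needs to know where its input data lives: while computing input splits, the MapReduce client asks the NameNode that owns each input path (whichever of the federated NameNodes that is) for its block locations, and the job's ApplicationMaster then uses those locations to request containers close to the data. None of the NameNodes schedules or runs the job.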

So the client never submits a job to a NameNode; it submits the job to the ResourceManager.
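No "master NameNode" is needed because the input path itself identifies the namespace: either through a fully qualified URI such as `hdfs://nn1:8020/...` or through a client-side mount table (Hadoop's ViewFs provides one). The sketch below is a simplified, hypothetical model of that flow, not Hadoop's API: `MOUNT_TABLE`, `BLOCK_LOCATIONS`, `resolve_namenode`, `compute_splits`, and `submit_job`, along with the host and port values, are made-up names standing in for the real configuration and RPCs.

```python
"""Sketch: the job goes to the ResourceManager; input paths go to their NameNode.

Assumptions (hypothetical names): MOUNT_TABLE stands in for a client-side
mount table such as a ViewFs configuration; BLOCK_LOCATIONS stands in for
the answers each NameNode would return. No real Hadoop API is used.
"""

# Client-side mount table: path prefix -> NameNode that owns that namespace
MOUNT_TABLE = {
    "/user": "hdfs://nn1:8020",
    "/logs": "hdfs://nn2:8020",
    "/data": "hdfs://nn3:8020",
}

# What each NameNode would report for files in its own namespace
BLOCK_LOCATIONS = {
    "hdfs://nn1:8020": {"/user/alice/in.txt": ["dn1", "dn3"]},
    "hdfs://nn2:8020": {"/logs/2024/app.log": ["dn2"]},
    "hdfs://nn3:8020": {},
}


def resolve_namenode(path: str) -> str:
    """Pick the owning NameNode by the longest matching mount-point prefix."""
    matches = [p for p in MOUNT_TABLE if path == p or path.startswith(p + "/")]
    if not matches:
        raise ValueError(f"no mount point for {path}")
    return MOUNT_TABLE[max(matches, key=len)]


def compute_splits(input_paths: list[str]) -> list[dict]:
    """Ask the owning NameNode of each input path for its block locations."""
    splits = []
    for path in input_paths:
        nn = resolve_namenode(path)
        hosts = BLOCK_LOCATIONS[nn][path]  # DataNodes holding this file's blocks
        splits.append({"path": path, "namenode": nn, "hosts": hosts})
    return splits


def submit_job(input_paths: list[str]) -> dict:
    """The job is handed to the ResourceManager, never to a NameNode."""
    splits = compute_splits(input_paths)
    return {"submitted_to": "ResourceManager", "splits": splits}


job = submit_job(["/user/alice/in.txt", "/logs/2024/app.log"])
```

Each NameNode answers only for its own namespace, and the routing decision is made on the client side from the path, so there is no coordinator for the NameNodes to consult.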