0
votes

Hadoop 2.0 achieves high availability with its Federation architecture.

I have a question about the ResourceManager.

The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.

The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system.

The ResourceManager can run on a different node from the NameNode. But since there is only a single ResourceManager, how does the architecture address high availability of the ResourceManager?

What happens if the ResourceManager goes down or becomes unavailable?


2 Answers

3
votes

Hadoop 2.x provides high availability for both HDFS and YARN:

NameNode HA for HDFS high availability.

Resource Manager HA (RMHA) for YARN high availability.

In RMHA, there is one active ResourceManager and one or more standby ResourceManagers, coordinated by ZooKeeper. If the active ResourceManager goes down, the failover controller (in YARN, a ZooKeeper-based elector embedded in the ResourceManager itself) promotes a standby to active. As long as a healthy standby exists, the cluster always has an active ResourceManager, which removes the ResourceManager as a single point of failure (SPOF) in YARN.

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_hag_rm_ha_config.html#concept_xgs_pc5_vl_unique_1
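
To make this concrete, here is a minimal sketch of what enabling RM HA might look like in yarn-site.xml. The property names are the standard YARN ones for Hadoop 2.x; the cluster id, the RM ids (rm1, rm2), and every hostname are placeholders I made up for illustration, so substitute your own values and check the documentation for your distribution.

```xml
<!-- Sketch only: all ids and hostnames below are hypothetical placeholders. -->
<configuration>
  <!-- Turn on ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Logical name of the cluster, used during leader election (placeholder) -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>example-cluster</value>
  </property>
  <!-- Logical ids of the participating ResourceManagers (placeholders) -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Host for each logical id (placeholder hostnames) -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>rm1.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>rm2.example.com</value>
  </property>
  <!-- ZooKeeper quorum that coordinates the election (placeholder hosts) -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
  </property>
</configuration>
```

With a setup along these lines, `yarn rmadmin -getServiceState rm1` should report whether rm1 is currently active or standby.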

1
votes

This has been remedied as of Hadoop 2.4. Take a look here.