1
votes

Friends, I came to know that in Hadoop 2, when high availability is configured, there is no need to configure a secondary-name-node/checkpoint-node/backup-node. Under the new mechanism, availability is provided by edits shared between the active and standby namenodes.

My question: the secondary-name-node's job is to merge the edits file into the fsimage file periodically, which gives two benefits in the Hadoop 1 world: 1) it limits the size of the edits file, and 2) it reduces restart time by keeping the fsimage nearly up to date.

So if High Availability is enabled and the secondary-name-node is not required, who does the stitching of edits into the fsimage? Or is that step no longer required due to some architectural/process change?

Please help me understand this.


1 Answer

1
votes

There are two modes of deploying HDFS HA, one using the Quorum Journal Manager (QJM) and one using NFS shared storage. (N.B. this is the state as of 2.7.1; if you land on this post sometime after 2016, things may have changed.)

For both of these configurations, the documentation explicitly calls out the answer to your question:

Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware which they had previously dedicated to the Secondary NameNode.
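
To make the idea concrete, here is a toy Python sketch of what "the Standby performs checkpoints" means. It is not Hadoop code: the class `ToyStandby` and its methods are made up for illustration. The two constants use the default values of the real `hdfs-site.xml` properties `dfs.namenode.checkpoint.period` and `dfs.namenode.checkpoint.txns`, which, as far as I understand, are also what the Standby uses to decide when a checkpoint is due. The sketch assumes the Standby keeps replaying shared edits into its in-memory namespace and saves a fresh fsimage when either threshold is reached.

```python
from dataclasses import dataclass, field

# Defaults of dfs.namenode.checkpoint.period (seconds) and
# dfs.namenode.checkpoint.txns in hdfs-site.xml.
CHECKPOINT_PERIOD_S = 3600
CHECKPOINT_TXNS = 1_000_000


@dataclass
class ToyStandby:
    """Hypothetical, heavily simplified model of a Standby NameNode."""
    namespace: dict = field(default_factory=dict)  # live in-memory state
    fsimage: dict = field(default_factory=dict)    # last saved checkpoint
    txns_since_checkpoint: int = 0
    last_checkpoint: float = 0.0

    def apply_edit(self, op, path, value=None):
        # Replay one edit read from the shared edit log.
        if op == "put":
            self.namespace[path] = value
        elif op == "delete":
            self.namespace.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
        self.txns_since_checkpoint += 1

    def checkpoint_due(self, now):
        # Due when enough time has passed OR enough edits have piled up.
        return (now - self.last_checkpoint >= CHECKPOINT_PERIOD_S
                or self.txns_since_checkpoint >= CHECKPOINT_TXNS)

    def checkpoint(self, now):
        # Save the current namespace as the new fsimage and reset counters.
        self.fsimage = dict(self.namespace)
        self.txns_since_checkpoint = 0
        self.last_checkpoint = now
```

Usage would look like this: call `apply_edit(...)` for each edit as it arrives, poll `checkpoint_due(now)`, and call `checkpoint(now)` when it returns true. In the real system the Standby also ships the new fsimage back to the Active NameNode, which this sketch leaves out.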