Is Stand-by-namenode doing the job of Secondary-namenode also?

Question

Friends, I have learned that in Hadoop 2, when High Availability is configured, there is no need to configure a Secondary NameNode, CheckpointNode, or BackupNode. Instead, availability is provided by a new mechanism in which the edits are shared between the Active and Standby NameNodes.

My question: the Secondary NameNode's job is to periodically merge the edits file into the fsimage file, which gives two benefits in the Hadoop 1 world: 1) it limits the size of the edits file, and 2) it reduces restart time by keeping the fsimage nearly up to date.

So if High Availability is enabled and the Secondary NameNode is not required, who stitches the edits into the fsimage? Or is that step no longer required because of some architectural or process change?

Please help me understand this.

Remus Rusanu · Accepted Answer · 2015-12-10T10:48:06

There are two modes of deploying HDFS HA (N.B. this describes the state as of 2.7.1; if you land on this post sometime after 2016, things may have changed):

- Shared NFS, where the Active and Standby NameNodes actually work on the same files (image and log). See HDFS HighAvailability using NFS.
- Quorum Journal Manager, where the Active and Standby NameNodes both rely on a new service: a set of at least three JournalNodes that provide a quorum for log edits. See HDFS High Availability Using the Quorum Journal Manager.

For both of these configurations, the documentation explicitly calls out the answer to your question:

Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware which they had previously dedicated to the Secondary NameNode.
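To make "performs checkpoints" a bit more concrete, here is a rough Python sketch of the kind of decision the Standby makes: checkpoint when enough time has passed or enough transactions have piled up since the last one. The two defaults mirror the HDFS properties dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns; the class and function names are hypothetical and only model the idea, they are not the NameNode's actual code.

```python
from dataclasses import dataclass

# Defaults of the HDFS properties dfs.namenode.checkpoint.period (seconds)
# and dfs.namenode.checkpoint.txns. Everything else below is an
# illustrative model, not Hadoop code.
CHECKPOINT_PERIOD_SECS = 3600
CHECKPOINT_TXNS = 1_000_000


@dataclass
class StandbyState:
    """Hypothetical snapshot of what a Standby NameNode knows."""
    last_checkpoint_txid: int      # last transaction covered by the fsimage
    last_applied_txid: int         # last edit the Standby has replayed
    secs_since_checkpoint: float   # time elapsed since the last checkpoint


def should_checkpoint(state: StandbyState,
                      period_secs: int = CHECKPOINT_PERIOD_SECS,
                      txn_limit: int = CHECKPOINT_TXNS) -> bool:
    """Return True when either threshold is reached."""
    uncheckpointed = state.last_applied_txid - state.last_checkpoint_txid
    return (uncheckpointed >= txn_limit
            or state.secs_since_checkpoint >= period_secs)


if __name__ == "__main__":
    quiet = StandbyState(last_checkpoint_txid=500,
                         last_applied_txid=900,
                         secs_since_checkpoint=120.0)
    busy = StandbyState(last_checkpoint_txid=500,
                        last_applied_txid=1_000_500,
                        secs_since_checkpoint=120.0)
    print(should_checkpoint(quiet))  # few edits, little time elapsed
    print(should_checkpoint(busy))   # transaction threshold reached
```

In this model the two benefits from the question (bounded edits size, faster restart) are preserved, just with the Standby doing the merge instead of a Secondary NameNode.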
