Does the secondary namenode also update metadata stored on NFS?

Question

I am reading "Hadoop: The Definitive Guide". This is how the author explains fault tolerance before Hadoop 2.x:

Without the namenode, the filesystem cannot be used. In fact, if the machine running the namenode were obliterated, all the files on the filesystem would be lost since there would be no way of knowing how to reconstruct the files from the blocks on the datanodes. For this reason, it is important to make the namenode resilient to failure, and Hadoop provides two mechanisms for this. The first way is to back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple filesystems. These writes are synchronous and atomic. The usual configuration choice is to write to local disk as well as a remote NFS mount. It is also possible to run a secondary namenode, which despite its name does not act as a namenode. Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. The secondary namenode usually runs on a separate physical machine because it requires plenty of CPU and as much memory as the namenode to perform the merge. It keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain. The usual course of action in this case is to copy the namenode’s metadata files that are on NFS to the secondary and run it as the new primary.

My understanding is that NFS is always in sync with the primary namenode. My question is: how does the metadata stored on NFS get synced with the primary namenode after the secondary namenode has updated the primary's metadata? What happens if the primary fails totally before NFS gets synced?

If the primary namenode fails to create an fsimage from the edit logs, it will be stopped, because the namenode is required to create an fsimage from the edit logs after a period of time. But if the secondary namenode is not able to copy the fsimage or edit logs and goes down, you can manually update them on the secondary namenode and restart it. — Abhinav

OneCricketeer · Accepted Answer · 2018-08-21T13:31:56

That passage doesn't say that the primary or the Secondary NameNode is necessarily in sync with NFS. It says that if you have configured the Namenode to back up its metadata to NFS (something you must set up yourself, I believe, since the book calls it a "configuration choice"), you can restore those files to a new server and designate it as the new Namenode. Note that "despite its name [the secondary namenode] does not act as a namenode" and that "the state of the secondary namenode lags that of the primary". It will therefore never hold data that hasn't already arrived on the primary; it only checkpoints what is already there.
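To make that flow concrete, here is a toy Python model of it. This is only a sketch, not Hadoop code: the class names, the method names (`apply_edit`, `export_state`, `install_checkpoint`, `recover`) and the dict-based "directories" are all made up for illustration. It assumes, as I understand the checkpoint process, that the secondary hands the merged image back to the primary and that the primary itself stores it in every directory it is configured to write to, NFS included.

```python
class NameNode:
    """Toy primary: every metadata change goes to all configured directories."""

    def __init__(self, storage_dirs):
        # Each directory is modelled as {"fsimage": dict, "edits": list}.
        self.dirs = {name: {"fsimage": {}, "edits": []} for name in storage_dirs}

    def apply_edit(self, path, blocks):
        # Modelled as synchronous: the edit is appended to every directory
        # (local disk and NFS alike) before the call returns.
        for d in self.dirs.values():
            d["edits"].append((path, blocks))

    def export_state(self):
        # What a checkpointer downloads: the current image plus pending edits.
        d = next(iter(self.dirs.values()))
        return dict(d["fsimage"]), list(d["edits"])

    def install_checkpoint(self, merged_image, edits_covered):
        # The primary (not the secondary) writes the merged image to all of
        # its directories and drops the edits the image already covers.
        for d in self.dirs.values():
            d["fsimage"] = dict(merged_image)
            d["edits"] = d["edits"][edits_covered:]


class SecondaryNameNode:
    """Toy checkpointer: merges image and edits, never touches NFS itself."""

    def __init__(self):
        self.fsimage = {}

    def checkpoint(self, primary):
        image, edits = primary.export_state()
        for path, blocks in edits:
            image[path] = blocks          # replay the edit log onto the image
        self.fsimage = image              # local copy, stale until next run
        primary.install_checkpoint(image, len(edits))


def recover(directory):
    """Rebuild the namespace from one directory: image plus remaining edits."""
    image = dict(directory["fsimage"])
    for path, blocks in directory["edits"]:
        image[path] = blocks
    return image


primary = NameNode(["local", "nfs"])
secondary = SecondaryNameNode()

primary.apply_edit("/a", ["blk_1"])
secondary.checkpoint(primary)
primary.apply_edit("/b", ["blk_2"])       # arrives after the checkpoint

nfs_view = recover(primary.dirs["nfs"])
print(sorted(nfs_view))                   # ['/a', '/b']
print(sorted(secondary.fsimage))          # ['/a']
```

In this model the NFS directory can always rebuild the full namespace because the primary wrote both the image and the later edits to it, while the secondary's own copy is missing `/b`. That is why the book suggests copying the NFS files to the secondary rather than relying on the secondary's checkpoint.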

That quoted section is alluding to having a Standby Namenode, which serves a different purpose than the secondary. The standby should be in sync with the active Namenode.

Quoted from that link:

Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error.
