I was going through the High Availability section of Hadoop: The Definitive Guide and was unclear about the passage below:

To recover from a failed namenode in this situation, an administrator starts a new primary namenode with one of the filesystem metadata replicas and configures datanodes and clients to use this new namenode. The new namenode is not able to serve requests until it has

(i) loaded its namespace image into memory,

(ii) replayed its edit log, and

(iii) received enough block reports from the datanodes to leave safe mode.

My understanding:

The primary namenode failed, and a new namenode was started "with one of the filesystem metadata replicas".

Here are my questions:

a.) Does 'filesystem metadata replicas' mean the backup of the filesystem metadata on NFS, or the replicated metadata on the secondary namenode? If neither, what is it?

b.) What is the procedure to start a new namenode in place of the failed primary namenode?

c.) How do I load the namespace image into memory on the new namenode?

d.) How do I replay the edit logs on the new namenode?

e.) How does the new namenode receive block reports from the datanodes?

f.) What is safe mode in Hadoop?

h.) Does safe mode have a different meaning for the namenode and for a datanode?

i.) How do I make sure the namenode has received enough block reports?

j.) How do I confirm that the datanodes have left safe mode?

1 Answer

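a.) Does 'filesystem metadata replicas' mean the backup of the filesystem metadata on NFS, or the replicated metadata on the secondary namenode? If neither, what is it?

Either one works; both hold the same kind of metadata files (fsimage checkpoints and edit logs). In such a directory you'll find files like these: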

    -rw-r--r-- 1 root   root    5902401510  May 25 11:25 fsimage_0000000004135660446
    -rw-r--r-- 1 root   root            62  May 25 11:25 fsimage_0000000004135660446.md5
    -rw-r--r-- 1 root   root    5904535085  May 25 13:06 fsimage_0000000004136678683
    -rw-r--r-- 1 root   root            62  May 25 13:06 fsimage_0000000004136678683.md5
    -rw-r--r-- 1 root   root      37822049  May 24 22:55 edits_0000000004125929293-0000000004126105088
    -rw-r--r-- 1 root   root       5821392  May 24 23:01 edits_0000000004126105089-0000000004126140857

The numbers in the file names are transaction IDs. It's best to use the latest fsimage, the one with the largest ID, because it determines the last point in time that your NameNode remembers.
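To illustrate picking the newest checkpoint, here is a small Python sketch. It only looks at file names shaped like the ones in the listing above; the helper name `latest_fsimage` is made up for this example and is not part of Hadoop.

```python
import re

# Matches checkpoint files such as fsimage_0000000004135660446,
# but not their .md5 companions.
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")


def latest_fsimage(filenames):
    """Return (name, txid) of the fsimage with the largest transaction ID, or None."""
    best = None
    for name in filenames:
        match = FSIMAGE_RE.match(name)
        if match is None:
            continue  # skip .md5 files, edit logs, and anything else
        txid = int(match.group(1))
        if best is None or txid > best[1]:
            best = (name, txid)
    return best


# File names copied from the directory listing above.
listing = [
    "fsimage_0000000004135660446",
    "fsimage_0000000004135660446.md5",
    "fsimage_0000000004136678683",
    "fsimage_0000000004136678683.md5",
    "edits_0000000004125929293-0000000004126105088",
    "edits_0000000004126105089-0000000004126140857",
]

print(latest_fsimage(listing))
# ('fsimage_0000000004136678683', 4136678683)
```

In a real recovery you would build the list with something like `os.listdir()` on whichever metadata directory you are restoring from.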

b.) What is the procedure to start a new namenode in place of the failed primary namenode?

If possible, start it on the same node. If there was a physical failure, you have to start it on a different machine; make sure the new machine has the original hostname or IP address. As long as no other NameNode is running, you can start the namenode normally using the startup script.

c.) How do I load the namespace image into memory on the new namenode? d.) How do I replay the edit logs on the new namenode?

You don't have to do either by hand. The NameNode process loads the fsimage and replays the edit logs automatically at startup.
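If you want a rough feel for which edit log files matter on startup, the sketch below selects the finalized segments that contain transactions newer than a given fsimage. It is a toy model based only on the `edits_<first>-<last>` naming seen in the listing, not the NameNode's actual code; `segments_to_replay` is a hypothetical helper, and in-progress segments are ignored.

```python
import re

# Finalized edit segments are named edits_<first txid>-<last txid>.
EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")


def segments_to_replay(image_txid, filenames):
    """Return edit segments holding transactions newer than image_txid, oldest first."""
    segments = []
    for name in filenames:
        match = EDITS_RE.match(name)
        if match is None:
            continue  # not a finalized edit segment
        first, last = int(match.group(1)), int(match.group(2))
        if last > image_txid:
            segments.append((first, last, name))
    segments.sort()  # order by first transaction ID
    return [name for _, _, name in segments]


# Both edit segments from the listing end before the newest fsimage
# (transaction 4136678683), so nothing in them needs replaying.
from_listing = [
    "edits_0000000004125929293-0000000004126105088",
    "edits_0000000004126105089-0000000004126140857",
]
print(segments_to_replay(4136678683, from_listing))  # []

# Hypothetical IDs: an image at transaction 100 plus three segments.
made_up = ["edits_151-200", "edits_090-100", "edits_101-150"]
print(segments_to_replay(100, made_up))  # ['edits_101-150', 'edits_151-200']
```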

e.) How does the new namenode receive block reports from the datanodes?

Same as above: start it on the same node if possible. If there was a physical failure and you have to start it on a different machine, make sure the new machine has the original hostname or IP address, so the datanodes can reach it at the address they already know. If not, you have to change the NameNode address on all datanodes, which is painful.

f.) What is safe mode in Hadoop?

Safe mode is essentially a read-only mode of the NameNode for the HDFS cluster: it does not allow any modifications to the file system or to blocks. It protects your data while the NameNode does not yet have a complete picture of where the blocks are.
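As a rough illustration of "enough block reports" from the book passage, here is a simplified Python model of the check. The assumption is that the NameNode leaves safe mode once the fraction of blocks reported by datanodes reaches a configured threshold (I believe the setting is `dfs.namenode.safemode.threshold-pct`, with a default of 0.999). The function name is made up, and the real logic has additional conditions this sketch leaves out.

```python
def can_leave_safe_mode(reported_blocks, total_blocks, threshold=0.999):
    """Toy check: has a large enough fraction of blocks been reported?

    reported_blocks: blocks for which datanodes have reported enough replicas
    total_blocks:    blocks the namenode knows about from its namespace
    threshold:       assumed default; check your cluster's configuration
    """
    if total_blocks == 0:
        return True  # an empty filesystem has nothing to wait for
    return reported_blocks / total_blocks >= threshold


print(can_leave_safe_mode(9000, 10000))  # False
print(can_leave_safe_mode(9995, 10000))  # True
```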

h.) Does safe mode have a different meaning for the namenode and for a datanode?
j.) How do I confirm that the datanodes have left safe mode?

Datanodes have no safe mode. Safe mode is a state of the NameNode only.

i.) How do I make sure the namenode has received enough block reports?

As long as you don't shut down your datanodes, they will all send block reports to the NameNode once it is alive again.
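To check from a script whether the NameNode has left safe mode, you can wrap `hdfs dfsadmin -safemode get`. A minimal sketch, assuming the `hdfs` command is on your PATH and that its output contains "Safe mode is ON" or "Safe mode is OFF"; the two function names are just for this example.

```python
import subprocess


def parse_safemode_status(output):
    """Return True if the output says safe mode is ON, False if OFF."""
    text = output.upper()
    if "SAFE MODE IS ON" in text:
        return True
    if "SAFE MODE IS OFF" in text:
        return False
    raise ValueError("unrecognized safemode output: %r" % output)


def namenode_in_safe_mode():
    """Ask the NameNode for its safe mode state (needs a reachable cluster)."""
    result = subprocess.run(
        ["hdfs", "dfsadmin", "-safemode", "get"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command itself fails
    )
    return parse_safemode_status(result.stdout)


print(parse_safemode_status("Safe mode is OFF"))  # False
```

Once `namenode_in_safe_mode()` returns False, the NameNode considers the block reports sufficient. If you only want to block until that happens, `hdfs dfsadmin -safemode wait` does the waiting for you.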