0
votes

It is well known that the NameNode stores the file system metadata and records every change, as it happens, in the edit log. These logs are key to diagnosing problems. To come to the point: by default, the Secondary NameNode periodically takes a backup of the metadata from the NameNode. The namespace image and the edit log files are backed up once an hour (configurable).

Why does the Secondary NameNode wait one hour instead of taking a backup every second, given that every change is already written to the log as it happens? Why doesn't Hadoop back up the log that frequently? If it were configured that way, would there be any disadvantage? Please explain in detail.

3
Because the Secondary NameNode provides a checkpoint facility, not high availability. Just think about the network I/O for per-second or per-minute checkpoints. Have a look at wiki.apache.org/hadoop/… - blackSmith

3 Answers

1
votes

The Secondary NameNode (SNN) was the first of several attempts to reduce the load on the NameNode (NN) and, to a certain extent, provide high availability (HA). Since then, improved alternatives to the SNN have been added, namely the Checkpoint Node and the Backup Node.

SNN: copies and merges the FSImage and edits.log periodically for faster NN startup times.
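
The merge step can be pictured with a toy model. This is only a sketch: the real FSImage and edit log are binary files with many more operation types, and the `apply_edits` function, the operation names, and the sample paths are all hypothetical.

```python
# Toy model of a checkpoint: replay logged edits on top of the last image.
# The structures and names here are illustrative assumptions, not HDFS formats.

def apply_edits(fsimage, edits):
    """Return a new namespace snapshot with every logged edit replayed."""
    namespace = dict(fsimage)  # copy so the previous image stays untouched
    for op, path, value in edits:
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "rename":
            # for a rename, value holds the destination path
            namespace[value] = namespace.pop(path)
        else:
            raise ValueError(f"unknown op: {op}")
    return namespace


fsimage = {"/a.txt": 3, "/b.txt": 1}  # path -> replication factor (made up)
edits = [
    ("create", "/c.txt", 3),
    ("delete", "/b.txt", None),
    ("rename", "/a.txt", "/d.txt"),
]
checkpoint = apply_edits(fsimage, edits)
print(checkpoint)
```

The longer the edit log grows between checkpoints, the more operations the NN has to replay at startup, which is why a merged image shortens startup time.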

Checkpoint Node: Copies and merges the FSImage and edits.log. It then sends the updated version to the NN to replace the older FSImage.

Backup Node: Maintains an up-to-date copy of all namespace changes at runtime, without any delay. To achieve this, the stream of edits is shared between the NN and the Backup Node; the Backup Node merges the edits into its own copy of the namespace and periodically sends the result to the NN to update the NN's FSImage file. Hence it provides the functionality that you are asking for.

As for the disadvantage of copying per-second updates from the NN: it would create a network bottleneck in a heavily loaded cluster.

Go through the link below to read more: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Secondary_NameNode
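
A rough back-of-the-envelope estimate shows the scale of the problem. The sketch below assumes, as a simplification, that each checkpoint moves about one full image over the network and ignores the edit log; the 2 GB image size and the function name are made-up values for illustration.

```python
# Rough estimate of checkpoint traffic per day.
# Assumption: one full image transfer per checkpoint, edits ignored.

def daily_checkpoint_traffic_gb(fsimage_gb, period_seconds):
    """Approximate GB transferred per day for a given checkpoint period."""
    checkpoints_per_day = 86400 / period_seconds
    return checkpoints_per_day * fsimage_gb


hourly = daily_checkpoint_traffic_gb(fsimage_gb=2.0, period_seconds=3600)
every_second = daily_checkpoint_traffic_gb(fsimage_gb=2.0, period_seconds=1)
print(hourly, every_second)
```

Under these assumptions, moving from hourly to per-second checkpoints multiplies the traffic by 3600, and a merge of an image that size is unlikely to finish within one second anyway.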

0
votes
  • The Secondary NameNode only holds a backup (checkpoint) of the NameNode's metadata
  • If the NameNode fails, the entire cluster fails
  • At that point, a NameNode can be started manually from the Secondary NameNode's latest checkpoint; this is not an automatic failover
  • The checkpoint interval of the Secondary NameNode can be specified
  • It is configurable based on the number of transactions and on seconds. Refer to Secondary Namenode
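
The two triggers correspond to the `dfs.namenode.checkpoint.period` (default 3600 seconds) and `dfs.namenode.checkpoint.txns` (default 1000000 transactions) properties described in the HDFS user guide. A minimal sketch of the rule, with a hypothetical `should_checkpoint` helper that is not part of Hadoop:

```python
# Sketch of the checkpoint trigger: whichever limit is reached first wins.
# should_checkpoint is an illustrative helper, not a Hadoop API.

CHECKPOINT_PERIOD_S = 3600   # default of dfs.namenode.checkpoint.period
CHECKPOINT_TXNS = 1_000_000  # default of dfs.namenode.checkpoint.txns


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period_s=CHECKPOINT_PERIOD_S, txns=CHECKPOINT_TXNS):
    """Return True when either the time limit or the transaction limit is hit."""
    return seconds_since_last >= period_s or uncheckpointed_txns >= txns


print(should_checkpoint(3600, 0))        # time limit reached
print(should_checkpoint(10, 1_000_000))  # transaction limit reached
print(should_checkpoint(10, 10))         # neither limit reached
```
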
0
votes

Although checkpointing is configurable by size or time, it is not advisable to checkpoint too frequently or at too small a size threshold, because checkpointing generates network traffic in the cluster (the fsImage and edit logs are transferred over HTTP). It also consumes CPU on the Secondary NN.

So checkpointing should be tuned to the cluster's activity, that is, to how quickly the fsImage changes.