
While going through the basic configuration, I came across dfs.namenode.replication.min = 1. What does it mean?

http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml


2 Answers


Your namenode, depending on what it's doing, can be in one of several states. For example, when it is starting up, it is in safe mode.

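While the namenode is in safe mode, it uses dfs.namenode.replication.min instead of the target replication factor (dfs.replication) to decide whether a block is sufficiently replicated: a block counts as safe once at least that many replicas of it have been reported.

Once the datanodes have reported enough blocks that meet this minimum (the fraction is set by dfs.namenode.safemode.threshold-pct), the namenode leaves safe mode and goes back to replicating blocks toward the normal target.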
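A rough sketch of the safe-mode exit check, assuming the rule documented for dfs.namenode.safemode.threshold-pct in hdfs-default.xml (the fraction of blocks that must satisfy dfs.namenode.replication.min). The function and its arguments are hypothetical, not the namenode's actual code, and it ignores other exit conditions such as dfs.namenode.safemode.extension.

```python
def can_leave_safe_mode(reported_replicas, min_replication=1, threshold_pct=0.999):
    """Return True when enough blocks meet the minimal replication.

    reported_replicas: hypothetical mapping of block id -> number of
        replicas reported so far by the datanodes.
    min_replication: stands in for dfs.namenode.replication.min.
    threshold_pct: stands in for dfs.namenode.safemode.threshold-pct.
    """
    total = len(reported_replicas)
    if total == 0:
        return True  # no blocks to wait for
    # A block is "safe" once it has at least the minimal number of replicas.
    safe = sum(1 for n in reported_replicas.values() if n >= min_replication)
    return safe / total >= threshold_pct
```

For example, with one block still unreported ({"blk_1": 1, "blk_2": 0, "blk_3": 3}) the check fails, and it passes once blk_2 reports a single replica, even though that is below the usual target of 3.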


dfs.namenode.replication.min is the setting for the minimal block replication (source: the Hadoop 2.9 documentation), as opposed to dfs.replication.max and dfs.replication (the maximal and the default block replication, respectively). The minimal block replication defines

the minimum number of replicas that have to be written for a write to be successful

(from: Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale)

So when writing a file with dfs.namenode.replication.min = 1, a positive acknowledgement is sent as soon as one copy of each block in the file exists. After that, the system continues to replicate the blocks until the default block replication, dfs.replication, is reached.
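A toy model of the behaviour described above, assuming a simple per-block replica counter rather than HDFS's real write pipeline. The class and function names are hypothetical; min_replication and target_replication merely stand in for dfs.namenode.replication.min and dfs.replication.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BlockWrite:
    """Hypothetical model of one block being written; not HDFS code."""
    min_replication: int = 1     # stands in for dfs.namenode.replication.min
    target_replication: int = 3  # stands in for dfs.replication
    replicas: int = 0

    def add_replica(self) -> None:
        # Replication stops once the target factor is reached.
        if self.replicas < self.target_replication:
            self.replicas += 1

    @property
    def acknowledged(self) -> bool:
        # The write counts as successful once the minimum is reached.
        return self.replicas >= self.min_replication

    @property
    def fully_replicated(self) -> bool:
        # Remaining copies are made afterwards, until the target is reached.
        return self.replicas >= self.target_replication


def file_write_succeeded(blocks: Iterable[BlockWrite]) -> bool:
    """A file write is acknowledged once every block meets the minimum."""
    return all(b.acknowledged for b in blocks)
```

With the defaults, a file whose blocks each have a single replica is already acknowledged, but no block is fully replicated until it reaches 3 copies.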

The three replication settings mentioned above are not about the namenode itself; they concern the replication of file blocks.
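As I understand it, the minimum and maximum bound the replication factor that can be requested for a file, while dfs.replication is only the default. A minimal sketch of that range check, assuming the hdfs-default.xml defaults (dfs.namenode.replication.min = 1, dfs.replication.max = 512); the function name and error messages are hypothetical.

```python
def check_replication(requested, min_replication=1, max_replication=512):
    """Reject a per-file replication factor outside [min, max].

    min_replication: stands in for dfs.namenode.replication.min.
    max_replication: stands in for dfs.replication.max.
    """
    if requested < min_replication:
        raise ValueError(f"replication {requested} is below the minimum {min_replication}")
    if requested > max_replication:
        raise ValueError(f"replication {requested} exceeds the maximum {max_replication}")
    return requested
```

A request for the default of 3 passes unchanged, while 0 or 1000 would be rejected under these assumed limits.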

The namenode is a special server that has its own mechanism for guaranteeing availability, for instance by maintaining multiple copies of the filesystem's meta-data (see Metadata Disk Failure in the Hadoop documentation on HDFS Architecture).

Despite these measures, the namenode can be a Single Point Of Failure (SPOF). That's why, beginning with version 2.0.0, Hadoop supports HDFS High Availability (HDFS HA), which relies on two namenodes running in parallel.

The HDFS High Availability feature addresses the above problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby.

(from: HDFS High Availability Using the Quorum Journal Manager)