While going through the basic configuration I came across dfs.namenode.replication.min = 1. What does it mean?
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Your namenode, depending on what it's doing, can be in one of several states. For example, when it is starting up, it is in safe mode.
While the namenode is in safe mode, it uses dfs.namenode.replication.min instead of the regular dfs.replication setting to decide whether a block is sufficiently replicated.
Once the datanodes have reported enough blocks, the namenode leaves safe mode and goes back to using the regular setting.
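As a rough illustration of how the minimal replication could enter the safe mode check, here is a toy model in Python. It is only a sketch, not Hadoop's actual implementation: the function names are hypothetical, and the threshold parameter stands in for dfs.namenode.safemode.threshold-pct (0.999 by default in hdfs-default.xml). Real safe mode handling involves more conditions than this.

```python
def safely_replicated_fraction(reported_replicas, min_replication=1):
    """Fraction of blocks that have at least min_replication reported replicas.

    reported_replicas maps a block id to the number of replicas the
    datanodes have reported so far (hypothetical data structure).
    """
    if not reported_replicas:
        return 1.0  # nothing to wait for
    safe = sum(1 for count in reported_replicas.values()
               if count >= min_replication)
    return safe / len(reported_replicas)


def can_leave_safe_mode(reported_replicas, min_replication=1, threshold=0.999):
    """True once enough blocks satisfy the minimal replication."""
    fraction = safely_replicated_fraction(reported_replicas, min_replication)
    return fraction >= threshold


# Early during startup: blk_2 has not been reported by any datanode yet.
blocks = {"blk_1": 1, "blk_2": 0, "blk_3": 3}
print(can_leave_safe_mode(blocks))

# One replica per block is enough here, even though dfs.replication may be 3.
blocks["blk_2"] = 1
print(can_leave_safe_mode(blocks))
```

Note that in this model a block with a single reported replica already counts as safe, which is the point of having a minimum separate from the default replication.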
dfs.namenode.replication.min is the setting for the minimal block replication (source: the Hadoop 2.9 documentation), as opposed to dfs.replication.max and dfs.replication (the maximal and the default block replication, respectively).
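To make the three settings concrete, the following sketch reads them from an hdfs-site.xml style fragment with Python's standard library. The values shown are the defaults from hdfs-default.xml as far as I know (1, 3 and 512); the HDFS_SITE string and the read_properties helper are made up for this example, so check the values against your own cluster configuration.

```python
import xml.etree.ElementTree as ET

# Example fragment in the hdfs-site.xml format (illustrative values).
HDFS_SITE = """
<configuration>
  <property>
    <name>dfs.namenode.replication.min</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.replication.max</name>
    <value>512</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a dict mapping each property name to its value (as strings)."""
    root = ET.fromstring(xml_text.strip())
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.iter("property")}


settings = read_properties(HDFS_SITE)
for name in ("dfs.namenode.replication.min",
             "dfs.replication",
             "dfs.replication.max"):
    print(name, "=", settings[name])
```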
The minimal block replication defines "the minimum number of replicas that have to be written for a write to be successful" (from: Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale).
So when writing a file, if dfs.namenode.replication.min = 1, a positive acknowledgement is sent as soon as one copy of each block in the file exists. After that, the system continues to replicate until the default block replication, dfs.replication, is reached.
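The write behaviour described here can be sketched with a small simulation. This is a deliberately simplified model, not the real HDFS write pipeline: simulate_write and the datanode names are hypothetical, and replicas are placed one after another just to show when the acknowledgement would happen relative to the remaining copies.

```python
def simulate_write(available_datanodes, min_replication=1, target_replication=3):
    """Toy model of writing one block.

    Returns (acknowledged, replica_count, events). The write is acknowledged
    as soon as min_replication replicas exist; replication then continues
    until target_replication is reached or no datanodes are left.
    """
    events = []
    replicas = 0
    acknowledged = False
    for node in available_datanodes[:target_replication]:
        replicas += 1
        events.append(f"replica {replicas} stored on {node}")
        if not acknowledged and replicas >= min_replication:
            acknowledged = True
            events.append("write acknowledged to client")
    return acknowledged, replicas, events


acknowledged, replicas, events = simulate_write(["dn1", "dn2", "dn3", "dn4"])
for event in events:
    print(event)
```

With the default arguments the acknowledgement appears right after the first replica, and two more replicas follow to reach the target of three. With min_replication=2 and only one datanode available, the write would never be acknowledged in this model.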
The three replication settings mentioned above do not apply to the namenode itself; they concern the replication of file blocks.
The namenode is a special server that has its own mechanisms for guaranteeing availability, for instance maintaining multiple copies of the filesystem's metadata (see Metadata Disk Failure in the Hadoop documentation on HDFS Architecture).
Despite these measures, the namenode can be a single point of failure (SPOF). That is why, beginning with version 2.0.0, Hadoop supports HDFS High Availability (HDFS HA), which relies on two namenodes running in parallel.
The HDFS High Availability feature addresses the above problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby.
(from: HDFS High Availability Using the Quorum Journal Manager)