I have a question about NameNode High Availability. The NameNode is critical because it stores all of the metadata; if it goes down, the whole Hadoop cluster goes down with it. Is there a good way to achieve NameNode High Availability, for example a backup NameNode that can take over when the primary NameNode fails? (I am currently using Hadoop 1.1.2.)
1 Answer
For ASF Hadoop 1.1.2, there are no solid NameNode HA options. NameNode HA was introduced in the Hadoop 2.0 line and is included in popular distributions such as Cloudera's CDH4.
NameNode HA works by running a primary NameNode alongside a hot standby NameNode. The two share an edits log, either on an NFS mount or through the Quorum Journal Manager (QJM) in HDFS itself. The former gives you the benefit of having an external source for storing your HDFS metadata, while the latter gives you the benefit of having no dependencies external to Hadoop.
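To make the difference between the two options concrete, here is a rough sketch of the relevant part of hdfs-site.xml on a Hadoop 2.x cluster. The property names are the standard HDFS HA ones, but the nameservice ID (mycluster), the NameNode IDs (nn1, nn2), the hostnames, and the NFS path are placeholders I made up for illustration. It is not a complete HA configuration: fencing and client failover settings, among others, are left out.

```xml
<!-- hdfs-site.xml (Hadoop 2.x). All IDs, hostnames and paths are placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>

<!-- Option A: shared edits log on an NFS mount visible to both NameNodes -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/filer1/dfs/ha-name-dir-shared</value>
</property>

<!-- Option B: shared edits log through the Quorum Journal Manager.
     Use this INSTEAD of Option A, not in addition to it.
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
-->
```

Either way, the only property that changes between the two approaches is dfs.namenode.shared.edits.dir: a file:// URI for the NFS mount, or a qjournal:// URI listing the JournalNode hosts.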
Personally, I like the NFS option, as you can easily snapshot and back up the data resident on the file server. The disadvantage of this approach is potentially inconsistent performance, particularly in terms of latency.
For more detail, check out the following articles: