I am currently working on a Big Data project as a final assignment, and I have been looking for a way to run HDFS Federation on a fully distributed cluster.
The specifications of my cluster are:
- Hadoop 2.7.2
- JDK 1.8.74
- The OS is CentOS 6.7
- 2 namenodes (Namenode1 and Namenode2)
- 2 datanodes (Datanode1 and Datanode2)
- 1 client (configured with a ViewFS mount table, sketched below)
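For context, the client's core-site.xml follows the usual ViewFS mount-table pattern, roughly like this (the mount table name ClusterX, the mount points /ns1 and /ns2, and port 8020 are just the values I picked, so treat this as a sketch rather than an exact copy):

```
<!-- core-site.xml on the client: ViewFS mount table (sketch; names are mine) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://ClusterX</value>
  </property>
  <!-- /ns1 is served by Namenode1, /ns2 by Namenode2 -->
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./ns1</name>
    <value>hdfs://Namenode1:8020/</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./ns2</name>
    <value>hdfs://Namenode2:8020/</value>
  </property>
</configuration>
```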
With a single namenode, the cluster (1 namenode + 2 datanodes) works fine, so the base configuration seems correct.
I couldn't find many tutorials explaining how to fully configure HDFS Federation (i.e. run two namenodes that share all the datanodes), not even in the official Apache Hadoop documentation. The one I used is the following: Fully Distributed Hadoop Federation Cluster.
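For reference, here is the shape of the federation section of my hdfs-site.xml, deployed on both namenodes and both datanodes, following the pattern from that tutorial and the federation docs (the nameservice IDs ns1/ns2 and the ports are just the values I chose, so consider this a sketch):

```
<!-- hdfs-site.xml (same on namenodes and datanodes): two nameservices -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <!-- Nameservice ns1 -> Namenode1 -->
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>Namenode1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1</name>
    <value>Namenode1:50070</value>
  </property>
  <!-- Nameservice ns2 -> Namenode2 -->
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>Namenode2:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns2</name>
    <value>Namenode2:50070</value>
  </property>
</configuration>
```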
My attempts to actually run HDFS Federation have failed: even though the DFS daemons launch successfully, the datanodes are not used by all the namenodes.
Current situation:
When I start the DFS services (with start-dfs.sh), either Namenode1 uses all the datanodes and Namenode2 uses none, or each namenode uses only a single datanode (Namenode1 uses Datanode1 and Namenode2 uses Datanode2).
Which datanodes get used seems random, but they are never all used by both namenodes at the same time, which is my objective.
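To see which datanodes each namenode has registered, I query each namenode separately with dfsadmin, something like this (the hostnames and port 8020 are again just my assumed values):

```
# Sketch: ask each namenode which datanodes it knows about
hdfs dfsadmin -fs hdfs://Namenode1:8020 -report
hdfs dfsadmin -fs hdfs://Namenode2:8020 -report
```

With federation working as I expect, both reports should list Datanode1 and Datanode2.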
If anyone knows how to run HDFS Federation with several namenodes sharing the same datanodes, you're welcome to help =P Thank you.