
How should I add a new datanode to an existing hadoop cluster?

Do I just stop everything, set up the new server the same way as the existing datanodes, add the new server's IP on the namenode, and update the slaves file accordingly?

Another question: after I add a new datanode to the cluster, do I need to do anything to balance the datanodes, or "re-distribute" the existing files and directories across them?


2 Answers


For Apache Hadoop you can choose one of two options (point 3 below addresses your second question about redistributing data):

1.- Prepare the datanode configuration (JDK, binaries, the HADOOP_HOME env var, XML config files pointing to the master, the new node's IP added to the slaves file on the master, etc.) and execute the following command on the new slave:

hadoop-daemon.sh start datanode
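
For example, registering the new node on the master might look like this (the IP is a placeholder; note that on Hadoop 3.x the slaves file was renamed to workers and the command above is replaced by hdfs --daemon start datanode):

    # On the master: add the new node's IP to the slaves file (placeholder IP)
    echo "192.168.1.25" >> $HADOOP_HOME/etc/hadoop/slaves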

2.- Prepare the datanode just like in step 1 and restart the entire cluster.
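
For option 2, the restart could look like this from the master (assuming the standard sbin scripts shipped with Hadoop are on the PATH):

    # Stop and start all HDFS daemons; start-dfs.sh starts a datanode
    # on every host listed in the slaves file
    stop-dfs.sh
    start-dfs.sh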

3.- To redistribute the existing data across the datanodes, run the HDFS Balancer (hdfs balancer). Separately, the HDFS Disk Balancer, which you enable with dfs.disk.balancer.enabled in hdfs-site.xml, balances data across the disks within a single datanode; to use it you need to configure and execute a plan, as sketched below.
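
A sketch of that Disk Balancer workflow (the hostname and plan file path are placeholders; the tool is available from Hadoop 3.0 onward):

    # Generate a plan describing how to move data between this
    # datanode's disks (requires dfs.disk.balancer.enabled=true)
    hdfs diskbalancer -plan datanode1.example.com

    # Execute the plan; the -plan step prints the plan file's location
    hdfs diskbalancer -execute <planfile>.plan.json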


You don't need to stop anything to add datanodes, and datanodes should register themselves with the NameNode on their own; I don't recall manually adding any information or needing to restart the NameNode for it to detect datanodes (I typically use Ambari to provision new machines).
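
One way to confirm that a new datanode has registered is the standard dfsadmin report, run from any node with the client configuration in place:

    # Lists the datanodes the namenode currently knows about,
    # with per-node capacity and usage
    hdfs dfsadmin -report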

You will need to manually run the HDFS balancer in order to spread the existing data over to the new servers.
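
For reference, a typical invocation (the threshold is the allowed deviation, in percent, of each datanode's utilization from the cluster average; 10 is the default):

    # Move blocks until every datanode is within 10% of the average utilization
    hdfs balancer -threshold 10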