
How should I add a new datanode to an existing hadoop cluster?

Do I just stop everything, set up the new server the same way as the existing datanodes, add the new server's IP on the namenode, and update the slaves file accordingly?

Another question: after I add a new datanode to the cluster, do I need to do anything to balance the datanodes, or "re-distribute" the existing files and directories across them?


2 Answers


For Apache Hadoop you can choose one of two options (point 3 below addresses your second question about redistributing data):

1.- Prepare the datanode configuration (JDK, binaries, the HADOOP_HOME env var, XML config files pointing to the master, the new node's IP added to the slaves file on the master, etc.) and execute the following command on the new slave:

hadoop-daemon.sh start datanode
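
For example, registering the new node on the master might look like this (the IP is a placeholder; note that on Hadoop 3.x the slaves file was renamed to workers and the command above is replaced by hdfs --daemon start datanode):

    # On the master: add the new node's IP to the slaves file (placeholder IP)
    echo "192.168.1.25" >> $HADOOP_HOME/etc/hadoop/slaves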

2.- Prepare the datanode just like in step 1 and restart the entire cluster.
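
For option 2, the restart could look like this from the master (assuming the standard sbin scripts shipped with Hadoop are on the PATH):

    # Stop and start all HDFS daemons; start-dfs.sh starts a datanode
    # on every host listed in the slaves file
    stop-dfs.sh
    start-dfs.sh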

3.- To redistribute the existing data across the datanodes, run the HDFS Balancer (hdfs balancer). Separately, the HDFS Disk Balancer, which you enable with dfs.disk.balancer.enabled in hdfs-site.xml, balances data across the disks within a single datanode; to use it you need to configure and execute a plan, as sketched below.
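
A sketch of that Disk Balancer workflow (the hostname and plan file path are placeholders; the tool is available from Hadoop 3.0 onward):

    # Generate a plan describing how to move data between this
    # datanode's disks (requires dfs.disk.balancer.enabled=true)
    hdfs diskbalancer -plan datanode1.example.com

    # Execute the plan; the -plan step prints the plan file's location
    hdfs diskbalancer -execute <planfile>.plan.json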


You don't need to stop anything to add datanodes, and datanodes should register themselves with the NameNode on their own; I don't recall manually adding any information or needing to restart the NameNode for it to detect datanodes (I typically use Ambari to provision new machines).
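
One way to confirm that a new datanode has registered is the standard dfsadmin report, run from any node with the client configuration in place:

    # Lists the datanodes the namenode currently knows about,
    # with per-node capacity and usage
    hdfs dfsadmin -report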

You will need to manually run the HDFS balancer in order to spread the existing data over to the new servers.
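
For reference, a typical invocation (the threshold is the allowed deviation, in percent, of each datanode's utilization from the cluster average; 10 is the default):

    # Move blocks until every datanode is within 10% of the average utilization
    hdfs balancer -threshold 10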