On my 3-machine cluster running Hadoop 2.7.3, DataNode utilization has become quite unbalanced, so I am trying to fix it with the HDFS balancer. But the balancer does nothing. Every iteration looks like this (I have hidden the actual IP addresses of the nodes):

Aug 28, 2017 12:12:50 PM 8 0 B 289.99 GB 10 GB

17/08/28 12:12:59 INFO net.NetworkTopology: Adding a new node: /default-rack/[Datanode1Addr]:50010

17/08/28 12:12:59 INFO net.NetworkTopology: Adding a new node: /default-rack/[Datanode2Addr]:50010

17/08/28 12:12:59 INFO net.NetworkTopology: Adding a new node: /default-rack/[Datanode3Addr]:50010

17/08/28 12:12:59 INFO balancer.Balancer: 2 over-utilized: [[Datanode1Addr]:50010:DISK, [Datanode3Addr]:50010:DISK]

17/08/28 12:12:59 INFO balancer.Balancer: 1 underutilized: [[Datanode2Addr]:50010:DISK]

17/08/28 12:12:59 INFO balancer.Balancer: Need to move 289.99 GB to make the cluster balanced.

17/08/28 12:12:59 INFO balancer.Balancer: Decided to move 10 GB bytes from [Datanode1Addr]:50010:DISK to [Datanode2Addr]:50010:DISK

17/08/28 12:12:59 INFO balancer.Balancer: Will move 10 GB in this iteration

...with no data ever getting moved.

Any ideas?

What is your replication factor set to? - tk421
It's set to 3: <property> <name>dfs.replication</name> <value>3</value> </property> - AntsySysHack

1 Answer

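If your replication factor is 3 and the cluster has only 3 nodes, the HDFS Balancer cannot move any data: each block must keep 3 replicas, and HDFS never places two replicas of the same block on the same DataNode. Every node therefore already holds a copy of each block written with that replication factor, so no block has a valid destination.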
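
To make the constraint concrete, here is a minimal Python sketch of the rule described above. It is a simplified model, not the Balancer's actual code: the function name `movable_blocks`, the node names `dn1`..`dn3`, and the block IDs are all made up for illustration, and the real Balancer applies further checks (rack placement, bandwidth limits, and so on) that are ignored here.

```python
def movable_blocks(block_locations, source, target):
    """Return the blocks on `source` that could legally be moved to `target`.

    Simplified rule: a move is only allowed when the target node does not
    already hold a replica of that block.
    """
    return [
        block
        for block, holders in block_locations.items()
        if source in holders and target not in holders
    ]


# Hypothetical 3-node cluster where every block has 3 replicas,
# so each block already lives on every node.
all_nodes = {"dn1", "dn2", "dn3"}
fully_replicated = {"blk_1": set(all_nodes), "blk_2": set(all_nodes)}
print(movable_blocks(fully_replicated, "dn1", "dn2"))  # []

# Same cluster, but with 2 replicas per block (assumed layout).
# blk_1 is not yet on dn2, so it is a candidate for a move.
two_replicas = {"blk_1": {"dn1", "dn3"}, "blk_2": {"dn1", "dn2"}}
print(movable_blocks(two_replicas, "dn1", "dn2"))  # ['blk_1']
```

In the first case the candidate list is empty for every source/target pair, which matches the behaviour in the question: the Balancer plans to move 10 GB but finds nothing it is allowed to move.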