4 votes

I have a Hadoop cluster with 8 machines, and all 8 machines are data nodes. A program running on one machine (say machine A) continuously creates sequence files (each about 1 GB) in HDFS.

Here's the problem: all 8 machines have the same hardware and the same capacity. While the other machines still have about 50% free disk space for HDFS, machine A has only 5% left. I checked the block info and found that almost every block has one replica on machine A.

Is there any way to balance the replicas? Thanks.


2 Answers

1 vote

This is the default placement policy: when the writer runs on a DataNode, the first replica of every block is placed on that local node, which is why machine A holds a replica of nearly every block. This policy works well for the typical M/R pattern, where each HDFS node is also a compute node and writes are distributed uniformly across the machines.

If you don't like it, there is HDFS-385, "Design a pluggable interface to place replicas of blocks in HDFS". You need to write a class that implements the BlockPlacementPolicy interface, and then set that class as the dfs.block.replicator.classname in hdfs-site.xml.
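As a rough sketch, the hdfs-site.xml entry would look like this (the class name com.example.MyBlockPlacementPolicy is a hypothetical placeholder for your own implementation):

 <property>
   <name>dfs.block.replicator.classname</name>
   <value>com.example.MyBlockPlacementPolicy</value>
 </property>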

1 vote

There is a way: you can use the Hadoop command-line balancer tool. HDFS data might not always be placed uniformly across the DataNodes; the balancer can be used to spread HDFS data uniformly across the DataNodes in the cluster.

 hadoop balancer [-threshold <threshold>]

where threshold is a percentage of disk capacity.
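For example, running the balancer with a threshold of 10 moves blocks between DataNodes until each node's disk utilization is within 10 percentage points of the cluster's average utilization:

 hadoop balancer -threshold 10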

See the following links for details: