Say I have an HDFS cluster (v2.0.5) that spans multiple racks but was not originally set up with rack awareness. Data has been loaded into it with the default 3x replication. If I now configure HDFS to be rack aware, all three replicas of a block could very well be on the same rack, which is not what I want.

If my cluster is already balanced, would running the HDFS balancer enforce the block replication policy and shuffle blocks around appropriately, i.e. leave one replica on one rack and two replicas on another? From what I have read, if the cluster is already balanced the balancer simply exits.

If not, how can I force HDFS to re-replicate the needed blocks to separate racks?

1 Answer

If you change the rack configuration so that you now have two racks where you previously had only one, the balancer will automatically determine that blocks with all replicas on the same rack need to be rebalanced. In other words, once the rack configuration changes, the cluster is no longer considered balanced (unless, by chance, the blocks already happen to be in the right place after the change).
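
To make the condition "all replicas on the same rack" concrete, here is a minimal Python sketch that flags such blocks. It is only an illustration, not how HDFS does it internally: the function names, the sample block IDs, and the `/rack/host:port` location strings are all hypothetical. On a real cluster you would gather replica locations yourself, for example from `hdfs fsck <path> -files -blocks -racks`, and adapt the parsing to whatever that output looks like on your version.

```python
def rack_of(location):
    """Return the rack part of a '/rack/host:port' style location (assumed format)."""
    return location.rsplit("/", 1)[0]


def single_rack_blocks(block_locations):
    """Return sorted IDs of blocks that have several replicas, all on one rack."""
    flagged = []
    for block_id, locations in block_locations.items():
        racks = {rack_of(loc) for loc in locations}
        # A block with a single replica cannot span racks, so it is not flagged.
        if len(locations) > 1 and len(racks) == 1:
            flagged.append(block_id)
    return sorted(flagged)


# Hypothetical sample data: block ID -> replica locations.
sample = {
    "blk_1": ["/rack1/dn1:50010", "/rack1/dn2:50010", "/rack1/dn3:50010"],
    "blk_2": ["/rack1/dn1:50010", "/rack2/dn4:50010", "/rack2/dn5:50010"],
}

print(single_rack_blocks(sample))
```

With the sample data above, only `blk_1` is flagged, since `blk_2` already has replicas on two racks. Running a check like this before and after the rack configuration change is one way to confirm whether blocks were actually moved.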