I would like to modify the default block placement strategy of HDFS to suit my application.
For example, suppose I have two files, file1 (128 MB) and file2 (128 MB). With a block size of 64 MB, each file would be split into two blocks.
I want to ensure that block1 of file1 and block1 of file2 are placed on the same datanode. If possible, I would also like the replicas of corresponding blocks to be placed on the same set of datanodes.
Question 1: Is this possible? If so, which classes in the source code would need to be modified?

Question 2: How are shell commands such as copyFromLocal mapped to functions in the Hadoop source code?
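For context, my current understanding (which may be wrong) is that the placement policy is pluggable via the `dfs.block.replicator.classname` property in hdfs-site.xml, which would point at a custom subclass of `BlockPlacementPolicy`. Something like the sketch below, where `com.example.MyBlockPlacementPolicy` is a hypothetical class name:

```xml
<!-- hdfs-site.xml sketch: my assumption is that this property selects the
     placement policy; com.example.MyBlockPlacementPolicy is a placeholder
     for a custom subclass of BlockPlacementPolicy -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>com.example.MyBlockPlacementPolicy</value>
</property>
```

Is this the right hook, or does achieving per-file co-location also require changes elsewhere in the namenode?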