I would like to modify the default block placement strategy of HDFS to suit my application.
For example, suppose I have two files, file1 (128 MB) and file2 (128 MB). With a block size of 64 MB, each file would be split into two blocks.
I want to ensure that block1 of file1 and block1 of file2 are placed on the same datanode. If possible, I would also like the replicas of corresponding blocks to be placed on the same set of datanodes.
Question 1: Is this possible? If so, which classes in the source code would need to be modified?

Question 2: How are shell commands such as copyFromLocal mapped to functions in the Hadoop source code?
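For context, my current understanding (which may be wrong) is that the placement policy is pluggable via the `dfs.block.replicator.classname` property in hdfs-site.xml, which would point at a custom subclass of `BlockPlacementPolicy`. Something like the sketch below, where `com.example.MyBlockPlacementPolicy` is a hypothetical class name:

```xml
<!-- hdfs-site.xml sketch: my assumption is that this property selects the
     placement policy; com.example.MyBlockPlacementPolicy is a placeholder
     for a custom subclass of BlockPlacementPolicy -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>com.example.MyBlockPlacementPolicy</value>
</property>
```

Is this the right hook, or does achieving per-file co-location also require changes elsewhere in the namenode?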