I want to try to implement a paper I got from IEEE, "Location-Aware MapReduce in Virtual Cloud". Here is a summary: there are 8 physical machines, each hosting 4 virtual machines, and each VM has Hadoop HDFS installed. Suppose we have a cluster of p physical machines, each with one hard disk, and the replication factor is 3. Then n file blocks are put into the cluster from a computer outside the cluster, or are generated randomly inside the cluster. The model is about data pattern generation, and task pattern generation given a certain data pattern. Each block has the same probability of being placed on any physical machine that hosts the same number of virtual machines. With Hadoop's strategy a bad data pattern may occur, where all replicas of a file block stack on one physical machine, since Hadoop's data allocation is random: http://imageshack.us/photo/my-images/42/allstack.png/
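To convince myself the stacking problem is real, I wrote a quick Monte Carlo sketch (my own, not from the paper): place 3 replicas of each block on distinct VMs chosen uniformly at random and count how often all 3 land on the same physical machine.

```python
import random

P, V = 8, 4        # 8 physical machines, 4 VMs each (as in the paper)
REPLICAS = 3
BLOCKS = 100000

# Every VM is identified by (physical machine id, vm id).
vms = [(pm, vm) for pm in range(P) for vm in range(V)]

stacked = 0
for _ in range(BLOCKS):
    placement = random.sample(vms, REPLICAS)   # 3 distinct VMs
    if len({pm for pm, _ in placement}) == 1:  # all on one physical machine
        stacked += 1

print("fraction of blocks fully stacked: %.4f" % (stacked / BLOCKS))
```

Even with a small per-block probability, over many blocks some of them end up with all replicas on one physical machine, which is exactly the pattern in the image above.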
The proposed strategies are round-robin allocation and serpentine allocation, which in theory look like this: http://imageshack.us/photo/my-images/43/proposed.png/
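A minimal sketch of the two allocation orders as I understand them, for p physical machines each hosting v VMs; the paper's exact definitions may differ from mine. VM (i, j) means VM j on physical machine i.

```python
def round_robin(p, v):
    # Cycle over the physical machines, taking the next VM from each,
    # so consecutive blocks land on different physical machines.
    return [(i, j) for j in range(v) for i in range(p)]

def serpentine(p, v):
    # Like round-robin, but reverse the machine order on every other
    # pass (1..p, then p..1, and so on).
    order = []
    for j in range(v):
        machines = range(p) if j % 2 == 0 else reversed(range(p))
        order.extend((i, j) for i in machines)
    return order

if __name__ == "__main__":
    print(round_robin(3, 2))  # [(0,0),(1,0),(2,0),(0,1),(1,1),(2,1)]
    print(serpentine(3, 2))   # [(0,0),(1,0),(2,0),(2,1),(1,1),(0,1)]
```

Either order guarantees that consecutive blocks (and so, the replicas of one block) are never assigned to the same physical machine twice in a row.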
How can I make Hadoop aware that several virtual machines reside on the same physical machine, so that it does not place the 2nd and 3rd replicas of a file block on VMs that share a physical machine with the 1st? I asked about this before and was told it is done with the rack awareness configuration, but I am still confused and need more references on that.
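From the replies so far, my understanding is that I can treat each physical machine as its own "rack": Hadoop resolves each datanode to a rack via a user-supplied topology script, and its default block placement puts the 2nd replica on a different rack than the 1st. Below is a minimal sketch of such a script; the 10.0.X.Y addressing scheme, where the third octet identifies the physical host, is my own assumption, not from the paper.

```python
#!/usr/bin/env python
# Sketch of a Hadoop topology script for rack awareness.
# Hadoop passes one or more datanode IPs/hostnames as arguments and
# expects one rack path per argument on stdout, in the same order.
# Assumption (mine): VMs are addressed as 10.0.<physical-host>.<vm>,
# so the third octet identifies the physical machine.
import sys

DEFAULT_RACK = "/default-rack"

def resolve(name):
    parts = name.split(".")
    if len(parts) == 4 and parts[0] == "10":
        # Map every VM on physical machine X to the same "rack".
        return "/physical-host-%s" % parts[2]
    return DEFAULT_RACK  # unknown names fall back to the default rack

if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(resolve(arg))
```

The script is registered with the topology.script.file.name property in core-site.xml (net.topology.script.file.name on newer Hadoop versions). One caveat I noticed: the default placement policy puts the 2nd replica on a different rack than the 1st, but the 3rd replica on the same rack as the 2nd, so this prevents all replicas stacking on one physical machine rather than giving a perfectly even spread.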
How can I trace those data, i.e., verify that the file block replicas are spread across physical machines and that no block has all its replicas stacked on one physical machine? And if I configure rack awareness as described, is it certain that file block replicas will be distributed across physical machines?
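For tracing, one thing I plan to try is dumping block locations with fsck (hadoop fsck / -files -blocks -locations, or hdfs fsck on newer versions) and checking each block's replica set against the VM-to-physical-machine mapping. A rough sketch, reusing the assumed 10.0.&lt;host&gt;.&lt;vm&gt; addressing from above; the fsck output format differs between Hadoop versions, so the regex may need adjusting.

```python
#!/usr/bin/env python
# Check fsck output for blocks whose replicas all sit on one physical
# machine. Run `hadoop fsck / -files -blocks -locations > fsck.out`
# first. Assumption (mine): the third octet of a datanode IP
# identifies the physical host, matching the topology script above.
import re
import sys

IP_RE = re.compile(r"(\d+\.\d+\.\d+\.\d+):\d+")

def physical_host(ip):
    return ip.split(".")[2]  # third octet = physical machine (assumed)

def main(path):
    stacked = total = 0
    with open(path) as f:
        for line in f:
            ips = IP_RE.findall(line)
            if len(ips) < 2:
                continue  # not a block-location line
            total += 1
            if len({physical_host(ip) for ip in ips}) == 1:
                stacked += 1
                print("all replicas on one physical machine: " + line.strip())
    print("%d of %d blocks have all replicas stacked" % (stacked, total))

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "fsck.out")
```

If the rack awareness configuration works as intended, the stacked count should be zero; is that the right way to check it, or is there a better tool for this?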