0 votes

As I understand it, files stored in HDFS are divided into blocks, and each block is replicated to multiple nodes (3 by default). How does the Hadoop framework choose which node runs a map task, out of all the nodes holding a replica of a particular block?


2 Answers

0 votes

As far as I know, there will be as many map tasks as there are input blocks (more precisely, input splits, which by default correspond to blocks).

See manual here.

Usually, the framework chooses a node close to the input block, ideally a node that stores a replica, otherwise one on the same rack, to reduce the network bandwidth the map task consumes.
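That preference order (node-local, then rack-local, then off-rack) can be sketched roughly as below. The host and rack names are made up for illustration, and a real cluster resolves racks via a configured topology script; schedulers such as YARN's implement this logic internally.

```java
import java.util.List;
import java.util.Map;

public class LocalityChooser {
    // Illustrative rack mapping; a real cluster gets this from rack awareness config.
    static final Map<String, String> RACK = Map.of(
            "node1", "/rack1", "node2", "/rack1",
            "node3", "/rack2", "node4", "/rack2");

    /** Classify how "local" a candidate worker node is to a block's replicas. */
    static String locality(String worker, List<String> replicaHosts) {
        if (replicaHosts.contains(worker)) {
            return "NODE_LOCAL";        // worker itself holds a replica
        }
        for (String host : replicaHosts) {
            if (RACK.get(host).equals(RACK.get(worker))) {
                return "RACK_LOCAL";    // a replica lives on the same rack
            }
        }
        return "OFF_RACK";              // data must cross racks over the network
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("node1", "node2", "node3"); // 3 replicas
        System.out.println(locality("node1", replicas)); // NODE_LOCAL
        System.out.println(locality("node4", replicas)); // RACK_LOCAL: shares /rack2 with node3
    }
}
```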

That's all I know.

0 votes

In MapReduce 1 it depends on how many map tasks are already running on the datanode that hosts a replica, because each tasktracker has a fixed number of map slots in MR1. In MR2 (YARN) there are no fixed slots, so it depends on the resources already in use by the tasks running on that node.
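The difference can be sketched with made-up numbers: MR1 admits a task only when a fixed map slot is free, while MR2/YARN instead checks whether the node has enough spare memory and vcores for the requested container.

```java
public class SchedulingCheck {
    // MR1-style: each tasktracker is configured with a fixed number of map slots.
    static boolean mr1CanRun(int mapSlots, int runningMaps) {
        return runningMaps < mapSlots;
    }

    // MR2/YARN-style: no slots; compare the container request to free resources.
    static boolean mr2CanRun(int freeMemMb, int freeVcores, int reqMemMb, int reqVcores) {
        return reqMemMb <= freeMemMb && reqVcores <= freeVcores;
    }

    public static void main(String[] args) {
        System.out.println(mr1CanRun(2, 2));             // false: both slots busy
        System.out.println(mr2CanRun(3072, 2, 1024, 1)); // true: enough memory and cores free
    }
}
```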