Hadoop: MapReduce: Node selection

Question

I have three racks, each with five nodes, numbered node 1 to node 15. I've written a file, file.txt, which is split into four data blocks A, B, C, D and stored on node 1 (holding blocks A, B) and node 11 (holding blocks C, D). The job tracker gives the task trackers on node 1 and node 11 the code to run the map task on their local blocks.

My questions are:

1) How does the job tracker decide on which node the reduce task is performed? Is it based on rack awareness?
2) Out of node 2, node 6 and node 12, which would be the best node to perform the reduce task, assuming the nodes are not currently occupied by any task?
3) Can the reduce task be performed on node 1 or node 11 once the map tasks on those nodes have finished?

Thanks in advance.

Ramana Ramana · Accepted Answer · 2013-11-07T07:55:01

1) The job tracker may choose node 1 or node 11 to run the reduce task, since it may prefer a node that requires less data transfer.

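2) Either node 2 or node 12. Assuming the racks hold nodes 1-5, 6-10 and 11-15, each of these shares a rack with one of the map nodes (node 1 or node 11), whereas node 6 shares a rack with neither.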

3) Yes. Before the reduce operation can start, all the map outputs need to be copied to the node where the reduce task is going to run. So once the map tasks on node 1 and node 11 have completed, the job tracker may start the reduce task on node 1 or node 11.
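To make the "less data transfer" reasoning in the three points above concrete, here is a minimal Java sketch that scores candidate reduce nodes by how far the map outputs would have to travel. This is not the job tracker's actual scheduling code. The class `ReducePlacementSketch` and its methods are hypothetical names made up for illustration, the rack layout (nodes 1-5, 6-10, 11-15) is an assumption read from the question, the distance values 0 / 2 / 4 (same node / same rack / different rack) are borrowed from the convention Hadoop uses for network distances, and each block is assumed to produce one equal-sized unit of map output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReducePlacementSketch {
    // Assumption: racks are filled in order, five nodes each (1-5, 6-10, 11-15).
    static final int NODES_PER_RACK = 5;

    // Zero-based rack index for a one-based node number.
    static int rackOf(int node) {
        return (node - 1) / NODES_PER_RACK;
    }

    // Illustrative distance: 0 on the same node, 2 within a rack, 4 across racks.
    static int distance(int a, int b) {
        if (a == b) {
            return 0;
        }
        return rackOf(a) == rackOf(b) ? 2 : 4;
    }

    // Total cost of pulling every map output to the candidate reduce node.
    // mapOutputUnits maps a map node to the number of output units it holds.
    static int transferCost(int reduceNode, Map<Integer, Integer> mapOutputUnits) {
        int cost = 0;
        for (Map.Entry<Integer, Integer> entry : mapOutputUnits.entrySet()) {
            cost += distance(reduceNode, entry.getKey()) * entry.getValue();
        }
        return cost;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> mapOutputUnits = new LinkedHashMap<>();
        mapOutputUnits.put(1, 2);   // node 1 mapped blocks A and B
        mapOutputUnits.put(11, 2);  // node 11 mapped blocks C and D

        for (int candidate : new int[] {1, 2, 6, 11, 12}) {
            System.out.println("node " + candidate + " -> cost "
                    + transferCost(candidate, mapOutputUnits));
        }
    }
}
```

Under these assumptions node 1 and node 11 score 8, node 2 and node 12 score 12, and node 6 scores 16, which matches the ordering given in the answer.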

Hope this helps.
