I have three racks with five nodes each, numbered node 1 to node 15. I've written a file, file.txt, which is split into four data blocks A, B, C and D and stored on node 1 (blocks A and B) and node 11 (blocks C and D). The JobTracker gives the TaskTrackers on node 1 and node 11 the code to run the map tasks on their local blocks.
My questions are:
How does the JobTracker decide on which node the reduce task is run? Is the decision based on rack awareness?
Out of node 2, node 6 and node 12, which would be the best node to run the reduce task on, assuming none of them is currently running any task?
Can the reduce task be run on node 1 or node 11 once the map tasks on those nodes have finished?
Thanks in advance.