
I'm reading the MapReduce paper, and it mentions sorting the intermediate keys so that all occurrences of the same key are grouped together:

When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. The sorting is needed because typically many different keys map to the same reduce task. If the amount of intermediate data is too large to fit in memory, an external sort is used
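The grouping step the paper describes can be sketched in a few lines. This is an illustrative in-memory version (the sample pairs and `reduce_fn` are my own, not from the paper); when the data doesn't fit in memory, the same idea is applied with an external sort instead of an in-memory one.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical intermediate (key, value) pairs read by one reduce worker
# from several map tasks; many distinct keys map to this one reduce task.
intermediate = [("b", 1), ("a", 1), ("b", 1), ("c", 1), ("a", 1)]

# Sort by key so all occurrences of the same key become adjacent...
intermediate.sort(key=itemgetter(0))

# ...then each run of equal keys is handed to the user's reduce function
# in a single call (here: a simple per-key sum, as in word count).
def reduce_fn(key, values):
    return key, sum(values)

results = [reduce_fn(k, [v for _, v in grp])
           for k, grp in groupby(intermediate, key=itemgetter(0))]

print(results)  # [('a', 2), ('b', 2), ('c', 1)]
```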

Then there is mention of the same reduce task being executed on multiple machines:

When a reduce task completes, the reduce worker atomically renames its temporary output file to the final output file. If the same reduce task is executed on multiple machines, multiple rename calls will be executed for the same final output file.
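The write-temp-then-rename commit protocol the paper relies on can be sketched like this. The function name, the `part-NNNNN` naming, and the demo data are my own illustration; the point is that each attempt writes to its own unique temporary file, so even if two attempts of the same task both finish, both renames target the same final path and the file system guarantees the final file holds the complete output of exactly one attempt.

```python
import os
import tempfile

def commit_reduce_output(output_dir, task_id, data):
    # Each attempt writes to its own unique temporary file first...
    fd, tmp_path = tempfile.mkstemp(dir=output_dir)
    with os.fdopen(fd, "w") as f:
        f.write(data)
    # ...then atomically renames it to the single final name for this task.
    # os.replace is atomic on POSIX file systems, so a duplicate attempt's
    # rename simply overwrites the file with an identical complete copy.
    final_path = os.path.join(output_dir, f"part-{task_id:05d}")
    os.replace(tmp_path, final_path)
    return final_path
```

Calling it twice for the same task (as happens when a speculative duplicate also completes) still leaves exactly one final file:

```python
out = tempfile.mkdtemp()
commit_reduce_output(out, 0, "a\t2\n")
commit_reduce_output(out, 0, "a\t2\n")  # duplicate attempt, same final file
```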

If the same keys are grouped together, won't that become one reduce task to be run by one reduce worker? How can the same reduce task be run on multiple machines?


2 Answers


If the same reduce task is executed on multiple machines, multiple rename calls will be executed for the same final output file.

It's possible due to speculative execution.

If a particular map or reduce task is taking a long time, the Hadoop framework starts the same task on a different machine, speculating that the long-running task has some issue. The slowness of a long-running task can be caused by network failures, a busy machine, or faulty hardware.

You can find more details about this concept in this SE question:

Hadoop speculative task execution

From the Apache documentation page, under Task Side-Effect Files:

There could be issues with two instances of the same Mapper or Reducer running simultaneously (for example, speculative tasks) trying to open and/or write to the same file (path) on the FileSystem. Hence the application-writer will have to pick unique names per task-attempt (using the attemptid, say attempt_200709221812_0001_m_000000_0), not just per task.

To avoid these issues the MapReduce framework, when the OutputCommitter is FileOutputCommitter, maintains a special ${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid} sub-directory accessible via ${mapreduce.task.output.dir} for each task-attempt on the FileSystem where the output of the task-attempt is stored.
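A simplified sketch of what that committer pattern looks like: each task attempt writes under its own `_temporary/<attempt_id>` sub-directory, so concurrent attempts of the same task never touch each other's files, and only the attempt that commits gets promoted into the job's output directory. The function names and directory layout here are illustrative, not the framework's exact implementation.

```python
import os
import shutil

def attempt_dir(output_dir, attempt_id):
    # Private scratch space for one task attempt, keyed by its attempt id
    # (e.g. "attempt_200709221812_0001_m_000000_0"), not just its task id.
    d = os.path.join(output_dir, "_temporary", attempt_id)
    os.makedirs(d, exist_ok=True)
    return d

def commit_attempt(output_dir, attempt_id, filename):
    # Promote the committing attempt's file into the job output directory
    # with an atomic rename, then discard that attempt's scratch space.
    src = os.path.join(attempt_dir(output_dir, attempt_id), filename)
    dst = os.path.join(output_dir, filename)
    os.replace(src, dst)
    shutil.rmtree(os.path.join(output_dir, "_temporary", attempt_id))
```

Two attempts of the same task can then write `part-00000` concurrently without conflict, because each writes inside its own attempt directory; whichever attempt commits first wins.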


I think you took it wrong. It means that if a single reduce task is large enough, then instead of being processed on a single machine it is processed on multiple machines; the output file from each machine is then renamed, aggregated, and presented as a single output file.

Multiple reduce tasks can also run on the same node. It depends on the speed of that node: if it finishes its reduce task faster than the other nodes, it is fed another reduce task.

For more information, refer to https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html There is a topic in that doc, "How Many Reduces?", which should answer your question.

I hope this resolves your query.