I am reading the MapReduce paper, and it mentions sorting the intermediate data so that all occurrences of the same key are grouped together:
When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. The sorting is needed because typically many different keys map to the same reduce task. If the amount of intermediate data is too large to fit in memory, an external sort is used.
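To make that passage concrete, here is a minimal Python sketch of how I read it. It assumes intermediate keys are assigned to one of R reduce tasks by hashing the key modulo R; the names `partition`, `group_for_reduce_task`, and `R`, and the use of CRC32 as the hash, are my own choices for illustration and do not come from the paper's code.

```python
import zlib
from itertools import groupby
from operator import itemgetter

R = 2  # assumed number of reduce tasks (illustrative value)


def partition(key: str, num_reduce_tasks: int = R) -> int:
    """Assumed partitioning function: a deterministic hash of the key, mod R."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def group_for_reduce_task(pairs, task_id, num_reduce_tasks=R):
    """Sort one reduce task's intermediate pairs by key and group equal keys."""
    # Keep only the pairs whose key maps to this reduce task.
    mine = [(k, v) for k, v in pairs if partition(k, num_reduce_tasks) == task_id]
    # Sorting by key makes all occurrences of a key adjacent (stable sort).
    mine.sort(key=itemgetter(0))
    # Adjacent equal keys collapse into a single (key, [values]) entry.
    return [(k, [v for _, v in grp]) for k, grp in groupby(mine, key=itemgetter(0))]
```

Under this reading, a single reduce task receives many different keys, which is why the sort is needed before the values for each key can be handed to the reduce function.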
There is also mention of the same reduce task being executed on multiple machines:
When a reduce task completes, the reduce worker atomically renames its temporary output file to the final output file. If the same reduce task is executed on multiple machines, multiple rename calls will be executed for the same final output file.
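The rename step in that passage could look roughly like the following Python sketch. It assumes a local POSIX-style file system where `os.replace` is atomic; `commit_reduce_output` and the `reduce-tmp-` prefix are hypothetical names for illustration, not anything taken from the paper.

```python
import os
import tempfile


def commit_reduce_output(final_path: str, contents: str) -> None:
    """Write output to a private temporary file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Each execution writes to its own temporary file in the same directory,
    # so concurrent executions never write to the same file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="reduce-tmp-")
    with os.fdopen(fd, "w") as f:
        f.write(contents)
    # The rename either fully succeeds or does not happen; a later rename for
    # the same final path simply replaces the earlier file.
    os.replace(tmp_path, final_path)
```

If two executions of the same reduce task both call this with the same `final_path`, the final file ends up holding the output of exactly one of them, never a mix of both.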
If the same keys are grouped together, won't that become one reduce task to be run by one reduce worker? How can the same reduce task be run on multiple machines?