I want to understand how memory is used in the reduce phase of a MapReduce job, so that I can configure the relevant settings appropriately.
If I understand correctly, the reducer first fetches its map outputs and keeps them in memory up to a certain threshold. The settings that control this are (see the sketch after this list):
- mapreduce.reduce.shuffle.merge.percent: The usage threshold at which an in-memory merge will be initiated, expressed as a percentage of the total memory allocated to storing in-memory map outputs, as defined by mapreduce.reduce.shuffle.input.buffer.percent.
- mapreduce.reduce.input.buffer.percent: The percentage of memory (relative to the maximum heap size) to retain map outputs during the reduce. When the shuffle is concluded, any remaining map outputs in memory must consume less than this threshold before the reduce can begin.
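To make this concrete, here is a minimal driver sketch of where I would set these properties programmatically. The class and job names are placeholders I made up, and the values are the defaults as I read them from mapred-default.xml:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReduceMemoryTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Fraction of the reducer heap used to buffer fetched map outputs
        // during the shuffle (default 0.70, if I read mapred-default.xml correctly).
        conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);

        // When the buffered map outputs fill this fraction of the buffer above,
        // an in-memory merge is started (default 0.66).
        conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.66f);

        // Fraction of the heap that map outputs may still occupy once the
        // reduce phase starts (default 0.0, i.e. spill everything to disk first).
        conf.setFloat("mapreduce.reduce.input.buffer.percent", 0.0f);

        Job job = Job.getInstance(conf, "reduce-memory-tuning-example");
        // ... mapper, reducer and input/output paths set as usual ...
    }
}
```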
Next, these spilled blocks are merged. The following option seems to control how much memory is used for the shuffle:
- mapreduce.reduce.shuffle.input.buffer.percent: The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle.
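To check my understanding of how these two percentages interact, here is a back-of-the-envelope calculation, assuming for illustration a reducer heap of 4 GB (e.g. mapreduce.reduce.java.opts=-Xmx4096m) and the default values quoted above:

```java
public class ShuffleBufferSizing {
    public static void main(String[] args) {
        long reducerHeapBytes = 4096L * 1024 * 1024;  // assumed -Xmx4096m
        double inputBufferPercent = 0.70;             // mapreduce.reduce.shuffle.input.buffer.percent
        double mergePercent = 0.66;                   // mapreduce.reduce.shuffle.merge.percent

        // Memory available for buffering fetched map outputs during the shuffle.
        long shuffleBuffer = (long) (reducerHeapBytes * inputBufferPercent);  // ~2.8 GB

        // Fill level of that buffer at which an in-memory merge is initiated.
        long mergeThreshold = (long) (shuffleBuffer * mergePercent);          // ~1.85 GB

        System.out.printf("shuffle buffer ~ %d MB, merge threshold ~ %d MB%n",
                shuffleBuffer >> 20, mergeThreshold >> 20);
    }
}
```

Is this the correct way to reason about the two values?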
But then, there is the setting:
- mapreduce.reduce.shuffle.memory.limit.percent: Maximum percentage of the in-memory limit that a single shuffle can consume.
But it is not clear to which value this percentage applies. Is there more documentation on these settings, i.e. what exactly they control and how they differ from each other?
Finally, after the merge completes, the reduce function is run on the inputs. In the [Hadoop book][1] I found that the final merge step directly feeds the reducers. However, the default value mapreduce.reduce.input.buffer.percent=0 seems to contradict this, since it indicates that everything is spilled to disk *before* the reducers start. Is there a reference that settles which of these explanations is correct?
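For completeness, this is how I would raise that threshold if my reading is right, i.e. to keep (part of) the map outputs in memory through the reduce when the reduce function itself needs little memory. The class and job names are again placeholders, and the value 0.8 is just an example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RetainMapOutputsInMemory {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // With the default of 0.0, every map output is spilled to disk before
        // the reduce starts. As I read the docs, 0.8 would let map outputs
        // occupy up to 80% of the heap while the reducer runs, which only
        // makes sense if the reduce function itself needs little memory.
        conf.setFloat("mapreduce.reduce.input.buffer.percent", 0.8f);

        Job job = Job.getInstance(conf, "retain-map-outputs-example");
        // ... remainder of the job setup as usual ...
    }
}
```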
[1]: Hadoop: The Definitive Guide, Fourth Edition, p. 200