How are key/value pairs distributed in Hadoop when using multiple reducers?

Question

Suppose I have a job with several mappers and more than one reduce task. The mapper's output key type is a WritableComparable. For instance, in the word count example, let's say I have the string:

"foo foo bar foo bletch quux bar"

When using the words as keys, is "foo" always sent to the same reducer, or is it possible for more than one reducer to receive a "foo"?

Amar Amar · Accepted Answer · 2014-10-29T05:04:56

As mentioned in the other answers, MapReduce guarantees that all values associated with the same key are sent to the same reducer.

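This happens during the partitioning stage: the partitioner applies a hash function to each key and uses the result to choose the reducer. By default Hadoop uses HashPartitioner, which computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Because the result depends only on the key and the number of reducers, every occurrence of an equal key, from any mapper, ends up at the same reducer. So in your example, all three "foo" pairs go to one reducer and both "bar" pairs go to one reducer (possibly the same one, since a reducer usually handles many distinct keys).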
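To make the mechanism concrete, here is a minimal standalone sketch (no Hadoop dependency) that applies the default partitioner's formula to the words from the question. The class and method names (HashPartitionDemo, partitionFor, assign) are hypothetical, and it assumes Java's String.hashCode() as a stand-in for the key's hash; Hadoop's Text computes its hash differently, so the actual reducer numbers in a real job would differ, but the property shown (one reducer per distinct key) is the same.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class HashPartitionDemo {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reducers.
    // Assumption: String.hashCode() stands in for Text.hashCode(), so the
    // partition numbers will not match a real Hadoop job.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Simulates the partitioning step for one input line: for every word,
    // record the set of reducers its occurrences were sent to.
    static Map<String, Set<Integer>> assign(String line, int numReducers) {
        Map<String, Set<Integer>> sentTo = new TreeMap<>();
        for (String word : line.split("\\s+")) {
            sentTo.computeIfAbsent(word, w -> new TreeSet<>())
                  .add(partitionFor(word, numReducers));
        }
        return sentTo;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> sentTo =
            assign("foo foo bar foo bletch quux bar", 3);
        // Each word appears with exactly one reducer, however often it occurs.
        sentTo.forEach((word, reducers) ->
            System.out.println(word + " -> reducer " + reducers));
    }
}
```

Each set printed by main contains a single reducer number. Because partitionFor looks only at the key and the reducer count, a second mapper processing another line containing "foo" would compute the same partition, which is why the guarantee holds across mappers and not just within one.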

