
Suppose I have a job with a number of mappers and more than one reduce task, and the key type the mapper outputs is a WritableComparable. For example, in word count, say the input is the string:

"foo foo bar foo bletch quux bar"

When the words are used as keys, is "foo" always sent to the same reducer, or can more than one reducer receive a "foo"?


3 Answers


As mentioned in the other answers, MapReduce always passes key-value pairs to the reducer such that all values associated with the same key go to the same reducer.

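What actually happens is that, during the partition stage, a hash function is applied to each key, and the resulting hash decides which reducer the key-value pair is sent to. Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Because equal keys always produce equal hash codes, every occurrence of the same key ends up at the same reducer.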
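A minimal standalone sketch of that partitioning formula, applied to the words from the question. It is an assumption-laden simulation, not Hadoop code: it uses String.hashCode() in place of the hashCode() of Hadoop's Text key type, so the actual reducer numbers in a real job will differ, and the reducer count of 3 is an arbitrary choice. What it does show is that every "foo" maps to one partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HashPartitionSketch {
    // Same formula as Hadoop's default HashPartitioner: clear the sign bit
    // so the result is non-negative, then take the remainder by the reducer count.
    // String.hashCode() stands in for the real key type's hashCode().
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String input = "foo foo bar foo bletch quux bar";
        int numReduceTasks = 3; // assumed reducer count, for illustration only
        Map<String, Integer> assignment = new LinkedHashMap<>();
        for (String word : input.split(" ")) {
            int p = partition(word, numReduceTasks);
            Integer previous = assignment.putIfAbsent(word, p);
            // Equal keys have equal hash codes, so this check never fires.
            if (previous != null && previous != p) {
                throw new IllegalStateException("key split across reducers: " + word);
            }
            System.out.println(word + " -> reducer " + p);
        }
    }
}
```

Running it prints one line per word. All three "foo" lines name the same reducer, and both "bar" lines do as well.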


MapReduce always passes key-value pairs to the reducers so that all values associated with the same key go to the same reducer. This is the job of the partitioner, which assigns each map output key to a reducer.

Therefore, all values associated with foo will go to the same reducer.


Hadoop partitions and sorts the outputs of all map tasks, then transfers all mapper outputs with the same key to the same reduce task. This transfer is called the "shuffle". So one reduce task may process all of the "foo" mapper results while another processes all of the "bar" results. If "foo" and "bar" are keys emitted by the mapper, "foo" will never be received by more than one reducer.
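A simplified in-memory sketch of the shuffle described above. It routes each (word, 1) pair to a reducer by key, then sorts and groups by key within each reducer. Real Hadoop sorts on the map side and merge-sorts on the reduce side across the network. Here a TreeMap per reducer stands in for both steps, String.hashCode() stands in for Text's hash, and the shuffle method name and the reducer count of 2 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Default-HashPartitioner formula; String.hashCode() stands in for Text's hash.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical helper: send each (word, 1) pair to its reducer, then group
    // values by key. TreeMap keeps each reducer's keys in sorted order.
    static List<TreeMap<String, List<Integer>>> shuffle(String[] words, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> reducers = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            reducers.add(new TreeMap<>());
        }
        for (String word : words) {
            TreeMap<String, List<Integer>> reducer = reducers.get(partition(word, numReduceTasks));
            reducer.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return reducers;
    }

    public static void main(String[] args) {
        String[] words = "foo foo bar foo bletch quux bar".split(" ");
        List<TreeMap<String, List<Integer>>> reducers = shuffle(words, 2);
        for (int i = 0; i < reducers.size(); i++) {
            for (Map.Entry<String, List<Integer>> e : reducers.get(i).entrySet()) {
                // Each key appears at exactly one reducer with all of its values,
                // e.g. foo arrives once with [1, 1, 1].
                int sum = e.getValue().stream().mapToInt(Integer::intValue).sum();
                System.out.println("reducer " + i + ": " + e.getKey() + " " + e.getValue() + " sum=" + sum);
            }
        }
    }
}
```

Which reducer gets which word depends on the hash. In every run, though, exactly one reducer holds "foo", and it holds all three of its values, which is what lets a word-count reducer compute a correct total.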