I understand that the mapper produces 1 partition per reducer. How does the reducer know which partition to copy? Lets say there are 2 nodes running mapper for word count program and there are 2 reducers configured. If each map node produces 2 partitions, with the possibility of partitions in both the nodes containing same word as key, how will the reducer work correctly?
For ex:
If node 1 produces partition 1 and partition 2, and partition 1 contains a key named "WHO".
If node 2 produces partition 3 and partition 4, and partition 3 contains a key named "WHO".
If Partition 1 and Partition 4 went to reducer 1 (and remaining to reducer 2), how does the reducer 1 compute the correct word count?
If this is not a possibility, and partition 1 and 3 would be made to go to reducer 1, how Hadoop does this? Does it make sure a given key-value pair from different nodes always go to a same reducer? If so, how it does this?
Thanks, Suresh.