
After experimenting with 2 reducers and reading the HowManyMapsAndReduces page on the Hadoop Wiki, as well as the questions "hadoop: number of reducers remains a constant 4", "Hadoop: Number of mappers and reducers" and "Setting the number of map tasks and reduce tasks", I am leaning toward a conclusion but would like to confirm it:

If I have 1 map task (I understand that the actual number is decided by Hadoop) and 2 reducers (for which I provided only 1 file with the reducer code, e.g. -reducer /bin/wc), which of the following will happen?

  1. Hadoop will distribute the data the mapper emits across both reducers (e.g. given 1000 lines of text, ~500 go to the 1st reducer and ~500 to the 2nd)?
  2. Hadoop will send all of the data the mapper emits to both reducers (e.g. given 1000 lines of text, all 1000 go to the 1st reducer and all 1000 to the 2nd)?

I think it is the 1st option, but I could not find any evidence for it online.


1 Answer


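Option 1a: Hadoop will distribute the data across the reducers, but it may not divide it evenly. Each record is sent to exactly one reducer, chosen from its key (by default, from a hash of the key), so there is no guarantee of balance, especially if (1) your key distribution is skewed or (2) there are not many records.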
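To see why the split is per-record but not necessarily even, here is a minimal Python sketch that simulates hash partitioning over 2 reducers. It reuses the formula of Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, and the streaming convention that the key is the text before the first tab (the whole line if there is no tab). The Java-style 31-multiplier string hash is an assumption standing in for the real key hash; Hadoop's hash for streaming keys may differ in detail, so the exact counts are illustrative only. The helper names (java_string_hash, partition, split_key, reducer_loads) are hypothetical.

```python
from collections import Counter

INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def java_string_hash(s: str) -> int:
    """Java String.hashCode-style hash, wrapped to a signed 32-bit int.

    Assumption: stands in for the real key hash Hadoop would use.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed, like Java int overflow.
    return h - (1 << 32) if h & 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer with the same formula as HashPartitioner."""
    return (java_string_hash(key) & INT_MAX) % num_reducers


def split_key(line: str) -> str:
    """Streaming-style key: text before the first tab, else the whole line."""
    return line.split("\t", 1)[0]


def reducer_loads(lines, num_reducers):
    """Count how many records each reducer receives (each record goes to one)."""
    loads = Counter(partition(split_key(line), num_reducers) for line in lines)
    return [loads[r] for r in range(num_reducers)]


if __name__ == "__main__":
    distinct = [f"line {i}" for i in range(1000)]            # 1000 different keys
    skewed = ["the"] * 900 + [f"word{i}" for i in range(100)]  # one dominant key
    print("distinct keys:", reducer_loads(distinct, 2))
    print("skewed keys:  ", reducer_loads(skewed, 2))
```

Whatever hash is used, the loads always sum to the number of input lines, which is option 1 rather than option 2. With many distinct keys the two counts tend to be close, while a single repeated key (the skewed case) sends all of its records to the same reducer.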