
I am reading a tutorial on MapReduce with combiners: http://www.tutorialspoint.com/map_reduce/map_reduce_combiners.htm

According to the tutorial, the reducer receives the following input from the combiner:

<What,1,1,1> <do,1,1> <you,1,1> <mean,1> <by,1> <Object,1>
<know,1> <about,1> <Java,1,1,1>
<is,1> <Virtual,1> <Machine,1>
<How,1> <enabled,1> <High,1> <Performance,1>

My question is: what happens if I skip the combiner and let the mapper output go straight through the shuffle and sort phase to the reducer, without any local grouping?

What input will the reducer receive once the map phase has finished and the output has been shuffled and sorted?

Is there a way to inspect the input that the reducer receives?


1 Answer


I would say that the output you're looking at in that tutorial is perhaps a bit wrong. Since the tutorial reuses the reducer code as the combiner, the output from the combiner would actually look like this:

<What,3> <do,2> <you,2> <mean,1> <by,1> <Object,1>
<know,1> <about,1> <Java,3>
<is,1> <Virtual,1> <Machine,1>
<How,1> <enabled,1> <High,1> <Performance,1>

In this example you can safely leave the combiner out, and the final result will be the same. In a scenario with multiple mappers and reducers, the combiner only performs local aggregation on each mapper's output; the reducer still does the final aggregation.

If you run without the combiner, you still get key-based grouping at the reduce stage, because that grouping is done by the shuffle and sort phase, not by the combiner.

So the input to the reducer is simply the output written by the mappers, grouped by key: for example <What,1,1,1> and <Java,1,1,1>, with one 1 for every occurrence the mappers emitted.
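To make the difference concrete, here is a small plain-Python simulation of the word-count pipeline. It is only a sketch, not Hadoop code: the four input lines are an assumption chosen to match the word counts quoted above, the split into two mappers is hypothetical, and all function names are made up for illustration.

```python
from collections import defaultdict

# Assumed input: four lines chosen to match the word counts quoted above.
LINES = [
    "What do you mean by Object",
    "What do you know about Java",
    "What is Java Virtual Machine",
    "How Java enabled High Performance",
]

# Hypothetical split: two mappers, each handling two lines.
SPLITS = [LINES[:2], LINES[2:]]


def map_phase(line):
    """Emit (word, 1) for every word, like a word-count mapper."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Local aggregation over ONE mapper's output (same logic as the reducer)."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def shuffle_and_sort(mapper_outputs):
    """Group values by key across all mappers, then sort by key."""
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for word, count in pairs:
            grouped[word].append(count)
    return dict(sorted(grouped.items()))


def reducer_input(splits, use_combiner):
    """Return what the reducer would see: key -> list of values."""
    outputs = []
    for split in splits:
        pairs = [pair for line in split for pair in map_phase(line)]
        outputs.append(combine(pairs) if use_combiner else pairs)
    return shuffle_and_sort(outputs)


def reduce_phase(grouped):
    """Final aggregation: sum the values for each key."""
    return {word: sum(counts) for word, counts in grouped.items()}


without = reducer_input(SPLITS, use_combiner=False)
with_c = reducer_input(SPLITS, use_combiner=True)

print(without["What"])  # [1, 1, 1]  raw mapper output, grouped by key
print(with_c["What"])   # [2, 1]     one partial sum per mapper
print(reduce_phase(without) == reduce_phase(with_c))  # True
```

Without the combiner the reducer sees one 1 per occurrence; with it, the reducer sees one partial sum per mapper. The final counts match because addition is associative and commutative, which is what makes it safe to reuse the reducer as the combiner here.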