
My understanding is that the MapReduce programming model has two phases, map and reduce. After the map phase completes, intermediate (key, value) pairs are generated and passed to the reducers.

My question is about what comes after the map() phase, namely shuffle and sort. It seems to me that shuffle and sort are part of the reduce phase. Is that true?

If that is the case, how does combiner() work?


1 Answer


In fact, there are three phases in map/reduce:

  1. map
  2. shuffle & sort
  3. reduce

Shuffle & sort is a framework-only phase (as a developer, you only code the map and reduce functions) that handles the communication between the map tasks and the reduce tasks: it groups the intermediate values by key, sorts the keys, and delivers each group to a reducer.
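To make the three phases concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code: `map_fn`, `shuffle_and_sort`, `reduce_fn` and `run_job` are hypothetical names chosen for illustration, and word count is just an assumed example job. Only `map_fn` and `reduce_fn` correspond to what a developer would write; `shuffle_and_sort` stands in for what the framework does.

```python
from collections import defaultdict

def map_fn(line):
    """User code: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def shuffle_and_sort(mapped_pairs):
    """Framework stand-in: group values by key, then sort by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_fn(key, values):
    """User code: collapse all values of one key into a single result."""
    return key, sum(values)

def run_job(lines):
    # Phase 1: map every input record.
    mapped = [pair for line in lines for pair in map_fn(line)]
    # Phase 2: shuffle & sort (no user code involved).
    grouped = shuffle_and_sort(mapped)
    # Phase 3: reduce each key with its grouped values.
    return [reduce_fn(key, values) for key, values in grouped]

if __name__ == "__main__":
    print(run_job(["a b a", "b c"]))
```

In a real cluster the grouping step also moves data across the network from map nodes to reduce nodes, which this sketch does not model.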

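A combiner is an optional step that runs on the output of each map task, before that output is shuffled to the reducers. It summarizes the map output locally so that less data is transferred and the reduce phase has less work to do. Because the framework may apply it zero, one, or several times, the combine operation should be commutative and associative (a sum or a max, for example). See more info here: http://www.tutorialspoint.com/map_reduce/map_reduce_combiners.htm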
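The effect of a combiner can be illustrated with a small plain-Python simulation. This is only a sketch under assumptions, not Hadoop code: `map_task`, `combine` and `shuffle_sort_reduce` are hypothetical helper names, the job is an assumed word count, and each inner list in `splits` plays the role of the input of one map task.

```python
from collections import Counter, defaultdict

def map_task(lines):
    """One map task: emit (word, 1) for every word in its input split."""
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate the output of a single map task locally."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def shuffle_sort_reduce(all_pairs):
    """Group by key, sort the keys, then sum the values of each key."""
    groups = defaultdict(list)
    for key, value in all_pairs:
        groups[key].append(value)
    return [(key, sum(values)) for key, values in sorted(groups.items())]

# Two input splits, each handled by its own map task.
splits = [["a b a a"], ["b a b"]]

# Pairs that would cross the network without and with a combiner.
without_combiner = [p for split in splits for p in map_task(split)]
with_combiner = [p for split in splits for p in combine(map_task(split))]

if __name__ == "__main__":
    print(len(without_combiner), len(with_combiner))
    print(shuffle_sort_reduce(with_combiner))
```

Both paths produce the same final counts; the combiner only reduces how many intermediate pairs have to be shuffled. That equivalence holds here because addition is commutative and associative.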

See also this overview of the map/reduce architecture: https://developer.yahoo.com/hadoop/tutorial/module4.html#dataflow