1
votes

I just got started with Hadoop and have several questions about how a reducer executes.

When key-value pairs are distributed to one reducer task, are they processed sequentially or in parallel?

For example, given (A,5) (A,3) (B,10) for one reducer task, do A and B go into the reducer in parallel?

1 Answer

2
votes

When one reducer is used, the KV pairs are not processed in parallel; they are processed sequentially, in sorted key order. In your example above, the pairs will be sent from one or more mapper tasks (in parallel if there are multiple mappers) to the single reduce task. Before these values are passed to your reducer class, they are merged and sorted by key, and the values for each key are grouped together ((A,5) and (A,3) are turned into (A,{5,3})). Only then does the reduce task run your code, calling reduce() once per key, one key after another: first (A,{5,3}), then (B,{10}).
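
To make the ordering concrete, here is a minimal plain-Java sketch that imitates what a single reduce task does with your example. It does not use the Hadoop API at all: the class name `ReducerOrderSketch`, the `runSingleReducer` helper, and the summing `reduce` function are all made up for illustration, and a `TreeMap` stands in for Hadoop's sort-and-group step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReducerOrderSketch {

    // Imitates one reduce task: sort and group by key, then reduce key by key.
    static List<String> runSingleReducer(List<Map.Entry<String, Integer>> pairs) {
        // TreeMap keeps keys in sorted order; each key collects all of its values.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }

        // One reduce call per key, sequentially, in sorted key order.
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            output.add(reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    // Hypothetical user reduce function: sums the values for one key.
    static String reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return key + "=" + sum;
    }

    public static void main(String[] args) {
        // Pairs arrive from the mappers in arbitrary order.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new AbstractMap.SimpleEntry<>("B", 10));
        pairs.add(new AbstractMap.SimpleEntry<>("A", 5));
        pairs.add(new AbstractMap.SimpleEntry<>("A", 3));

        System.out.println(runSingleReducer(pairs)); // prints [A=8, B=10]
    }
}
```

Even though (B,10) arrives first here, the reduce function sees A before B, and it never sees two keys at the same time. Parallelism in the reduce phase comes from running several reduce tasks, each of which handles its own partition of the keys.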