When I look at my Hadoop screen, I see stats like:
Average Map Time 5mins, 56sec
Average Shuffle Time 6mins, 27sec
Average Merge Time 4mins, 25sec
Average Reduce Time 3mins, 51sec
From what I understand, MapReduce works something like this:
- Map step: Use "mapper" machines to apply some transformation to each line of input, which outputs a key-value pair for each line.
- Shuffle step: Take these key-value pairs, and group together pairs with the same key, assigning pairs with the same key to the same "reducer" machine.
- Reduce step: Apply a "reduce" transformation on all pairs with the same key, to produce one result for each group.
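
To make the three steps above concrete, here is a minimal single-process sketch in Python. It is only an illustration of the description, not Hadoop's actual API: the function names `map_step`, `shuffle_step`, and `reduce_step`, the `"key value"` line format, and the choice of summing as the reduce operation are all made up for the example.

```python
from collections import defaultdict


def map_step(lines):
    # Map: turn each input line into one (key, value) pair.
    # Assumes every line looks like "key value" (hypothetical format).
    pairs = []
    for line in lines:
        key, value = line.split()
        pairs.append((key, int(value)))
    return pairs


def shuffle_step(pairs):
    # Shuffle: group values by key, as if every pair with the same key
    # were sent to the same reducer.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_step(groups):
    # Reduce: collapse each group into a single result (here, a sum).
    return {key: sum(values) for key, values in groups.items()}


lines = ["a 1", "b 2", "a 3"]
result = reduce_step(shuffle_step(map_step(lines)))
print(result)  # {'a': 4, 'b': 2}
```

In a real cluster the grouping is spread across many machines; this sketch has no separate step that would correspond to the "merge" timing.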
So I think I know what "map", "shuffle", and "reduce" are. But what is "merge"?
"group together pairs": that exactly involves merging ;) - Thomas Jungblut
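
One way to picture the comment's point, as a rough sketch rather than a description of Hadoop's internals: assume a reducer receives several chunks of key-value pairs, each already sorted by key. Merging those chunks into one sorted stream puts equal keys next to each other, which is what makes the grouping possible. The three runs below are hypothetical data, and `heapq.merge` simply stands in for whatever merge the framework actually performs.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical outputs fetched from three mappers, each sorted by key.
run_a = [("apple", 1), ("cherry", 1)]
run_b = [("apple", 1), ("banana", 1)]
run_c = [("banana", 1), ("cherry", 1)]

# Merge: combine the sorted runs into one sorted stream without re-sorting.
merged = list(heapq.merge(run_a, run_b, run_c, key=itemgetter(0)))

# After the merge, pairs with the same key are adjacent, so grouping
# is a single pass over the stream.
grouped = {
    key: [value for _, value in group]
    for key, group in groupby(merged, key=itemgetter(0))
}

print(grouped)  # {'apple': [1, 1], 'banana': [1, 1], 'cherry': [1, 1]}
```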