
I have seen this figure/definition in most books and blogs that cover the Map phase of MapReduce:

[Figure: definition of the Map phase, with input key k and output key k']

What I don't understand is that in the Map phase the input key is k and the output is a different key k' (k-dash). I googled around and found only one trivial example of this: http://java.dzone.com/articles/confused-about-mapreduce

I am looking for more (theoretical) examples and explanations of this, where the input and output keys of the map phase are different.

I would appreciate it if someone could provide some. Let me know if I need to explain my question further.


1
found one more ... stevekrenzel.com/finding-friends-with-mapreduce , the more the merrier - Lav

1 Answer


That's very straightforward, actually. The key that goes into the map phase is the key the source data already has, and the key coming out of the map is the key you want the end result to be ordered or grouped by.
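As a rough illustration of the "group by" case, here is a minimal in-memory sketch in plain Python (not Hadoop code). The names `map_fn` and `run` and the sample records are made up for this example. The input key is just the position of a line and is ignored; the output key k' is the word we want to group by.

```python
from collections import defaultdict

def map_fn(offset, line):
    # Input key k: position of the line in the source (unused here).
    # Output key k': each word, because that is what we want to group by.
    for word in line.split():
        yield word, 1

def run(records):
    # Stand-in for the shuffle: collect all values that share the same k'.
    groups = defaultdict(list)
    for k, v in records:
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)
    # Stand-in for the reduce: sum the values of each group.
    return {k2: sum(values) for k2, values in groups.items()}

# Hypothetical input: (position, line) pairs.
records = [(0, "a b a"), (6, "b c")]
counts = run(records)
print(counts)
```

The point is only that the map is free to throw away the key it was given and emit whatever key the next stage should group on.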

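It is important to note that the input key depends on the input format. For example, if the input is HBase, the key would be the HBase row key; for a plain text file such as a CSV read with Hadoop's default TextInputFormat, the key would be the position of the line in the file (strictly, its byte offset rather than a line number).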
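To make the text-file case concrete, the sketch below mimics in plain Python how a line-oriented reader could hand (key, value) pairs to the map, with the byte offset of each line as the key. The function `text_records` and the sample data are hypothetical; this is not the Hadoop reader itself, only an approximation of the pairs it produces.

```python
def text_records(data: bytes):
    # Yield (byte offset of the line, line text) pairs.
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # advance by the full line, including its newline

# Hypothetical CSV content.
data = b"id,name\n1,Ann\n2,Bob\n"
records = list(text_records(data))
print(records)
```

A key like this carries no business meaning, which is why the map usually replaces it with a more useful k'.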

For instance, if you have a sequence file where each record has an SSN as its key and a value made up of a first name and a last name, and you want the end result to be ordered by last name, the input k would be the SSN, and you'd emit the last name concatenated with the first name as k' so that the output is ordered by it.
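The SSN example can be sketched in plain Python as follows. This is an in-memory simulation under assumptions, not Hadoop code: the records are invented, the "last,first" separator is an arbitrary choice, and emitting the SSN as the output value is just one option. The `sorted` call stands in for the sort that happens on k' between map and reduce.

```python
def map_fn(ssn, name):
    # Input key k: the SSN. Input value: (first name, last name).
    first, last = name
    # Output key k': last name concatenated with first name, to order by it.
    # Output value: the SSN (an assumption; it could be the whole record).
    yield f"{last},{first}", ssn

# Hypothetical records standing in for the sequence file.
records = [
    ("000-00-0001", ("Ann", "Zed")),
    ("000-00-0002", ("Bob", "Abel")),
]

mapped = [pair for k, v in records for pair in map_fn(k, v)]
ordered = sorted(mapped)  # stand-in for the framework's sort by k'
print(ordered)
```

Here the input arrives keyed by SSN, but the output comes out ordered by name, because the map changed the key.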