Hadoop map/reduce sort

Question

I have a MapReduce job that uses only a mapper, because the output of each mapper is guaranteed to have a unique key. When the job runs and produces output files such as part-m-00000, part-m-00001, ..., will they be sorted by key?

Or do I need to implement a reducer that does nothing but write the records to files such as part-r-00000, part-r-00001? And would that guarantee that the output is sorted by key?

delmet · Accepted Answer · 2012-11-14T21:46:38

If you want the keys sorted within each file, and also want every key in file i to be less than every key in file j whenever i is less than j, you need not only a reducer but also a partitioner. You might want to consider something like Pig for this, as it makes the task trivial. If you want to do it with MR, use the field you are sorting on as your key and write a partitioner that makes sure each key ends up in the correct reducer.
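For what the partitioner has to do, here is a minimal sketch of the range-partitioning idea in plain Java, with no Hadoop dependency so it can be run on its own. The class name RangePartitionSketch, the split points "h" and "p", and the sample keys are all made up for illustration. In a real job this logic would live inside the getPartition method of a Partitioner subclass, and the split points would have to be chosen from your actual key distribution. Hadoop also ships a TotalOrderPartitioner that works along these lines, which may save you from writing your own.

```java
import java.util.Arrays;

public class RangePartitionSketch {

    // Maps a key to a reducer index using sorted split points.
    // With n split points there are n + 1 partitions:
    //   partition 0 holds keys < splitPoints[0],
    //   partition k holds keys in [splitPoints[k-1], splitPoints[k]),
    //   the last partition holds keys >= splitPoints[n-1].
    static int getPartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Hypothetical split points for three reducers.
        String[] splitPoints = {"h", "p"};
        String[] keys = {"apple", "hadoop", "pig", "zebra"};
        for (String key : keys) {
            System.out.println(key + " -> reducer " + getPartition(key, splitPoints));
        }
    }
}
```

With these split points, "apple" goes to reducer 0, "hadoop" to reducer 1, and "pig" and "zebra" to reducer 2. Because the mapping is monotone in the key and each reducer receives its keys already sorted, reading part-r-00000, part-r-00001, ... in order should give a globally sorted result. Note that this sketch compares Java strings, which is not necessarily the same ordering as the comparator your job's key type uses.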
