Is it better to use the mapred or the mapreduce package to create a Hadoop Job?

Question

To create MapReduce jobs you can use either the old org.apache.hadoop.mapred package or the newer org.apache.hadoop.mapreduce package for Mappers, Reducers, Jobs, and so on. The old package was marked as deprecated, but that has since been reverted. Now I wonder whether it is better to use the old mapred package or the new mapreduce package to create a job, and why. Or does it just depend on whether you need something like MultipleTextOutputFormat, which is only available in the old mapred package?

E.g. Interface Mapper in package org.apache.hadoop.mapred.lib in r0.21.0 is not marked as deprecated while it is marked as deprecated in r0.20.2. — momo13

Praveen Sripati · Accepted Answer · 2011-09-29T16:21:10

Functionality-wise there is not much difference between the old (o.a.h.mapred) and the new (o.a.h.mapreduce) API. The only significant difference is that in the old API records are pushed to the mapper/reducer, while the new API supports both push and pull mechanisms. You can get more information about the pull mechanism here.
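To make the push/pull distinction concrete, here is a minimal sketch in plain Java. It does not use Hadoop at all: `Records`, `PushMapper`, `PullMapper`, `runPush` and `runPull` are hypothetical stand-ins invented for illustration. In the real new API, the pull style roughly corresponds to overriding `Mapper.run(Context)` and looping over `context.nextKeyValue()` yourself, whereas in the old API the framework calls `map(...)` once per record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PushPullSketch {

    /** Hypothetical stand-in for a record source; NOT a Hadoop class. */
    static final class Records {
        private final Iterator<String> it;
        private String current;

        Records(List<String> lines) { this.it = lines.iterator(); }

        /** Advances to the next record; returns false when input is exhausted. */
        boolean nextKeyValue() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }

        String getCurrentValue() { return current; }
    }

    /** Push style: the framework owns the loop and hands over one record per call. */
    interface PushMapper { void map(String value, List<String> out); }

    /** Pull style: user code owns the loop and decides when to fetch the next record. */
    interface PullMapper { void run(Records records, List<String> out); }

    static List<String> runPush(List<String> input, PushMapper mapper) {
        List<String> out = new ArrayList<>();
        for (String line : input) {
            mapper.map(line, out); // every record is pushed; the mapper cannot stop the loop
        }
        return out;
    }

    static List<String> runPull(List<String> input, PullMapper mapper) {
        List<String> out = new ArrayList<>();
        mapper.run(new Records(input), out); // the mapper pulls as many records as it wants
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "STOP", "c");

        List<String> pushed = runPush(input, (v, out) -> out.add(v.toUpperCase()));

        List<String> pulled = runPull(input, (records, out) -> {
            while (records.nextKeyValue()) {
                String v = records.getCurrentValue();
                if (v.equals("STOP")) break; // early exit is only possible when pulling
                out.add(v.toUpperCase());
            }
        });

        System.out.println(pushed);
        System.out.println(pulled);
    }
}
```

The practical consequence is that a pull-style mapper can stop early, skip ahead, or look at several records at once, while a push-style mapper only ever sees the single record it was handed.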

Also, the old API has been un-deprecated since 0.21. You can find more information about the new API here.

As you mentioned, some classes (like MultipleTextOutputFormat) have not been migrated to the new API. For this reason and the one mentioned above, it's better to stick to the old API (although a translation is usually quite simple).
