I currently have a task where I need to chain a few jobs in Hadoop. What I am doing right now is that I have two jobs. My first job has a map function, a combiner and a reducer. I need one more reduce phase, so I created a second job with a simple pass-through map task that feeds the output of the previous reducer into the final reducer. I find this a bit "stupid", because there has to be a way to simply chain these phases. Moreover, I think the I/O would be reduced that way.
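To make the current setup concrete, here is a rough sketch of the two-job driver described above (requires a Hadoop cluster to actually run; the class names `FirstMapper`, `MyCombiner`, `FirstReducer` and `FinalReducer` are hypothetical placeholders for my actual classes, and `Mapper.class` is Hadoop's default identity mapper standing in for the pass-through map):

```java
// Sketch of the current two-job chain. Job 1 writes its reduce output to an
// intermediate HDFS directory; job 2 reads it back with an identity map and
// runs the second reduce. The extra read/write of "intermediate" is the I/O
// cost the question is trying to avoid.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoJobChain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job 1: map -> combine -> reduce.
        Job job1 = new Job(conf, "first-pass");
        job1.setJarByClass(TwoJobChain.class);
        job1.setMapperClass(FirstMapper.class);    // hypothetical
        job1.setCombinerClass(MyCombiner.class);   // hypothetical
        job1.setReducerClass(FirstReducer.class);  // hypothetical
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job1, new Path(args[0]));
        FileOutputFormat.setOutputPath(job1, new Path("intermediate"));
        if (!job1.waitForCompletion(true)) System.exit(1);

        // Job 2: identity map feeding the final reducer.
        Job job2 = new Job(conf, "second-pass");
        job2.setJarByClass(TwoJobChain.class);
        job2.setMapperClass(Mapper.class);         // default Mapper is identity
        job2.setReducerClass(FinalReducer.class);  // hypothetical
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job2, new Path("intermediate"));
        FileOutputFormat.setOutputPath(job2, new Path(args[1]));
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}
```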
I am using version 0.20.203, and I only find deprecated examples of ChainMapper and ChainReducer that use JobConf. I have found these: http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/lib/chain/ChainMapper.html http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/lib/chain/ChainReducer.html These seem to work with the Job class rather than with JobConf, which is deprecated in 0.20.203 — but there isn't any package that contains these classes in 0.20.203.
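For reference, the deprecated, JobConf-based examples I keep finding look roughly like this (a sketch of the old `org.apache.hadoop.mapred.lib` chain API, not runnable without a cluster; `AMapper`, `BMapper` and `MyReducer` are hypothetical placeholders):

```java
// Old-API chaining: one job runs a [map+ / reduce map*] pipeline, so the
// records flow between the chained phases without an extra HDFS round trip.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class OldApiChain {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(OldApiChain.class);
        conf.setJobName("chain");

        // First mapper in the chain.
        ChainMapper.addMapper(conf, AMapper.class,
                LongWritable.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

        // The single reducer of the job.
        ChainReducer.setReducer(conf, MyReducer.class,
                Text.class, Text.class, Text.class, Text.class,
                true, new JobConf(false));

        // A mapper chained AFTER the reducer: it processes the reduce
        // output before it is written out.
        ChainReducer.addMapper(conf, BMapper.class,
                Text.class, Text.class, Text.class, Text.class,
                false, new JobConf(false));

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

This is exactly the style I want to avoid, since everything here revolves around the deprecated JobConf class.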