2 votes

I develop MapReduce programs using Hadoop. My driver program submits a MapReduce job (with a map and a reduce task) to the Hadoop JobTracker. I have two questions:

a) Can my map or reduce task submit another MapReduce job (to the same Hadoop cluster and the same JobTracker)? That is, my driver program submits a MapReduce job whose map or reduce task spawns another MapReduce job and submits it to the same cluster and the same JobTracker. I think it's possible, but I'm not sure. Moreover, is it a good solution? If not, is there an alternative?

b) Can we use two map tasks (with two different map functions) and one reduce task in a single MapReduce job? Thanks a lot.

5
What is it you're trying to accomplish by launching MapReduce jobs from within a MapReduce job? – Pradeep Gollakota

I have two large input data sets (set1 and set2). To process each record of set1, I need all the elements of set2. So I intend to have my driver program submit set1 as the input data of a MapReduce job. Then, in the map task, in order to process one record of set1, I intend to submit another MapReduce job whose input data is set2. I don't know whether this is possible. I think it's possible in theory, but in practice it may fail because no slot is available. Would it be possible if my map function submitted the second MapReduce job to a different Hadoop cluster with a different JobTracker? – CD Tran

5 Answers

1 vote

You can certainly chain multiple map stages using the ChainMapper class.

You can also set up dependencies between jobs using the JobControl class and its addDependingJob() method. This is preferable to having MapReduce jobs spawn other MapReduce jobs, which goes against the fundamental approach of MapReduce and will likely make your solution no longer robust against hardware failure on an individual node.

Chapter 5 of Hadoop in Action by Chuck Lam has a good overview of this.
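A minimal driver sketch of the JobControl approach, assuming the classic org.apache.hadoop.mapred API; the job configurations are placeholders, and this needs a Hadoop installation on the classpath to actually run:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class DependentJobsDriver {
    public static void main(String[] args) throws Exception {
        // Placeholder configurations; set input/output paths and
        // mapper/reducer classes on each JobConf as usual.
        JobConf firstConf = new JobConf(DependentJobsDriver.class);
        JobConf secondConf = new JobConf(DependentJobsDriver.class);

        Job first = new Job(firstConf);
        Job second = new Job(secondConf);
        // second starts only after first completes successfully.
        second.addDependingJob(first);

        JobControl control = new JobControl("dependent-jobs");
        control.addJob(first);
        control.addJob(second);

        // JobControl implements Runnable; run it in its own thread
        // and poll until all jobs have finished.
        Thread runner = new Thread(control);
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(5000);
        }
        control.stop();
    }
}
```

This keeps all scheduling logic in the driver, so a failed node only affects the currently running job rather than a chain of nested submissions.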

0
votes

No, I don't think it's possible. An alternative is to launch a single MapReduce job with both set1 and set2 as input. In the map phase, check which set each tuple was read from: if it came from set1, add it to one list; if it came from set2, add it to another. Then you can do whatever you want with those two lists!
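A rough, plain-Java illustration of the tagging idea above (not Hadoop code; the tab-separated source tag on each line is an assumption of this sketch, since in a real job you would typically check the input split's file path instead):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagJoinSketch {
    // Routes each record into the bucket named by its source tag,
    // mimicking the if-condition in the map phase described above.
    public static Map<String, List<String>> tagBySource(List<String> lines) {
        Map<String, List<String>> buckets = new HashMap<>();
        buckets.put("set1", new ArrayList<>());
        buckets.put("set2", new ArrayList<>());
        for (String line : lines) {
            int sep = line.indexOf('\t');
            String tag = line.substring(0, sep);
            buckets.get(tag).add(line.substring(sep + 1));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("set1\ta", "set2\tx", "set1\tb");
        Map<String, List<String>> b = tagBySource(input);
        System.out.println(b.get("set1")); // [a, b]
        System.out.println(b.get("set2")); // [x]
    }
}
```

Note that buffering all of set2 in memory only works when set2 is small enough to fit; otherwise a reduce-side join is the safer pattern.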

0
votes

You should look into Cascading, which is designed to chain (or "cascade") the output of one MapReduce job into another. It abstracts away much of the grunt work needed to make that happen and lets the developer work at a much higher level when writing complex, multi-step MapReduce jobs.

0
votes

I would suggest looking at the Oozie framework.

0
votes
  1. It is possible to launch an MR job from another MR job; the Oozie job launcher launches any action (Pig, Java, MR) using a map task as the launcher.

  2. Use the MultipleInputs API to define different mappers for different input paths while using the same reducer. This is the classic way of performing joins: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
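A minimal driver sketch of the MultipleInputs approach, using the new org.apache.hadoop.mapreduce API; Set1Mapper, Set2Mapper, and JoinReducer are placeholder class names, and the sketch needs a Hadoop installation to run:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JoinDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "two-input-join");
        job.setJarByClass(JoinDriver.class);

        // Each input path gets its own mapper; both mappers emit
        // the same key/value types and feed a single reducer.
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, Set1Mapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, Set2Mapper.class);

        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

In the reducer, values for each join key arrive from both inputs, so the two mappers usually tag their output values so the reducer can tell which set each value came from.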