Running multiple mapreduce jobs with oozie?

Question

As part of a workaround, I want to use two MapReduce jobs (instead of one) that run in sequence to give the desired effect.

The map function in each job simply emits each key/value pair without processing. The reduce functions differ between the jobs, as they do different kinds of processing.

I stumbled upon Oozie, and it seems to write directly to the input stream of the subsequent job (or doesn't it?). This would be great, since the intermediate data is large and the I/O would become a bottleneck.

How can I achieve this with Oozie (two MR jobs in one workflow)?

I went through the resource below, but it only runs a single job as a workflow: https://cwiki.apache.org/confluence/display/OOZIE/Map+Reduce+Cookbook

Help appreciated.

Cheers

troutinator · Accepted Answer · 2012-12-14T14:58:28

There is a way: look at the ChainMapper class in Hadoop. It lets you forward the output of one mapper directly into the input of the next mapper without hitting the disk.
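
To make "without hitting the disk" concrete, here is a rough plain-Java model of the idea. It is not the Hadoop API: `ChainSketch`, `Stage`, `chain` and `run` are hypothetical names made up for illustration, and the lower-casing stage and the per-key sum are just stand-ins for real processing. Each record emitted by the first map stage is handed straight to the second one in memory, with no intermediate collection in between.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** Plain-Java model of mapper chaining. NOT the Hadoop API; all names are hypothetical. */
public class ChainSketch {

    /** A map stage: consumes one record and emits zero or more records to the collector. */
    public interface Stage {
        void map(String key, Integer value, BiConsumer<String, Integer> out);
    }

    /** Stage 1: identity map, like the pass-through mappers described in the question. */
    public static final Stage IDENTITY = (k, v, out) -> out.accept(k, v);

    /** Stage 2: an illustrative transformation that lower-cases the key. */
    public static final Stage LOWERCASE_KEY = (k, v, out) -> out.accept(k.toLowerCase(), v);

    /** Composes two stages: every record emitted by 'first' goes directly into 'second'. */
    public static Stage chain(Stage first, Stage second) {
        return (k, v, out) -> first.map(k, v, (k2, v2) -> second.map(k2, v2, out));
    }

    /** Pushes each input record through the chained stages and sums values per key
     *  (the per-key sum stands in for the shuffle and reduce phase). */
    public static Map<String, Integer> run(List<Map.Entry<String, Integer>> input) {
        Stage chained = chain(IDENTITY, LOWERCASE_KEY);
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> rec : input) {
            chained.map(rec.getKey(), rec.getValue(), (k, v) -> sums.merge(k, v, Integer::sum));
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = new ArrayList<>();
        input.add(new AbstractMap.SimpleEntry<>("A", 1));
        input.add(new AbstractMap.SimpleEntry<>("a", 2));
        input.add(new AbstractMap.SimpleEntry<>("b", 3));
        System.out.println(run(input)); // prints {a=3, b=3}
    }
}
```

In Hadoop itself the equivalent wiring is done on the job with `ChainMapper.addMapper` (and `ChainReducer` for mappers that run after the reduce). Note that this chains map stages within a single job; a second reduce phase would still need a second job.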
