2 votes

Recently I read a paper that proposed an algorithm for mining maximum contiguous patterns from DNA data. The proposed method, which sounds pretty interesting, uses the following MapReduce model: map -> map -> reduce -> reduce. That is, the first map phase is executed and its output becomes the input of the second map phase. The second map phase's output is the input of the first reduce phase, the output of the first reduce phase is the input of the second reduce phase, and finally the results are flushed to HDFS. Although it seems like an interesting method, the paper didn't mention how it was implemented. My question is: how do you implement this sort of MapReduce chaining?

3 – Thanks. I didn't actually know how to accept a question :) I tried to "vote up" but couldn't. – Ahmedov

3 Answers

1 vote

In Hadoop, as far as I know, you cannot do this as of now.

One approach is to use ChainMapper to do the map -> map -> reduce part in a first job. Then feed the output of that job into a second job whose mapper is an identity mapper (IdentityMapper in the old API, or the base Mapper class in the new API) and whose reducer is your second-phase reducer.
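For illustration, here is a minimal sketch of that approach with the new `org.apache.hadoop.mapreduce` API. `FirstMapper`, `SecondMapper`, `FirstReducer` and `SecondReducer` are hypothetical placeholders for your pattern-mining logic, and the `Text`/`Text` key-value types are assumptions you would adjust to your data. Job 1 chains the two map phases with `ChainMapper` and runs the first reducer; job 2 uses the base `Mapper` class as an identity mapper in front of the second reducer:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class MapMapReduceReduceDriver {

    // Hypothetical placeholder phases: the real pattern-mining logic would go here.
    public static class FirstMapper extends Mapper<LongWritable, Text, Text, Text> {
        protected void map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
            ctx.write(new Text("key"), value);                 // first map phase
        }
    }

    public static class SecondMapper extends Mapper<Text, Text, Text, Text> {
        protected void map(Text key, Text value, Context ctx) throws IOException, InterruptedException {
            ctx.write(key, value);                             // second map phase
        }
    }

    public static class FirstReducer extends Reducer<Text, Text, Text, Text> {
        protected void reduce(Text key, Iterable<Text> values, Context ctx) throws IOException, InterruptedException {
            for (Text v : values) ctx.write(key, v);           // first reduce phase
        }
    }

    public static class SecondReducer extends Reducer<Text, Text, Text, Text> {
        protected void reduce(Text key, Iterable<Text> values, Context ctx) throws IOException, InterruptedException {
            for (Text v : values) ctx.write(key, v);           // second reduce phase
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1] + "_tmp");
        Path output = new Path(args[1]);

        // Job 1: map -> map -> reduce. ChainMapper runs the two map phases
        // back to back inside one map task, then FirstReducer runs as the reducer.
        Job job1 = Job.getInstance(conf, "map-map-reduce");
        job1.setJarByClass(MapMapReduceReduceDriver.class);
        ChainMapper.addMapper(job1, FirstMapper.class,
                LongWritable.class, Text.class, Text.class, Text.class, new Configuration(false));
        ChainMapper.addMapper(job1, SecondMapper.class,
                Text.class, Text.class, Text.class, Text.class, new Configuration(false));
        job1.setReducerClass(FirstReducer.class);
        job1.setMapOutputKeyClass(Text.class);
        job1.setMapOutputValueClass(Text.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class); // easy to re-read in job 2
        FileInputFormat.addInputPath(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);
        if (!job1.waitForCompletion(true)) System.exit(1);

        // Job 2: identity map -> second reduce. The base Mapper class just passes
        // records through, so only SecondReducer does real work here.
        Job job2 = Job.getInstance(conf, "identity-map-reduce");
        job2.setJarByClass(MapMapReduceReduceDriver.class);
        job2.setMapperClass(Mapper.class);                     // identity mapper
        job2.setReducerClass(SecondReducer.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}
```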

0 votes

Please read about Apache Tez. Any combination such as M -> M -> R -> R -> R is supported there, because Tez lets you express the whole pipeline as a single DAG instead of separate MapReduce jobs.

0 votes

I think there are two methods to deal with your case:

  1. Integrate the code of the two map functions into one map task that runs the two phases back to back, and handle the two reduce functions the same way in one reduce task (a sketch follows this list).

  2. Divide the map-map-reduce-reduce process into two jobs: the first Hadoop job runs the two maps, with the second map phase rewritten as that job's reduce task; the second Hadoop job runs the two reduces, with the first reduce phase rewritten as that job's map task. Maybe you could use Oozie to manage the Hadoop workflow if you need to submit jobs that depend on each other.
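As a rough illustration of option 1, here is what a fused mapper might look like; `phaseOne` and `phaseTwo` are hypothetical stand-ins for the paper's two map-phase transformations, and the same idea would apply to fusing the two reduce phases:

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// One map task that runs two logical map phases back to back.
public class FusedMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Phase 1: transform the raw input record.
        for (String intermediate : phaseOne(value.toString())) {
            // Phase 2: feed phase 1's output straight into the second transformation.
            for (String result : phaseTwo(intermediate)) {
                context.write(new Text(result), new Text(intermediate));
            }
        }
    }

    // Hypothetical placeholders for the two per-record computations.
    private List<String> phaseOne(String record) {
        return Collections.singletonList(record);
    }

    private List<String> phaseTwo(String record) {
        return Collections.singletonList(record);
    }
}
```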