I have some flight data (each line contains origin, destination, flight number, etc.), and I need to process it to output flight details between all origins and destinations with one stopover. My idea is to have two mappers: one outputs the destination as the key, the other outputs the origin as the key, so the reducer receives the stopover location as the key and all the origins and destinations as the list of values. The reducer can then output the one-stopover flight details for every location. A rough sketch of what I mean is below.
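To make the idea concrete, here is a sketch of the two mappers and the reducer, assuming the records are comma-separated with origin in field 0 and destination in field 1 (the class names and the IN:/OUT: tags are just placeholders I made up):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper 1: key each record by its destination, so flights arriving at a stopover group there
class DestKeyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        // fields[1] assumed to be the destination airport code
        context.write(new Text(fields[1]), new Text("IN:" + value.toString()));
    }
}

// Mapper 2: key each record by its origin, so flights departing the stopover group there too
class OriginKeyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        // fields[0] assumed to be the origin airport code
        context.write(new Text(fields[0]), new Text("OUT:" + value.toString()));
    }
}

// Reducer: key = stopover airport; pair every arriving leg with every departing leg
class StopoverReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text stopover, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<String> arriving = new ArrayList<>();
        List<String> departing = new ArrayList<>();
        for (Text v : values) {
            String s = v.toString();
            if (s.startsWith("IN:")) {
                arriving.add(s.substring(3));
            } else {
                departing.add(s.substring(4));
            }
        }
        for (String in : arriving) {
            for (String out : departing) {
                context.write(stopover, new Text(in + " -> " + out));
            }
        }
    }
}
```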
So my question is: how do I run two different mappers over the same input file and have both of their outputs sent to a single reducer?
I read about MultipleInputs.addInputPath, but I suspect it needs the inputs to be different paths (or at least two copies of the same input).
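If I did stage the same data under two separate paths, this is roughly how I picture the driver (FlightDriver is a made-up name, the mapper/reducer classes are the ones from the sketch above, and the paths come from the command line):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FlightDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "one-stopover flights");
        job.setJarByClass(FlightDriver.class);

        // args[0] and args[1] would point at two copies of the same flight data
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, DestKeyMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, OriginKeyMapper.class);

        job.setReducerClass(StopoverReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Duplicating the input just to get two mapper classes onto it feels wasteful, which is why I'm asking.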
I am also thinking of running the two mapper jobs independently via a workflow, and then a third job with an identity mapper and a reducer where I do the flight calculation.
Is there a better solution than this? (Please do not suggest Hive; I am not comfortable with it yet.) Any guidance on implementing this with plain MapReduce would really help. Thanks.