I am trying to write a data-join MapReduce job in Hadoop. I feel I am close, but I am having an issue preventing map1 from feeding into map2.
I have two mappers and a single reducer, and I am trying to force Map1 to read from one file while forcing Map2 to read from another. I would then like to parse the results in the reducer to format the join output.
I know that by default, when chaining mappers in a job, the output of one mapper becomes the input of the next. I know this can be overridden, but I have not been successful: I have confirmed that the data from map1 is still feeding into map2.
This is how I thought I was supposed to specify the input path of a single mapper:
// Setting configuration for map2
JobConf map2 = new JobConf(false);
String[] map2Args = new GenericOptionsParser(map2, args).getRemainingArgs();
FileInputFormat.setInputPaths(map2, new Path(map2Args[1]));
ChainMapper.addMapper(conf,
                      Map2.class,
                      LongWritable.class, // input key class
                      Text.class,         // input value class
                      Text.class,         // output key class
                      Text.class,         // output value class
                      true,               // byValue
                      map2);
conf is the main job configuration, and args consists of 3 values: the first two are input files and the third is the intended output file.
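For reference, this is a minimal sketch (plain Java, no Hadoop types, so it runs standalone) of the reducer-side parsing I have in mind: each mapper would prefix its output value with a source tag, and the reducer splits the tagged values by source and emits the joined records. The "A:"/"B:" tags and the tab-separated output format are just my own convention for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinSketch {
    // Each mapper prefixes its output value with a tag identifying
    // which input file the record came from.
    static String tag(String source, String value) {
        return source + ":" + value;
    }

    // Reducer side: separate the tagged values for one key by source,
    // then emit the cross product, i.e. the inner-join output.
    static List<String> join(String key, List<String> taggedValues) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (String v : taggedValues) {
            if (v.startsWith("A:")) left.add(v.substring(2));
            else if (v.startsWith("B:")) right.add(v.substring(2));
        }
        List<String> out = new ArrayList<>();
        for (String l : left)
            for (String r : right)
                out.add(key + "\t" + l + "\t" + r);
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(tag("A", "alice"), tag("B", "engineering"));
        System.out.println(join("42", values));
    }
}
```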
What is the correct way to specify an input path for an individual mapper, other than the first, when dealing with data joins and multiple mappers in Hadoop?