
I have the following question while learning MapReduce. It would be a great help if someone could answer.

I have two mappers working on the same file. I configured them using MultipleInputs.

Mapper 1, expected output (after extracting a few columns of the file):

a - 1234
b - 3456
c - 1345

Mapper 2, expected output (after extracting a few columns of the same file):

a - Monday
b - Tuesday
c - Wednesday

There is also a reducer that just outputs the key-value pair it receives as input. Since I know that values with the same key are shuffled together into a list, I expected the output to be:

a - [1234,Monday]
b - [3456, Tuesday]
c - [1345, Wednesday]

But I am getting some weird output. I guess only one mapper is being run. Is this expected? Will the output of each mapper be shuffled separately? Will both mappers run in parallel?

Excuse me if it's a lame question. Please understand that I am new to Hadoop and MapReduce.

Below is the code

//Mapper1
public class numbermapper extends Mapper<Object, Text, Text, Text>{

    public void map(Object key,Text value, Context context) throws IOException, InterruptedException {
        String record = value.toString();
        String[] parts = record.split(",");
        System.out.println("***Mapper number output "+parts[0]+"  "+parts[1]);
        context.write(new Text(parts[0]), new Text(parts[1]));

    }
}

//Mapper2
public class weekmapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String record = value.toString();
        String[] parts = record.split(",");
        System.out.println("***Mapper week output "+parts[0]+"   "+parts[2]);
        context.write(new Text(parts[0]), new Text(parts[2]));
    }
}

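//Reducer
public class rjoinreducer extends Reducer<Text, Text, Text, Text> {
    // The second parameter must be Iterable<Text>; with a plain Text parameter this
    // method does not override Reducer.reduce and the default identity reducer runs.
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Join all values for this key into one bracketed list, e.g. [1234, Monday]
        StringBuilder joined = new StringBuilder("[");
        for (Text value : values) {
            if (joined.length() > 1) joined.append(", ");
            joined.append(value.toString());
        }
        context.write(key, new Text(joined.append("]").toString()));
    }
}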

//Driver class
public class driver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Reduce-side join"); // the Job constructor is deprecated
        job.setJarByClass(numbermapper.class);
        job.setReducerClass(rjoinreducer.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);


        MultipleInputs.addInputPath(job, new Path(args[0]),TextInputFormat.class, numbermapper.class);
        MultipleInputs.addInputPath(job, new Path(args[0]),TextInputFormat.class, weekmapper.class);
        Path outputPath = new Path(args[1]);


        FileOutputFormat.setOutputPath(job, outputPath);
        outputPath.getFileSystem(conf).delete(outputPath, true); // recursively remove any previous output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

And this is the output I got:

a     Monday
b     Tuesday
c     Wednesday

Dataset used

a,1234,Monday
b,3456,Tuesday
c,1345,Wednesday
What is your weird output? - Mike Park
Can you include a demo of how you're writing the code and what the "weird output" is? - Krease
I edited the question to include the input, the output, and the code that I used. It's just giving the output of the second mapper. - Bingo

1 Answer


MultipleInputs was taking just one file and running one mapper on it because I had given the same path for both mappers.
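
To illustrate why only the second mapper's output shows up: as far as I can tell, the mapper assigned to each input ends up in a lookup keyed by the input path, so registering the same path twice leaves only the last mapper. The sketch below is plain Java, not Hadoop code; the class PathMapperSketch, the method register, and the path /data/input.csv are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PathMapperSketch {
    // Hypothetical stand-in for a path -> mapper registry (not Hadoop's real code).
    // Each registration is a {path, mapperName} pair.
    public static Map<String, String> register(String[][] registrations) {
        Map<String, String> mapperByPath = new LinkedHashMap<>();
        for (String[] registration : registrations) {
            // put() on an existing key replaces the earlier mapper for that path
            mapperByPath.put(registration[0], registration[1]);
        }
        return mapperByPath;
    }

    public static void main(String[] args) {
        // Same path registered twice: only one entry survives
        System.out.println(register(new String[][] {
            {"/data/input.csv", "numbermapper"},
            {"/data/input.csv", "weekmapper"}
        }));
        // Two different paths: both mappers are kept
        System.out.println(register(new String[][] {
            {"/data/input1.csv", "numbermapper"},
            {"/data/input2.csv", "weekmapper"}
        }));
    }
}
```

With the same path, the registry holds a single entry pointing at weekmapper, which matches the output in the question.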

When I copied the dataset to a different file and ran the same program on two different files (same content but different file names), I got the expected output.

So I now understand that the output from different mapper classes is also combined by key, not just the output from the same mapper class.
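
The grouping can be imitated in plain Java without a cluster. This is only a rough sketch of the idea, not how Hadoop implements the shuffle; ShuffleSketch, mapColumn, and shuffle are hypothetical names. Both "mappers" emit (key, value) pairs from the dataset in the question, and all pairs are then grouped by key no matter which mapper produced them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {
    // Imitates a mapper: emits (first column, chosen column) for each CSV record.
    public static List<String[]> mapColumn(List<String> records, int column) {
        List<String[]> pairs = new ArrayList<>();
        for (String record : records) {
            String[] parts = record.split(",");
            pairs.add(new String[] {parts[0], parts[column]});
        }
        return pairs;
    }

    // Imitates the shuffle: groups values by key, with keys in sorted order.
    public static Map<String, List<String>> shuffle(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] pair : pairs) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "a,1234,Monday", "b,3456,Tuesday", "c,1345,Wednesday");
        // Output of both mappers goes into one pool before grouping
        List<String[]> all = new ArrayList<>(mapColumn(records, 1)); // like numbermapper
        all.addAll(mapColumn(records, 2));                           // like weekmapper
        shuffle(all).forEach((key, values) -> System.out.println(key + " - " + values));
    }
}
```

Here the values for a key appear in insertion order. In a real job the order of values within a key is not guaranteed unless you set up a secondary sort.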

Thanks for trying to help!