0
votes

I am new to Hadoop. I have multiple folders containing files that I want to process in Hadoop, and I am unsure how to implement the mapper in the MapReduce algorithm. Can I specify multiple mappers to process multiple files, and combine all of their output into one result using a single reducer? If possible, please give guidelines for implementing the above steps.

2 Answers

1
vote

If you have multiple input paths, use the MultipleInputs class.

Its addInputPath() overloads let you:

  1. add multiple paths that share one common mapper implementation, or
  2. add multiple paths, each with its own mapper and input format implementation.

To end up with a single reducer, the most direct approach is to set the number of reduce tasks to one with job.setNumReduceTasks(1). Alternatively, if every mapper emits the same output key...say 1 or "abc"...all records will be grouped into a single reduce call.
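A minimal driver sketch of the setup described above. The mapper and reducer class names (FolderAMapper, FolderBMapper, MergeReducer) and the input/output paths are placeholders you would replace with your own:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiFolderDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-folder-job");
        job.setJarByClass(MultiFolderDriver.class);

        // One mapper per input path; both run in the same job.
        // FolderAMapper and FolderBMapper are hypothetical mapper classes.
        MultipleInputs.addInputPath(job, new Path("/input/folderA"),
                TextInputFormat.class, FolderAMapper.class);
        MultipleInputs.addInputPath(job, new Path("/input/folderB"),
                TextInputFormat.class, FolderBMapper.class);

        // A single reduce task collects the output of all mappers.
        job.setNumReduceTasks(1);
        job.setReducerClass(MergeReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/output/merged"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that setNumReduceTasks(1) guarantees one reducer regardless of the keys; emitting a single shared key merely guarantees all records arrive in one reduce call.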

1
vote

If the files are to be mapped in the same way (e.g. they all have the same format and processing requirements) then you can configure a single mapper to process all of them.

You do this by configuring the input paths via TextInputFormat (the setInputPaths() method is inherited from FileInputFormat):

String folder1 = "file:///home/chrisgerken/blah/blah/folder1";
String folder2 = "file:///home/chrisgerken/blah/blah/folder2";
String folder3 = "file:///home/chrisgerken/blah/blah/folder3";
TextInputFormat.setInputPaths(job, new Path(folder1), new Path(folder2), new Path(folder3));

This will result in all of the files in folders 1, 2, and 3 being processed by the same mapper.

Of course, if you need to use a different input type you'll have to configure that type appropriately.
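For example, if the input consisted of Hadoop sequence files rather than plain text, you could swap in SequenceFileInputFormat instead. A fragment sketching the idea, reusing the `job` variable from above:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

// Same pattern as the text example, but for binary key/value sequence files.
// setInputPaths() is inherited from FileInputFormat, so every
// FileInputFormat subclass exposes it.
job.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.setInputPaths(job,
        new Path("file:///home/chrisgerken/blah/blah/folder1"),
        new Path("file:///home/chrisgerken/blah/blah/folder2"));
```

Whichever input format you choose, make sure the mapper's input key/value types match what that format emits (e.g. LongWritable/Text for TextInputFormat).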