I am new to Hadoop. I have multiple folders containing files that I want to process in Hadoop, and I am unsure how to implement the mapper in the map-reduce algorithm. Can I specify multiple mappers for processing multiple files, and combine all of the input files into one output using a single reducer? If so, please give guidelines for implementing these steps.
2 Answers
1
votes
If you have multiple input paths, use MultipleInputs.
Its addInputPath() overloads can be used to:
- add multiple paths that share one common mapper implementation, or
- add multiple paths, each with its own mapper and input format implementation.
To have a single reducer, set the number of reduce tasks to one with job.setNumReduceTasks(1). (Note that giving every map output the same key, say 1 or "abc", only guarantees that all values arrive at the same reduce() call; the number of reducer tasks itself is controlled by the job configuration, which defaults to one.)
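A minimal driver sketch combining the two ideas above, assuming the new (mapreduce) API; the mapper/reducer class names and paths are hypothetical placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiFolderDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-folder job");
        job.setJarByClass(MultiFolderDriver.class);

        // One mapper per input folder; FirstMapper and SecondMapper
        // are hypothetical Mapper subclasses you would write yourself.
        MultipleInputs.addInputPath(job, new Path("/input/folder1"),
                TextInputFormat.class, FirstMapper.class);
        MultipleInputs.addInputPath(job, new Path("/input/folder2"),
                TextInputFormat.class, SecondMapper.class);

        // A single reducer merges the output of both mappers into one file.
        job.setNumReduceTasks(1);
        job.setReducerClass(MergeReducer.class);  // hypothetical Reducer subclass
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because there is exactly one reduce task, the job produces a single output file (part-r-00000) containing the merged results from every input folder.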
1
votes
If the files are to be mapped in the same way (e.g. they all have the same format and processing requirements) then you can configure a single mapper to process all of them.
You do this by configuring the input paths on the TextInputFormat class (setInputPaths() is inherited from FileInputFormat):
String folder1 = "file:///home/chrisgerken/blah/blah/folder1";
String folder2 = "file:///home/chrisgerken/blah/blah/folder2";
String folder3 = "file:///home/chrisgerken/blah/blah/folder3";
TextInputFormat.setInputPaths(job, new Path(folder1), new Path(folder2), new Path(folder3));
This will result in all of the files in folders 1, 2 and 3 being processed by the mapper.
Of course, if you need to use a different input type you'll have to configure that type appropriately.
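For example, if the inputs were Hadoop SequenceFiles rather than plain text, a sketch of the change would be (reusing the folder variables from the snippet above):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

// Tell the job to read SequenceFiles instead of text lines;
// the same multi-path configuration still applies.
job.setInputFormatClass(SequenceFileInputFormat.class);
SequenceFileInputFormat.setInputPaths(job,
        new Path(folder1), new Path(folder2), new Path(folder3));
```

Whatever InputFormat you choose, setInputPaths() works the same way because it is defined on the common FileInputFormat base class.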