I am new in Hadoop Map-reduce. My input is many text files and I want to write the map-reduce program such that it will write all the files-names and the associated sentences with the file names in one output file where I want to just emit the file-name(key) and the associated sentences(value) from the mapper and the reducer will collect the key and all the values and write the file-name and their associated sentences in the output.
Mapper and reducer:
public void map(Text key, Text value,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
StringTokenizer itr = new StringTokenizer(value.toString(), ",");
String filename = new String();
FileSplit filesplit = (FileSplit) reporter.getInputSplit();
filename = filesplit.getpath().getName();
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
output.collect(new Text(filename), word);
}
}
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
// int sum = 0;
String translation = "";
while (values.hasNext()) {
translation += "|" + values.toString() + "|";
}
results.set(translation);
output.collect(key, results);
}
When I run the above mapper and reducer with the same configuration of inputformat (keyvaluetextinputformat.class) it does not write any thing in the output.
What should I change to achieve my goal?