Hadoop map-reduce programming

Question

I am new to Hadoop MapReduce. My input is many text files, and I want to write a MapReduce program that writes every file name and its associated sentences to a single output file. The mapper should emit just the file name (key) and an associated sentence (value); the reducer should collect each key with all of its values and write the file name followed by its sentences to the output.

Mapper and reducer:

public void map(Text key, Text value,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString(), ",");
    FileSplit filesplit = (FileSplit) reporter.getInputSplit();
    String filename = filesplit.getPath().getName();
    while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(new Text(filename), word);
    }
}

public void reduce(Text key, Iterator<Text> values,
                   OutputCollector<Text, Text> output,
                   Reporter reporter) throws IOException {
    // int sum = 0;
    String translation = "";
    while (values.hasNext()) {
        translation += "|" + values.toString() + "|";
    }

    results.set(translation);
    output.collect(key, results);
}

When I run the above mapper and reducer with the input format configured as KeyValueTextInputFormat.class, nothing is written to the output.

What should I change to achieve my goal?

Chris Gerken · Accepted Answer · 2014-03-14T14:09:11

In your reduce method you declare values to be an Iterator. It should be declared as an Iterable instead.

public void reduce(Text key, Iterable<Text> values, ....

instead of

public void reduce(Text key, Iterator<Text> values, ....

Once you've done that, you can do:

Iterator<Text> iter = values.iterator();
while(iter.hasNext())
{
    translation += "|" + iter.next().toString() + "|";
}
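
The loop above works because each pass calls next(), which advances the iterator; the reducer in the question calls values.toString() inside the loop and never advances. The same joining logic can be sketched in plain Java with no Hadoop dependency. In this sketch the class name JoinValuesSketch and the method joinWithPipes are hypothetical, and String stands in for Hadoop's Text:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: String stands in for Text.
public class JoinValuesSketch {

    // Wraps every value in pipes and concatenates them, like the loop above.
    public static String joinWithPipes(Iterable<String> values) {
        StringBuilder translation = new StringBuilder();
        // The for-each loop calls iterator(), hasNext() and next() for us,
        // so the iteration always advances and terminates.
        for (String value : values) {
            translation.append('|').append(value).append('|');
        }
        return translation.toString();
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("first sentence", "second sentence");
        System.out.println(joinWithPipes(sentences));
    }
}
```

Using a StringBuilder instead of += avoids copying the growing string on every value, which may matter when a key has many values.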

Because you used the wrong type, your method doesn't override the default reduce method, so the code you wrote is never called. That's why you don't get the output you expect.
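
The overriding point can be reproduced without Hadoop. In the sketch below, BaseReducer is a hypothetical stand-in for a framework base class (it is not Hadoop's Reducer, and its default simply returns a marker string). The subclass that takes an Iterator only adds an overload, so a call made through the base type still reaches the default. Annotating the method with @Override would turn that mistake into a compile error:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverrideSketch {

    // Hypothetical stand-in for a framework base class (not Hadoop's Reducer).
    public static class BaseReducer {
        public String reduce(String key, Iterable<String> values) {
            return "default";
        }
    }

    // Wrong parameter type: this is an overload, not an override.
    // Adding @Override here would fail to compile, exposing the mistake.
    public static class IteratorReducer extends BaseReducer {
        public String reduce(String key, Iterator<String> values) {
            return "custom";
        }
    }

    // Matching parameter type: this really overrides the default.
    public static class IterableReducer extends BaseReducer {
        @Override
        public String reduce(String key, Iterable<String> values) {
            return "custom";
        }
    }

    // Mimics a framework that only knows the base type.
    public static String callThroughBase(BaseReducer reducer) {
        List<String> values = Arrays.asList("a", "b");
        return reducer.reduce("file.txt", values);
    }

    public static void main(String[] args) {
        System.out.println(callThroughBase(new IteratorReducer())); // default
        System.out.println(callThroughBase(new IterableReducer())); // custom
    }
}
```

Putting @Override on every method that is meant to replace a framework method is a cheap way to catch this kind of signature mismatch at compile time.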

I also don't see where you declare the variable results.
