3 votes

I am trying to implement a MapReduce job where each mapper takes 150 lines of the text file and all the mappers run simultaneously; also, the job should not fail, no matter how many map tasks fail.

Here's the configuration part:

        JobConf conf = new JobConf(Main.class);
        conf.setJobName("My mapreduce");

        conf.set("mapreduce.input.lineinputformat.linespermap", "150");
        conf.set("mapred.max.map.failures.percent","100");

        conf.setInputFormat(NLineInputFormat.class);

        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

The problem is that Hadoop creates a mapper for every single line of text, the mappers seem to run sequentially, and if a single one fails, the whole job fails.

From this I deduce that the settings I've applied have no effect.

What did I do wrong?


4 Answers

3 votes

I assume you are using Hadoop 0.20. In 0.20 the configuration parameter is "mapred.line.input.format.linespermap", but you are using "mapreduce.input.lineinputformat.linespermap". If the configuration parameter is not set, it defaults to 1, which is why you are seeing the behavior described in the question.

Here is the code snippet from 0.20 NLineInputFormat.

    public void configure(JobConf conf) {
        N = conf.getInt("mapred.line.input.format.linespermap", 1);
    }

Hadoop configuration is sometimes a real pain: it is not documented properly, and I have observed that configuration parameter names sometimes change between releases. Your best bet is to read the code when you are uncertain about a configuration parameter.
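Putting that together, a minimal corrected version of the question's configuration (a sketch, assuming Hadoop 0.20 and the old `mapred` API throughout, including the `mapred.lib` NLineInputFormat import) would use the old parameter name:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

JobConf conf = new JobConf(Main.class);
conf.setJobName("My mapreduce");

// Old (mapred) API name, the one 0.20's NLineInputFormat actually reads
conf.set("mapred.line.input.format.linespermap", "150");
// Tolerate all map-task failures without failing the job
conf.set("mapred.max.map.failures.percent", "100");

conf.setInputFormat(NLineInputFormat.class);

FileInputFormat.addInputPath(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
```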

1 vote

To start with, "mapred.*" belongs to the old API and "mapreduce.*" to the new API, so you'd better not mix them. Check which version you are using and stick with it. Also recheck your imports, since there are two NLineInputFormat classes as well (one under mapred and one under mapreduce).

Secondly, you can check this link (I'll paste the important part):

NLineInputFormat will split N lines of input as one split. So, each map gets N lines.

But the RecordReader is still LineRecordReader, which reads one line at a time; the Key is therefore the offset in the file and the Value is the line. If you want N lines as the Key, you may need to override LineRecordReader.
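For the new (mapreduce) API, the equivalent setup looks roughly like this sketch (assuming a Hadoop release where `org.apache.hadoop.mapreduce.lib.input.NLineInputFormat` and its static `setNumLinesPerSplit` helper are available):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Job job = new Job(new Configuration(), "My mapreduce");
job.setJarByClass(Main.class);

// New-API input format; the helper sets
// "mapreduce.input.lineinputformat.linespermap" for you
job.setInputFormatClass(NLineInputFormat.class);
NLineInputFormat.setNumLinesPerSplit(job, 150);

FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
```

Note that even with N lines per split, each call to the mapper's map() still receives one (offset, line) pair, as described above.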

1 vote

If you want to quickly find the correct option names for Hadoop's new API, use this link: http://pydoop.sourceforge.net/docs/examples/intro.html#hadoop-0-21-0-notes .

0 votes

The new API's options are mostly undocumented.