1 vote

I'm running a MapReduce job with the number of reducers set to the default (one reducer). In theory, the output should be one file per reducer, but when I run my job I get two files:

part-r-00000

and

part-r-00001

Why is this happening?

There's only one node in my cluster.

My Driver class:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class DriverDate extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.printf("Usage: AvgWordLength inputDir outputDir\n");
            System.exit(-1);
        }
        Job job = new Job(getConf());
        job.setJobName("Job transformacio dates"); // "date transformation job"

        job.setJarByClass(DriverDate.class);
        job.setMapperClass(MapDate.class);
        job.setReducerClass(ReduceDate.class);

        // Map output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(NullWritable.class);

        // Final (reduce) output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Note: the number of reduce tasks is never set explicitly here
        job.waitForCompletion(true);

        return 0;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        ToolRunner.run(conf, new DriverDate(), args);
    }

}

Comments:

  • Can you post your main method (or Driver class), as well as the command that you execute to run the program? – vefthym
  • There's no other extra configuration, and I'm sure that the jar I'm running is the correct one. – Arturo Dinaret
  • Then I don't have an answer... just wait for someone else. Sorry and good luck! – vefthym
  • What is the size of your data after the map (the intermediate data)? If you set the number of reducers to 1 manually, do you get any retries in the reduce phase? – Abdulrahman
  • Abdulrahman, I found the answer and you are right: setting the number of reducers to one explicitly is one way to solve the problem. – Arturo Dinaret

2 Answers

1 vote

You are right that this code should produce one output file, since the default number of reduce tasks is 1 and each reducer generates one output file.

However, things that might have gone wrong include (but are not limited to):

  • Make sure that you run the correct jar, and that you regenerate and update it after every code change. Also make sure that you copy the correct jar from the computer that built it to the master of the (single-node) cluster. For example, your usage message says Usage: AvgWordLength inputDir outputDir, but the name of this jar is unlikely to be AvgWordLength...

  • Make sure that you do not specify a different number of reducers on the command line (e.g., through a -D property such as mapreduce.job.reduces); the sketch after this list shows one way to check the value the job actually receives.
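
To illustrate the second point, here is a minimal diagnostic sketch (my addition, not from the original post; the class name PrintReducerCount is made up). ToolRunner feeds -D key=value arguments through GenericOptionsParser, so this prints the reducer count a job submitted with the same arguments would actually use, after command-line overrides and cluster-side defaults (e.g., from the *-site.xml files) have been applied:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public class PrintReducerCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // GenericOptionsParser consumes -D key=value arguments, the same
        // mechanism ToolRunner uses before running the real job.
        new GenericOptionsParser(conf, args);
        // 1 is the stock Hadoop default for mapreduce.job.reduces
        System.out.println("mapreduce.job.reduces = "
                + conf.getInt("mapreduce.job.reduces", 1));
    }
}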

Other than that, I cannot find any other possible cause...

The number of nodes in the cluster is irrelevant.

0 votes

OK, I have found the answer.

In Cloudera Manager, the YARN (MR2) configuration has a default value for the number of reduce tasks per job; on a one-node cluster it is set to 2, so the job runs with two reducers by default.

There are two options to solve this. The first is to set the number of reducers to one explicitly in Java with:

job.setNumReduceTasks(1);

The second is to change the default number of reducers in the YARN configuration in Cloudera Manager.
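
For the first option, a minimal sketch of where the call would go in the driver above (right after the job is created, before submission):

Job job = new Job(getConf());
job.setJobName("Job transformacio dates");
job.setNumReduceTasks(1); // force a single reducer, overriding the cluster-wide default of 2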