When I set the number of reducers to zero, the map phase finishes quite fast (~10 mins). However, when I set the number of reducers to more than 1, the map phase (with exactly the same mapper code) takes dramatically longer: I stopped it after ~30 mins, when it was still at 20%. The first map tasks in the queue reach 100%, and then the job gets stuck.
Any idea why? Is it the case that, when no reducer is used, map output goes straight to disk, while with a reduce phase the map output goes to a memory buffer first?
Pseudocode for my main mapper loop:
for (VIntWritable e2 : D2entities) {
    for (VIntWritable e1 : D1entities) {
        output.collect(e1, e2);
    }
}
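To make the output volume concrete: the loop above emits one record per (e1, e2) pair, so a single map call produces |D1entities| × |D2entities| records. Here is a minimal plain-Java sketch of the same loop, without Hadoop. The class and method names are hypothetical, and plain ints stand in for VIntWritable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the mapper loop; not part of the real job.
final class PairEmitSketch {

    // Collects every (e1, e2) pair, in the same order the nested loop emits them.
    static List<int[]> emitPairs(List<Integer> d1Entities, List<Integer> d2Entities) {
        List<int[]> out = new ArrayList<>();
        for (int e2 : d2Entities) {
            for (int e1 : d1Entities) {
                out.add(new int[] {e1, e2}); // stands in for output.collect(e1, e2)
            }
        }
        return out;
    }

    // Number of records one map call emits for the given input sizes.
    static long expectedRecords(long d1Size, long d2Size) {
        return d1Size * d2Size;
    }

    public static void main(String[] args) {
        List<Integer> d1 = Arrays.asList(1, 2, 3);
        List<Integer> d2 = Arrays.asList(10, 20);
        System.out.println(emitPairs(d1, d2).size()); // prints 6
    }
}
```

So if, say, both lists held a few thousand entities, one map call would already emit millions of records, all of which have to be handled as map output.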
In both cases I use conf.setCompressMapOutput(true) and conf.set("mapred.reduce.slowstart.completed.maps", "1.00"). When I use a reducer, I also set:
conf.setOutputKeyClass(VIntWritable.class);
conf.setOutputValueClass(NullWritable.class);
conf.setMapOutputKeyClass(VIntWritable.class);
conf.setMapOutputValueClass(VIntWritable.class);
Otherwise, I use:
conf.setOutputKeyClass(VIntWritable.class);
conf.setOutputValueClass(VIntWritable.class);

