0 votes

I have a 5 node Hadoop cluster in which 2 nodes are dedicated datanodes and also run a tasktracker.

I run my hadoop job like this:

sudo -u hdfs hadoop jar /tmp/MyHadoopJob2.jar com.abhi.MyHadoopJob2 -D mapred.reduce.tasks=2 /sample/cite75_99.txt /output3

The job runs successfully and I can see the correct output, but when I go to the portal

http://jt1.abhi.com:50030

I can see

[screenshot of the JobTracker portal showing the job's task counts]

So only 1 reduce task is being run.

The reason I am so particular about running multiple reduce tasks is that I want to confirm whether Hadoop will still create a perfectly sorted output file even when different instances of the reducer run on different machines.

Currently my output file is fully sorted, but that is only because a single reduce task is being run.

2
You will get two files with two reducers, not one. - Mike Park
Ah. So doesn't this create a devil-and-the-deep-sea problem? If we have 1 reducer, it will crash if the input is too large, but if we have multiple reducers, then we don't get a single output? - Knows Not Much
Right. Multiple outputs, because two machines can't write to the same file concurrently. You can still achieve a single sorted file by concatenating the files. - Mike Park

2 Answers

1 vote

The number of output files is based on the number of reducers for your given job. You can still merge the multiple files into one file if your requirement demands it.

To merge them, use the hadoop shell command below:

hadoop fs -getmerge <src> <localdst>

src: HDFS output folder path
localdst: path of a single file on the local filesystem
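
For example, using the output path from the question (the local destination filename below is just an illustration):

hadoop fs -getmerge /output3 /tmp/output3-merged.txt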

Hope this clarifies your doubts.

1 vote

The reducer has 2 jobs:

1. to reduce the mapped key/value pairs
2. to combine the outputs of multiple mappers while doing so
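
As an illustration of the first point, a minimal reducer that sums the values for each key could look like the sketch below (class and type names are assumed for illustration; the question does not show MyHadoopJob2's source):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative sketch only: sums all values emitted for each key.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get(); // combine every mapper's output for this key
        }
        context.write(key, new IntWritable(sum));
    }
}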

Since you have only 2 datanodes, only 2 mappers can run simultaneously, which allows only one possible reducer at any given moment.
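
One thing worth checking: the -D generic option in the question's command is only honored when the driver runs through ToolRunner; you can also hard-code the count with job.setNumReduceTasks. A minimal driver sketch under those assumptions (the real MyHadoopJob2 source is not shown, so the mapper/reducer wiring is omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyHadoopJob2 extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries any -D overrides parsed by ToolRunner,
        // so -D mapred.reduce.tasks=2 takes effect here.
        Job job = new Job(getConf(), "MyHadoopJob2");
        job.setJarByClass(MyHadoopJob2.class);
        job.setNumReduceTasks(2); // or rely on the -D flag alone
        // setMapperClass/setReducerClass and output types omitted here.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-D, -files, ...) before
        // handing the remaining arguments to run().
        System.exit(ToolRunner.run(new Configuration(), new MyHadoopJob2(), args));
    }
}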