0
votes

How does Hadoop decide the number of reducers that run for a particular job? On what basis does it decide - the number of partitioners, the cluster size, or something else? Please explain with the problem below: I have a 640 MB input file and a block size of 64 MB. My cluster is a 5-node cluster. I have written my input file into HDFS, and it occupies 10 data blocks. If I run my wordcount program on this input file, how many mappers and how many reducers will run?


3 Answers

1
votes

The number of maps is decided based on the choice of InputFormat class. By default it is TextInputFormat, which creates the same number of maps as the number of blocks. The exception is when the last record is broken across two blocks (in that case the number of maps will be the number of blocks minus one). The number of reducers is a configuration choice, which can even be specified at job submission time. By default the number of reducers is one.
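For illustration, a minimal driver sketch using the new MapReduce API - the class name, paths, and the reducer count of 4 are placeholders, and the Mapper/Reducer classes are omitted:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);

        // TextInputFormat is the default: roughly one map task per block of the input file.
        job.setInputFormatClass(TextInputFormat.class);

        // The reducer count is a configuration choice; the default is 1.
        job.setNumReduceTasks(4);

        // Set your Mapper/Reducer classes with job.setMapperClass(...) / job.setReducerClass(...).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

In MapReduce 2 the same setting can also be passed at submission time, e.g. -D mapreduce.job.reduces=4 on the command line.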

0
votes

Given that the numbers of mappers and reducers can be specified in configuration files, there is no unique answer. But the default will be:
640 MB with 64 MB blocks = 10 mappers and 1 reducer.

For a more accurate answer, the number of mappers is set according to total file size / file block size, but you can set configuration variables to change this behaviour, such as the minimum split size, the maximum split size, the minimum number of maps, etc. (see the sketch below). If you want to know more about these variables, look at mapred-default, hdfs-default and core-default. By the way, there are plenty of questions about map and reduce counts on Stack Overflow.
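As a back-of-the-envelope sketch (not the exact FileInputFormat code, but it follows the same rule: split size = max(minSize, min(maxSize, blockSize))), the numbers from the question work out like this:

public class SplitCountEstimate {
    // Mirrors FileInputFormat's default split-size rule: max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long fileSize  = 640L * 1024 * 1024;  // 640 MB input file from the question
        long blockSize =  64L * 1024 * 1024;  // 64 MB HDFS block size
        long minSize   = 1L;                  // mapreduce.input.fileinputformat.split.minsize (default 1)
        long maxSize   = Long.MAX_VALUE;      // mapreduce.input.fileinputformat.split.maxsize (default Long.MAX_VALUE)

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long numMaps   = (long) Math.ceil((double) fileSize / splitSize);
        System.out.println("Split size: " + splitSize + " bytes, map tasks: " + numMaps); // prints 10 map tasks
    }
}

Raising the minimum split size (or lowering the maximum) changes the split size and therefore the number of map tasks, without touching the HDFS block size.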

0
votes

Changing the block size from 64 MB to 128 MB will reduce the number of blocks from 10 to 5. You can specify the number of reducers in the configuration, but there is no parameter that directly fixes the number of mappers: the number of maps depends on the number of input splits and the input format. It is recommended to keep the number of reducers less than your cluster size. In the MapReduce 2 framework, containers control the resources being used, so you can size resources based on your data estimates and start as many reducers as needed, based on the data size and the complexity of the reduce function.
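For the block size to matter, it has to be set when the file is written into HDFS. A hedged sketch - the paths are hypothetical, and it assumes the dfs.blocksize value in the client configuration is honoured when the file is copied in:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LargerBlockCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Use 128 MB blocks instead of the 64 MB default from the question:
        // the same 640 MB file then occupies 5 blocks, so roughly 5 map tasks instead of 10.
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);

        FileSystem fs = FileSystem.get(conf);
        // Hypothetical local and HDFS paths, for illustration only.
        fs.copyFromLocalFile(new Path("/local/input/big.txt"),
                             new Path("/user/hadoop/input/big.txt"));
        fs.close();
    }
}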