2 votes

I am using Hadoop 1.0.3 to run MapReduce jobs on a 3-node cluster. The problem is that I have set the property mapred.map.tasks to 20 in my /conf/mapred-site.xml, but Hadoop shows only 6 map tasks running when I run the job and check the cluster information on the JobTracker web UI at :50030. I have edited the above-mentioned configuration file on all the nodes in the cluster. Please help.
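For reference, this is roughly the entry I added to mapred-site.xml on each node (a minimal sketch; only the relevant property is shown):

<property>
    <name>mapred.map.tasks</name>
    <value>20</value>
</property>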

Regards, Mohsin

3 Comments
How big is the input data? If the input data is split into n splits, then Hadoop will run only n map tasks and not more. – Praveen Sripati
The number of input splits is 764. – sp3tsnaz
@PraveenSripati I want to set the number of parallel map tasks. I can see in my web console that the job has 764 map tasks, but only 6 map tasks are running. – sp3tsnaz

3 Answers

4 votes

As mentioned by miguno, Hadoop only considers the value of mapred.map.tasks as a hint.

That being said, when I was experimenting with MapReduce I was able to increase the map count by raising the maximum number of map slots. This might not work for you, but it is worth a shot.

<property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>60</value>
</property>

NOTE: mapred.tasktracker.map.tasks.maximum is a per-TaskTracker setting, so it caps how many map tasks each node runs concurrently rather than the total for the job. On top of that slot limit, you still hint at the number of maps you want for the job with mapred.map.tasks, like so:

<property>
    <name>mapred.map.tasks</name>
    <value>20</value>
</property>
3 votes

This question seems to be a duplicate of Setting the number of map tasks and reduce tasks.

Hadoop does not honor mapred.map.tasks beyond considering it a hint.

See this information on the Hadoop wiki:

Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.

That said, Hadoop does accept the user-specified mapred.reduce.tasks and doesn't manipulate it.

In summary, you cannot force mapred.map.tasks for a given MapReduce job, but you can force mapred.reduce.tasks.
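For example, a minimal sketch of pinning the reduce count in a job's configuration (the value of 10 is just an illustration):

<property>
    <name>mapred.reduce.tasks</name>
    <value>10</value>
</property>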

Edit: Going slightly beyond your direct question, there is a way to indirectly force Hadoop to use more mappers. This involves setting the combination of mapred.min.split.size, dfs.block.size and mapred.max.split.size appropriately. Note that the actual sizes of the input files also play a role here. See this answer for details, which basically quotes from Tom White's Hadoop: The Definitive Guide.
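For illustration, a rough sketch of lowering the maximum split size so that each DFS block is carved into several splits, and hence several mappers. The 32 MB value (33554432 bytes) is only an assumed example to tune against your data, and it only takes effect for jobs whose InputFormat honors this property:

<property>
    <name>mapred.max.split.size</name>
    <value>33554432</value>
</property>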

0 votes

It's primarily the input format that determines the number of map tasks. http://wiki.apache.org/hadoop/HowManyMapsAndReduces

To your question: by default, a TaskTracker runs two map tasks and two reduce tasks concurrently.
To change that, set the property mapred.tasktracker.map.tasks.maximum in /conf/mapred-site.xml. A commonly advised formula for this value is (CPUS > 2) ? (CPUS * 0.75) : 1.
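As a sketch, on a node with 4 cores that formula gives 3, so the entry in mapred-site.xml might look like the following (the value of 3 is an assumption derived from that formula, not a recommendation for your hardware):

<property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>3</value>
</property>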