
I've just set up a Hadoop cluster with Hadoop 0.20.205. I have a master (NameNode and JobTracker) and two other boxes (slaves).

I'm trying to understand how to define the number of map and reduce tasks to use.

So far I understand that I can set the maximum number of map and reduce tasks that each TaskTracker can handle simultaneously with *mapred.tasktracker.map.tasks.maximum* and *mapred.tasktracker.reduce.tasks.maximum*.

Also, I can define the maximum number of map tasks the whole cluster can run simultaneously with *mapred.map.tasks*. Is that right?

If so, how can I know what should be the value for *mapred.tasktracker.map.tasks.maximum*? I see that the default is 2. But why? What are the pros and cons of increasing or decreasing this value?
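For reference, these properties go in conf/mapred-site.xml on each TaskTracker (the values below are just placeholders, not a recommendation):

```xml
<!-- conf/mapred-site.xml: per-TaskTracker slot limits (placeholder values) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```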


2 Answers


I don't think that there is a rule for that (like the rule for setting the number of reducers).

What I do is set the number of mappers and reducers per machine to the number of available cores minus one. Intuitively, this leaves each machine some capacity for other processes (like cluster communication), but I may be wrong. In any case, this is the only guidance I found, from "Pro Hadoop": it suggests using as many mappers as there are available cores, and one or two reducers. I hope it helps.
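As a sketch of that rule of thumb (the function name and the 8-core example are mine, not from the question):

```python
def slots_per_tasktracker(num_cores):
    """Heuristic from above: use one slot per core, but leave one core
    free for other processes (DataNode, TaskTracker daemon, cluster
    communication). Always keep at least one slot."""
    return max(1, num_cores - 1)

# Example: an 8-core slave would get 7 slots.
print(slots_per_tasktracker(8))
```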


Here is what I propose. Hope it helps!

  • Run "hadoop fsck /" on the master node to find out the number and size of the blocks. For example:

    ...
    Total size: 21600037259 B
    Total dirs: 78
    Total files:    152
    Total blocks (validated):   334 (avg. block size 64670770 B)
    ...
    
  • I set the number of map tasks to num_of_blocks / 10:
    set mapred.map.tasks=33;

  • I set the number of reduce tasks to block_size (in MB) * 2:
    set mapred.reduce.tasks=124;
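These two heuristics can be sketched like this; the calculation follows the `set` commands above (mapred.map.tasks=33, mapred.reduce.tasks=124), and the function name is mine:

```python
import re

def tasks_from_fsck(fsck_output):
    """Derive job task counts from 'hadoop fsck /' output:
    map tasks    = total blocks / 10
    reduce tasks = average block size in MB (rounded) * 2
    """
    blocks = int(re.search(r"Total blocks \(validated\):\s*(\d+)",
                           fsck_output).group(1))
    avg_block_size_b = int(re.search(r"avg\. block size (\d+) B",
                                     fsck_output).group(1))
    map_tasks = blocks // 10
    reduce_tasks = round(avg_block_size_b / 2**20) * 2  # bytes -> MB
    return map_tasks, reduce_tasks

sample = """Total size: 21600037259 B
Total dirs: 78
Total files: 152
Total blocks (validated): 334 (avg. block size 64670770 B)"""

print(tasks_from_fsck(sample))
```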

So far that's the best configuration I've found, but you'll have to adjust it according to your cluster's configuration.