I've just set up a Hadoop cluster with Hadoop 0.20.205. I have a master (NameNode and JobTracker) and two other boxes (slaves).
I'm trying to understand how to define the number of map and reduce tasks to use.
So far I understand that I can set the maximum number of map and reduce tasks that each TaskTracker can run simultaneously with *mapred.tasktracker.map.tasks.maximum* and *mapred.tasktracker.reduce.tasks.maximum*.
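To make my understanding concrete, here is a minimal sketch of the arithmetic. It assumes that only the two slaves run a TaskTracker (the master does not) and that both settings are left at 2. I have only confirmed the default for the map setting, so the reduce value is an assumption. The constant and function names are made up for illustration and are not part of any Hadoop API.

```python
# Hypothetical description of my cluster: TaskTrackers run on the two slaves only (assumption).
TASKTRACKERS = 2

# Per-TaskTracker limits, as configured in mapred-site.xml on each slave.
MAP_SLOTS_PER_TRACKER = 2      # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS_PER_TRACKER = 2   # mapred.tasktracker.reduce.tasks.maximum (assumed value)


def cluster_slots(trackers, slots_per_tracker):
    """Number of tasks of one kind that can run at the same time across all TaskTrackers."""
    return trackers * slots_per_tracker


map_slots = cluster_slots(TASKTRACKERS, MAP_SLOTS_PER_TRACKER)
reduce_slots = cluster_slots(TASKTRACKERS, REDUCE_SLOTS_PER_TRACKER)

print("concurrent map tasks:", map_slots)
print("concurrent reduce tasks:", reduce_slots)
```

If this is correct, the per-TaskTracker settings alone would already cap how many tasks the cluster runs at once.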
Also, I can define the maximum number of map tasks the whole cluster can run simultaneously with *mapred.map.tasks*. Is that right?
If so, how do I choose a value for *mapred.tasktracker.map.tasks.maximum*? I see that the default is 2, but why? What are the pros and cons of increasing or decreasing this value?