1 vote

Assume that 8GB of memory is available on a node in a Hadoop cluster.

If the TaskTracker and DataNode consume 2GB, and the memory required for each task is 200MB, how many map and reduce tasks can be started?

8GB - 2GB = 6GB

So, 6144MB/200MB = 30.72

So, 30 total map and reduce tasks will be started.

Am I right or am I missing something?

2 Answers

1 vote

The number of mappers and reducers is not determined by the resources available. You have to set the number of reducers in your code by calling setNumReduceTasks().
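For example, a minimal driver sketch (the class name, job name, and the value 10 are made up for illustration; the setNumReduceTasks() call is the point):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my job");

        // The reducer count is whatever you ask for here; Hadoop does not
        // derive it from the memory available on the nodes.
        job.setNumReduceTasks(10);
    }
}
```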

For the number of mappers, it is more complicated, as they are set by Hadoop. By default, there is roughly one map task per input split. You can tweak that by changing the default block size, the record reader, or the number of input files.
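For instance, with FileInputFormat you can bound the split size, which indirectly controls how many map tasks get created (the input path and the sizes below are arbitrary example values):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split size demo");
        FileInputFormat.addInputPath(job, new Path("/input/data"));

        // Larger splits -> fewer map tasks; smaller splits -> more map tasks.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64MB
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128MB
    }
}
```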

You should also set, in the Hadoop configuration files, the maximum number of map and reduce tasks that run concurrently, as well as the memory allocated to each task. Those two settings are the ones that depend on the available resources. Keep in mind that map and reduce tasks run on your CPU, so you are practically restricted by the number of available cores (one core cannot run two tasks at the same time).
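With the classic (MRv1) TaskTracker setup the question describes, these limits go in mapred-site.xml on each node; the values below are only a sketch of the 6GB/200MB scenario, not recommendations:

```xml
<!-- mapred-site.xml (per node, MRv1) -->
<configuration>
  <!-- Maximum map and reduce tasks this TaskTracker runs concurrently -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Heap given to each spawned task JVM -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx200m</value>
  </property>
</configuration>
```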

This guide may help you with more details.

0 votes

The number of concurrent tasks is not decided based only on the memory available on a node; it depends on the number of cores as well. If your node has 8 vcores and each of your tasks takes 1 core, then only 8 tasks can run at a time.
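A rough sketch of that back-of-the-envelope check, combining both limits with the numbers from the question (6144MB free, 200MB per task, 8 vcores assumed):

```java
public class SlotEstimate {
    public static void main(String[] args) {
        int freeMemoryMb = 6144;    // 8GB minus 2GB for DataNode/TaskTracker
        int memoryPerTaskMb = 200;  // from the question
        int vcores = 8;             // assumed: one core per task

        int byMemory = freeMemoryMb / memoryPerTaskMb; // 30
        int byCores = vcores;                          // 8

        // Effective concurrency is bounded by the tighter of the two limits.
        System.out.println("Concurrent tasks: " + Math.min(byMemory, byCores)); // 8
    }
}
```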