1 vote

Assume that 8GB of memory is available on a node in a Hadoop cluster.

If the TaskTracker and DataNode consume 2GB, and the memory required for each task is 200MB, how many map and reduce tasks can be started?

8GB - 2GB = 6GB

So, 6144MB/200MB = 30.72

So, 30 total map and reduce tasks will be started.

Am I right or am I missing something?

2 Answers

1 vote

The number of mappers and reducers is not determined by the resources available. You have to set the number of reducers in your code by calling setNumReduceTasks().
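For example, a minimal driver sketch (the class name, job name, and the value 10 are made up for illustration; the setNumReduceTasks() call is the point):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "my job");

        // The reducer count is whatever you ask for here; Hadoop does not
        // derive it from the memory available on the nodes.
        job.setNumReduceTasks(10);
    }
}
```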

For the number of mappers, it is more complicated, as they are set by Hadoop. By default, there is roughly one map task per input split. You can tweak that by changing the default block size, the record reader, or the number of input files.
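For instance, with FileInputFormat you can bound the split size, which indirectly controls how many map tasks get created (the input path and the sizes below are arbitrary example values):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split size demo");
        FileInputFormat.addInputPath(job, new Path("/input/data"));

        // Larger splits -> fewer map tasks; smaller splits -> more map tasks.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64MB
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128MB
    }
}
```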

You should also set, in the Hadoop configuration files, the maximum number of map and reduce tasks that run concurrently, as well as the memory allocated to each task. Those two settings are the ones that depend on the available resources. Keep in mind that map and reduce tasks run on your CPU, so you are practically restricted by the number of available cores (one core cannot run two tasks at the same time).
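With the classic (MRv1) TaskTracker setup the question describes, these limits go in mapred-site.xml on each node; the values below are only a sketch of the 6GB/200MB scenario, not recommendations:

```xml
<!-- mapred-site.xml (per node, MRv1) -->
<configuration>
  <!-- Maximum map and reduce tasks this TaskTracker runs concurrently -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Heap given to each spawned task JVM -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx200m</value>
  </property>
</configuration>
```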

This guide may help you with more details.

0 votes

The number of concurrent tasks is not decided based only on the memory available on a node; it depends on the number of cores as well. If your node has 8 vcores and each of your tasks takes 1 core, then only 8 tasks can run at a time.
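A rough sketch of that back-of-the-envelope check, combining both limits with the numbers from the question (6144MB free, 200MB per task, 8 vcores assumed):

```java
public class SlotEstimate {
    public static void main(String[] args) {
        int freeMemoryMb = 6144;    // 8GB minus 2GB for DataNode/TaskTracker
        int memoryPerTaskMb = 200;  // from the question
        int vcores = 8;             // assumed: one core per task

        int byMemory = freeMemoryMb / memoryPerTaskMb; // 30
        int byCores = vcores;                          // 8

        // Effective concurrency is bounded by the tighter of the two limits.
        System.out.println("Concurrent tasks: " + Math.min(byMemory, byCores)); // 8
    }
}
```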