5 votes

I am on my way to becoming a Cloudera Hadoop administrator. Since I started, I have been hearing a lot about computing slots per machine in a Hadoop cluster, such as defining the number of map slots and reduce slots.

I have searched the internet for a long time looking for a beginner's definition of a MapReduce slot, but didn't find any.

I am really frustrated with going through PDFs that only explain how to configure MapReduce.

Please explain what exactly a computing slot means on a machine in a cluster.


4 Answers

4 votes

In MapReduce v1, mapreduce.tasktracker.map.tasks.maximum and mapreduce.tasktracker.reduce.tasks.maximum are used in mapred-site.xml to configure the number of map slots and reduce slots, respectively.
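A minimal mapred-site.xml sketch (the slot counts are placeholders, not recommendations):

    <!-- mapred-site.xml (MRv1): fixed slot counts per TaskTracker -->
    <property>
      <name>mapreduce.tasktracker.map.tasks.maximum</name>
      <value>7</value>   <!-- max map tasks running in parallel on this node -->
    </property>
    <property>
      <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
      <value>3</value>   <!-- max reduce tasks running in parallel on this node -->
    </property>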

Starting from MapReduce v2 (YARN), the more generic term "container" is used instead of "slot". The containers on a node represent the maximum number of tasks that can run in parallel on that node, regardless of whether they are map tasks, reduce tasks, or application master tasks (in YARN).
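Under YARN, the number of containers a node can run is derived from the node's resources and the per-container allocation rather than being a fixed slot count. A sketch of the standard yarn-site.xml settings involved (the values are only illustrative):

    <!-- yarn-site.xml (MRv2/YARN): container capacity is derived, not fixed -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>65536</value>   <!-- total RAM on this node available to containers -->
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>32</value>      <!-- total virtual cores available to containers -->
    </property>
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>    <!-- smallest container the scheduler will grant -->
    </property>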

0 votes

Generally it depends on CPU and memory. In our cluster, we set 20 map slots and 15 reduce slots on a machine with 32 cores and 64 GB of memory.
1. Approximately one slot needs one CPU core.
2. The number of map slots should be a little higher than the number of reduce slots.
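Expressed as MRv1 configuration, that sizing would look roughly like this in mapred-site.xml (a sketch based on the numbers above):

    <property>
      <name>mapreduce.tasktracker.map.tasks.maximum</name>
      <value>20</value>   <!-- 20 map slots on the 32-core, 64 GB node -->
    </property>
    <property>
      <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
      <value>15</value>   <!-- 15 reduce slots, a little fewer than map slots -->
    </property>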

0 votes

In MRv1, each machine had a fixed number of slots dedicated to map and reduce tasks. In general, each machine is configured with a 4:1 ratio of map slots to reduce slots (for example, 8 map slots and 2 reduce slots).

  • Logically, one would be reading a lot of data (maps) and crunching it down to a small set (reduce).

In MRv2 the concept of containers came in, and any container can run a map task, a reduce task, or a shell script.

0 votes

A bit late, but I'll answer anyway.

A computing slot: think of all the various computations in Hadoop that require some resource, i.e. memory, CPU cores, or disk space.

Resource = memory, CPU cores, or disk space required

For example: allocating resources to start a container, or allocating resources to perform a map or reduce task.

It is all about how you want to manage the resources you have in hand: RAM, cores, and disk space.

The goal is to ensure your processing is not constrained by any one of these cluster resources; you want your processing to be as dynamic as possible.

As an example, Hadoop YARN allows you to configure the minimum RAM required to start a YARN container, the RAM required to run a map or reduce task, the JVM heap size (for map and reduce tasks), and the amount of virtual memory each task gets.
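A sketch of the standard properties behind those knobs, split across yarn-site.xml and mapred-site.xml (the values are only illustrative):

    <!-- yarn-site.xml -->
    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>1024</value>    <!-- minimum RAM granted to any YARN container -->
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>2.1</value>     <!-- virtual memory allowed per unit of physical memory -->
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>1536</value>    <!-- container size requested for a map task -->
    </property>
    <property>
      <name>mapreduce.map.java.opts</name>
      <value>-Xmx1228m</value>   <!-- JVM heap for the map task, kept below the container size -->
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>3072</value>    <!-- container size requested for a reduce task -->
    </property>
    <property>
      <name>mapreduce.reduce.java.opts</name>
      <value>-Xmx2457m</value>   <!-- JVM heap for the reduce task -->
    </property>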

Unlike Hadoop MR1, you do not pre-configure these per slot (for example, a fixed RAM size) before you even begin executing MapReduce tasks. In that sense, you want your resource allocation to be as elastic as possible, i.e. to dynamically increase RAM/CPU cores for either a map or a reduce task.