3
votes

I have gone through a few Hadoop books and papers.

A slot is a map/reduce computation unit at a node; it may be a map slot or a reduce slot. As far as I know, a split is a group of blocks of files in HDFS which has some length and the locations of the nodes where the blocks are stored. Mapper is a class, but when the code is instantiated it is called a map task. Am I right? I am not clear about the difference and relationship between map tasks, data splits and Mapper.

Regarding scheduling, I understand that when a map slot on a node is free, a map task is chosen from the non-running map tasks and launched if the data to be processed by that map task is on the node. Can anyone explain it clearly in terms of the above concepts: slots, Mapper, map task, etc.?

Thanks, Arun

4 Answers

4
votes

As far as I know, a split is a group of blocks of files in HDFS which have the same length and the locations of the nodes where they are stored.

An InputSplit is the unit of data that a particular mapper will process. It need not be a group of HDFS blocks. It can be a single line, 100 rows from a DB, a 50 MB file, etc.

I am not clear about the difference and relationship between map tasks, data splits and Mapper.

An InputSplit is processed by one map task, and each map task runs its own instance of the Mapper class.
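
To make the relationship concrete, here is a minimal toy model in Python. It is not Hadoop's API: the names `Split`, `WordLengthMapper` and `run_map_task` are hypothetical and exist only for illustration. It sketches one map task per split, with each task creating its own Mapper instance and feeding it the records of that split.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Split:
    """Toy stand-in for an InputSplit: a reference to a range of records."""
    source: List[str]   # hypothetical in-memory "file"
    start: int          # index of the first record in this split
    length: int         # number of records in this split

    def records(self) -> Iterator[str]:
        return iter(self.source[self.start:self.start + self.length])


class WordLengthMapper:
    """Toy Mapper class: user code that turns one record into (key, value) pairs."""

    def map(self, record: str) -> Iterator[Tuple[str, int]]:
        for word in record.split():
            yield (word, len(word))


def run_map_task(split: Split) -> List[Tuple[str, int]]:
    """A map task: applies a fresh Mapper instance to the records of one split."""
    mapper = WordLengthMapper()  # each task gets its own Mapper instance
    output: List[Tuple[str, int]] = []
    for record in split.records():
        output.extend(mapper.map(record))
    return output


lines = ["hadoop map", "reduce slot", "input split", "task"]
# Two records per split, so four records give two splits and two map tasks.
splits = [Split(lines, start, 2) for start in range(0, len(lines), 2)]
results = [run_map_task(s) for s in splits]
```

In this sketch the number of map tasks equals the number of splits, which is the point of the answer above: the Mapper is the code, the split says which data, and the map task is the running combination of the two.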

0
votes

As I understand it:
first, the data is split across the DataNodes in HDFS;
then, when there is a new job, the JobTracker divides the job into map and reduce tasks and assigns each map task to a node that already holds the split of data for that task, so the data is local to the node, there is no cost for moving data, and the execution time is as short as possible;
but sometimes a task has to be assigned to a node that does not hold the data, so the node has to fetch the data over the network and then process it.
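
The locality preference described above can be sketched as a small toy scheduler in Python. This is an assumption-laden illustration, not the JobTracker's real algorithm (real schedulers weigh more factors, such as rack locality); `pick_task`, the node names and the split placement are all made up.

```python
from typing import Dict, List, Optional, Set


def pick_task(free_node: str,
              pending: List[str],
              split_hosts: Dict[str, Set[str]]) -> Optional[str]:
    """Choose a pending map task for a node that has a free map slot.

    Prefers a task whose split is stored on free_node (data-local);
    otherwise falls back to the first pending task (data read remotely).
    """
    for task in pending:
        if free_node in split_hosts[task]:
            return task
    return pending[0] if pending else None


# Hypothetical cluster state: which nodes hold each task's split.
split_hosts = {
    "map_0": {"node1", "node2"},
    "map_1": {"node2", "node3"},
    "map_2": {"node1", "node3"},
}
pending = ["map_0", "map_1", "map_2"]

assignments = []
# Nodes report a free map slot in this (made-up) order.
for node in ["node3", "node1", "node4"]:
    task = pick_task(node, pending, split_hosts)
    if task is not None:
        pending.remove(task)
        kind = "local" if node in split_hosts[task] else "remote"
        assignments.append((node, task, kind))
```

Here node3 and node1 each get a task whose data they already hold, while node4 holds no split and has to read its task's data over the network.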

0
votes

An input split is not the data itself; it is a reference to a particular amount of data that MapReduce processes. Usually it is the same size as an HDFS block, because if the two sizes differ, part of a split's data may sit on a different node and then has to be transferred over the network.
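
A short arithmetic sketch in Python shows why matching sizes helps. The file size (300 MB), block size (128 MB) and the helper names `make_splits` and `blocks_touched` are assumptions chosen for illustration, not Hadoop defaults or API calls. Splits are modeled as plain (offset, length) references, and the code lists which blocks each split overlaps.

```python
from typing import List, Tuple

MB = 1024 * 1024


def make_splits(file_size: int, split_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) references covering the file; no data is copied."""
    return [(off, min(split_size, file_size - off))
            for off in range(0, file_size, split_size)]


def blocks_touched(offset: int, length: int, block_size: int) -> List[int]:
    """Indices of the blocks that a split's byte range overlaps."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


file_size, block_size = 300 * MB, 128 * MB

# Split size equal to block size: every split stays inside one block.
aligned = [blocks_touched(off, n, block_size)
           for off, n in make_splits(file_size, 128 * MB)]

# Split size of 100 MB: some splits straddle a block boundary.
misaligned = [blocks_touched(off, n, block_size)
              for off, n in make_splits(file_size, 100 * MB)]
```

With equal sizes each split maps to exactly one block, so a task scheduled on a node holding that block reads everything locally. With 100 MB splits, the second and third splits each overlap two blocks, and those blocks may live on different nodes.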

0
votes

MAPPER: Mapper is a class. MAPPER PHASE: the mapper phase is the code that takes the input and converts it into (key, value) pairs as output. MAPPER SLOT: a slot is where the mapper (or reducer) code is executed.
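
Since slots bound how many tasks run at once, a rough Python sketch of that effect follows. It assumes, unrealistically, that all tasks take the same time, so they finish in clean "waves"; `run_in_waves`, the node names and the slot counts are hypothetical.

```python
from typing import Dict, List


def run_in_waves(tasks: List[str], slots_per_node: Dict[str, int]) -> List[List[str]]:
    """Group tasks into waves; each wave fills every map slot in the cluster once."""
    total_slots = sum(slots_per_node.values())
    if total_slots <= 0:
        raise ValueError("the cluster needs at least one map slot")
    return [tasks[i:i + total_slots] for i in range(0, len(tasks), total_slots)]


tasks = [f"map_{i}" for i in range(5)]
# Hypothetical cluster: node1 has two map slots, node2 has one.
waves = run_in_waves(tasks, {"node1": 2, "node2": 1})
```

With three slots in total, five map tasks need two waves: three tasks run first, and the remaining two start once slots free up.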