
After Hadoop is started, two types of daemon processes are running: a daemon called namenode on the namenode machine, and a daemon called datanode on each datanode machine. I am sure that both are used when a big file is loaded from the local file system into HDFS by means of the "hdfs dfs" command.
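For example, a load like the following (the same thing "hdfs dfs -put" does, sketched here with the Java FileSystem API; the paths are just placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLoad {
    public static void main(String[] args) throws Exception {
        // The client first contacts the namenode (via fs.defaultFS) for metadata...
        FileSystem fs = FileSystem.get(new Configuration());
        // ...then streams the file's blocks to datanodes chosen by the namenode
        fs.copyFromLocalFile(new Path("/tmp/bigfile.txt"),
                             new Path("/user/me/bigfile.txt"));
        fs.close();
    }
}
```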

But are those daemons also used when a Hadoop MapReduce job is running? My understanding is no, but maybe they are used during the Shuffle, when the output of the map functions might be transferred from one datanode to another.


1 Answer


Yes. The NameNode and DataNode daemons run all the time, and a MapReduce job does use them.

When a MapReduce job is started, some number of map tasks is spawned, one per input split, while the number of reduce tasks is set by the job configuration.

Each map task reads one split of the input. Reading that split from HDFS involves the NameNode (to look up block locations) and the DataNodes (to stream the block data).
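A minimal driver sketch showing where those numbers come from (the class name and paths are made up; with no mapper or reducer class set, Hadoop's identity defaults apply, which is enough to illustrate the I/O path):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PassThroughDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "pass-through");
        job.setJarByClass(PassThroughDriver.class);
        // One map task is created per input split of the files under this path
        FileInputFormat.addInputPath(job, new Path("/user/me/input"));
        // The reduce task count is chosen explicitly, independent of the splits
        job.setNumReduceTasks(4);
        // Reduce output is written back to HDFS under this path
        FileOutputFormat.setOutputPath(job, new Path("/user/me/output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```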

During the Shuffle and Sort phase, the reducers fetch map output directly from the nodes where the map tasks ran, so HDFS is not involved there. But when processing is complete, the reduce tasks write their output to HDFS, and the NameNode/DataNodes are involved again.
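To make that last step concrete, here is a sketch of a typical reducer (a hypothetical sum-by-key example; the actual reducer depends on your job):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // The values arrive via the shuffle, fetched straight from map outputs
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // context.write goes through the job's OutputFormat, which creates the
        // result file on HDFS -- this is where NameNode/DataNodes come back in
        context.write(key, new IntWritable(sum));
    }
}
```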