0
votes

I know that the result of the map phase is an intermediate result, which becomes the input for the reduce phase.

Recently, I read in Hadoop: The Definitive Guide that "results of Map-tasks are stored in disk (i.e. not in HDFS, as they are an intermediate result) and only the results of Reduce-phase are stored in HDFS".

So, from the above sentence, my understanding is that if there is a map task then there must also be a reduce task, because the result of a map task is only an intermediate result and a reduce task is needed to store that result in HDFS. Is my understanding correct?

If my understanding is wrong, can anyone give me a scenario where there is 1 map task and 0 reduce tasks?

3
It all depends on the application that you are building. You can call setNumReduceTasks(0), which ensures that the job runs with no reducers. If you do not call this method, the identity reducer runs after the map phase completes. - Arun A K
@ArunAK: Okay, that helped me. Thanks for your response. - barath

3 Answers

0
votes

In MapReduce, a reduce phase is not always required. For pure transformations, where each input record only needs to be transformed, no reducer is needed.

In those scenarios, the number of reducers is set to 0, or, in Hadoop Streaming, the -reducer option is set to NONE. In these cases the mapper output is stored in HDFS.

0
votes

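Yes, when there are zero reducers, the output of the map tasks is not intermediate but the final output. No shuffling or partitioning takes place in this case. The mapper output is written directly to the job's output location (typically HDFS) rather than being kept as intermediate data on local disk.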
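
To make the difference concrete, here is a small plain-Java sketch. It does not use the Hadoop API; the class `MapOnlySketch` and its methods are hypothetical names made up for illustration, and the "shuffle" is only approximated with a sorted map. It shows that with zero reducers the mapper's records come out as-is, in emission order, while with a reducer they are first grouped by key and then combined.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: not Hadoop code, all names here are hypothetical.
public class MapOnlySketch {

    // Mapper stand-in: emits one (word, 1) pair per whitespace-separated token.
    public static List<SimpleEntry<String, Integer>> map(String line) {
        List<SimpleEntry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Zero reducers: mapper output is the final output.
    // Records keep their emission order; nothing is grouped or sorted.
    public static List<String> runMapOnly(List<String> lines) {
        List<String> output = new ArrayList<>();
        for (String line : lines) {
            for (SimpleEntry<String, Integer> pair : map(line)) {
                output.add(pair.getKey() + "\t" + pair.getValue());
            }
        }
        return output;
    }

    // With a reducer: pairs are grouped by key in sorted order
    // (a stand-in for shuffle and sort) and the values per key are summed.
    public static List<String> runWithReducer(List<String> lines) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            for (SimpleEntry<String, Integer> pair : map(line)) {
                grouped.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : grouped.entrySet()) {
            output.add(entry.getKey() + "\t" + entry.getValue());
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b a", "a");
        System.out.println(runMapOnly(input));      // three ungrouped records
        System.out.println(runWithReducer(input));  // one record per distinct key
    }
}
```

In a real job the map-only behaviour comes from setting the number of reduce tasks to 0, as mentioned in the comments on the question; the sketch above only mimics the resulting data flow.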

0
votes

For the benefit of future readers: in the Hadoop ecosystem I work with (2.7.1, Tez execution framework), there are extract jobs that read data from flat files, databases, and cloud apps such as Salesforce into HDFS. These jobs perform no transformation on the data and have only map tasks and no reduce tasks. There is no enforcement of default reducers in the settings.