Can there be a scenario in Hadoop where there is only 1 map task and 0 reduce tasks?

Question

I know that the output of the map phase is an intermediate result that becomes the input to the reduce phase.

Recently, I read in Hadoop: The Definitive Guide that "results of map tasks are stored on local disk (i.e. not in HDFS, as they are an intermediate result) and only the results of the reduce phase are stored in HDFS".

So my understanding from that sentence is that if there is a map task, there must also be a reduce task: the result of a map task is only an intermediate result, and a reduce task is needed to store that result in HDFS. Is my understanding correct?

If my understanding is wrong, can anyone give me a scenario where there can be 1 map task and 0 reduce tasks?

It all depends on the application you are building. You can call setNumReduceTasks(0), which ensures that the job runs no reducers. If you do not call this method, the identity reducer runs after the map phase completes. - Arun A K

Ranga Vure · Accepted Answer · 2014-08-17T01:01:40

In MapReduce, a reduce phase is not always required. For pure transformations, where each input record is simply transformed into an output record, no reducer is needed.

In those scenarios, the number of reducers is set to 0, or (in Hadoop Streaming) the -reducer option is set to NONE. In these cases the mapper output is written directly to HDFS.
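
To make the data flow concrete, here is a toy, standard-library-only sketch of the difference. It is not the Hadoop API: the class MapOnlySketch and the method runJob are hypothetical names invented for illustration, an in-memory list stands in for HDFS output, and a plain sort stands in for the shuffle/sort step. In a real Java driver, the zero-reducer case corresponds to the setNumReduceTasks(0) call mentioned in the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Toy model of a MapReduce job (hypothetical names, NOT the Hadoop API).
class MapOnlySketch {

    // Applies the mapper to every input line.
    // numReduceTasks == 0: the map output is returned as the final output,
    //                      with no shuffle, sort, or reduce step.
    // numReduceTasks  > 0: the map output is sorted (standing in for
    //                      shuffle/sort) and passed through an identity reduce.
    public static List<String> runJob(List<String> inputLines,
                                      Function<String, String> mapper,
                                      int numReduceTasks) {
        List<String> mapOutput = new ArrayList<>();
        for (String line : inputLines) {
            mapOutput.add(mapper.apply(line));
        }
        if (numReduceTasks == 0) {
            return mapOutput; // map-only job: mapper output is the job output
        }
        List<String> reduceOutput = new ArrayList<>(mapOutput);
        Collections.sort(reduceOutput); // stand-in for shuffle/sort by key
        return reduceOutput;            // identity reduce: records unchanged
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "fig");

        // Pure transformation (upper-casing each record), no reducer needed.
        System.out.println(runJob(input, String::toUpperCase, 0)); // [PEAR, APPLE, FIG]

        // Same mapper with one identity reducer: records come out sorted.
        System.out.println(runJob(input, String::toUpperCase, 1)); // [APPLE, FIG, PEAR]
    }
}
```

With zero reduce tasks the records come out exactly as the mapper produced them, which is the case for a pure transformation. The reduce path only adds grouping and ordering that such a job does not need.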

