Why are map task results stored in the local file system?
Mapper output is temporary and is relevant only to the Reducer. Storing temporary output in HDFS (with its replication factor) is overkill. For this reason, the Hadoop framework stores Mapper output in the local file system instead of HDFS. This saves a lot of disk space.
One more important point, from the Apache tutorial page:
All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to the Reducer(s) to determine the final output.
The Mapper outputs are sorted and then partitioned per Reducer.
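The grouping and partitioning described in these two quotes can be sketched in plain Java, without any Hadoop dependency. This is only a toy model: the class name ShuffleSketch and its methods are hypothetical, and the only part taken from Hadoop is the partition formula, which matches the default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of the shuffle step; not Hadoop code.
public class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Assigns every (key, value) pair to a reducer partition and groups
    // the values by key. TreeMap keeps the keys of each partition sorted.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> pair : mapOutput) {
            int p = partitionFor(pair.getKey(), numReduceTasks);
            partitions.get(p)
                      .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                      .add(pair.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up map output for illustration.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("pear", 1), Map.entry("apple", 1),
                Map.entry("pear", 1), Map.entry("fig", 1));
        List<TreeMap<String, List<Integer>>> parts = shuffle(mapOutput, 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("reducer " + i + " <- " + parts.get(i));
        }
    }
}
```

All values for a given key land in the same partition, so a single Reducer sees every value for that key. In a real job this work is spread across the map side (partition, sort, spill to local disk) and the reduce side (fetch and merge).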
In a MapReduce job with no reduce phase (only a map phase), where is the final result stored?
You can find more details on this in the Apache tutorial page:
Reducer NONE
It is legal to set the number of reduce-tasks to zero if no reduction is desired.
In this case the outputs of the map-tasks go directly to the FileSystem, into the output path set by FileOutputFormat.setOutputPath(Job, Path). The framework does not sort the map-outputs before writing them out to the FileSystem.
If the number of Reducers is greater than 0, Mapper outputs are stored in the local file system and sorted before being sent to the Reducers. If the number of Reducers is 0, Mapper outputs are written directly to the job's output path (typically on HDFS) without sorting.
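To make the difference concrete, here is a minimal sketch in plain Java. It is a toy model with a hypothetical class MapOnlySketch, not Hadoop code; it only mimics the "sorted or not" behaviour. In a real driver, the switch is the call job.setNumReduceTasks(0).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model; it only mimics the ordering of map output keys.
public class MapOnlySketch {

    // Returns the keys in the order a single map task would hand them on.
    // numReduceTasks > 0: sorted by key before going to the Reducer.
    // numReduceTasks == 0: written out in the order the Mapper emitted them.
    static List<String> mapOutputOrder(List<String> emittedKeys, int numReduceTasks) {
        List<String> out = new ArrayList<>(emittedKeys);
        if (numReduceTasks > 0) {
            Collections.sort(out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> emitted = List.of("pear", "apple", "fig");
        System.out.println(mapOutputOrder(emitted, 0)); // [pear, apple, fig]
        System.out.println(mapOutputOrder(emitted, 1)); // [apple, fig, pear]
    }
}
```

With zero Reducers, each map task writes its own output file under the path set by FileOutputFormat.setOutputPath(Job, Path), so the final result is one unsorted file per map task.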