0
votes

In Hadoop MapReduce, the intermediate (map) output is saved on the local disk. I would like to know whether it is possible to start a job with just the reduce phase, i.e. one that reads the map output from the local disk, partitions the data, and executes the reduce tasks.

3

3 Answers

4
votes

There is a basic implementation of Mapper called IdentityMapper, which simply passes all key-value pairs through to the Reducer unchanged.

  • The Reducer reads the key-value pairs emitted by the different mappers and emits key-value pairs of its own.
  • The Reducer’s job is to process the data that comes from the mapper.
  • If the MapReduce programmer does not set a mapper class via JobConf.setMapperClass, then IdentityMapper.class is used as the default.

You can't run just reducers without any mappers.
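So the closest you can get to a "reduce-only" job is to let the default identity mapper pass the data through untouched. Here is a minimal sketch using the classic mapred API; the class name, the reducer, and the HDFS paths are all illustrative, and it assumes Hadoop is on the classpath:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ReduceOnlyJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReduceOnlyJob.class);
        conf.setJobName("reduce-only");

        // No setMapperClass() call: IdentityMapper is used by default,
        // so every input key-value pair is forwarded to the reducers.

        // MyReducer is a hypothetical reducer class you would provide.
        conf.setReducerClass(MyReducer.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        // Input must live in HDFS; a map phase still runs, it just
        // performs no transformation.
        FileInputFormat.setInputPaths(conf, new Path("/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/output"));

        JobClient.runJob(conf);
    }
}
```

Note that the map tasks (and the shuffle/sort between them and the reducers) still execute; the identity mapper only removes the transformation step, it does not skip the map phase.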

0
votes

MapReduce works on data that is in HDFS, so I don't think you can write a reducer-only MapReduce job that reads from the local disk.

0
votes

If you use Hadoop Streaming, you can just add:

-mapper "/bin/sh -c \"cat\""
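In context, a full streaming invocation might look like the following (the streaming jar location, the HDFS paths, and the reducer script are illustrative and depend on your installation):

```shell
# "cat" acts as an identity mapper: the map phase still runs,
# but every record is passed through to the reducers unchanged.
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /user/me/input \
  -output /user/me/output \
  -mapper "/bin/sh -c \"cat\"" \
  -reducer /path/to/my_reducer.py
```

This does not skip the map phase or read local-disk map output directly; it just makes the map step a no-op so all the real work happens in the reducers.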