
I have written a MapReduce job over data in HBase. It uses multiple mappers and a single reducer. The reducer takes the data supplied by the mappers and runs some analytics on it. Once all the data in HBase has been processed, I want the single reducer to write the result to a file in HDFS. At present I can write to HDFS every time a new record arrives, but I cannot figure out how to write only the final conclusion, once, at the end.

Do you want to export the HBase table data to an HDFS file? - SSaikia_JtheRocker
I am using a MapReduce job to read the data in parallel, but a single reducer does the analytics and writes the result back to HDFS at the end. The output would contain conclusions about the HBase data, not the tuples themselves. - user1580096
I've posted an answer; check whether it helps. - SSaikia_JtheRocker
I don't need the HBase data in HDFS. I want to analyze the data in the reducer and write the conclusion drawn from all the results to HDFS only after the last record has been processed. - user1580096
So when you call job.setNumReduceTasks(1), doesn't that do the trick for you? It forces a single reducer. - SSaikia_JtheRocker

1 Answer


If you are trying to write a final result from a single reducer to HDFS, you can try any of the approaches below:

  1. Use the Hadoop FileSystem API's create() function to write to HDFS from the reducer.
  2. Emit a single key and value from the reducer after the final calculation.
  3. Override the Reducer's cleanup() function and do point (1) there.

Details on 3:
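A minimal sketch of approach 3, assuming the new `org.apache.hadoop.mapreduce` API and that the mappers emit `Text` keys with `LongWritable` values. The class name `SummaryReducer`, the running totals, and the output path `/tmp/analysis/summary.txt` are hypothetical placeholders, since the original question does not show its types or its analytics. The idea is that `reduce()` only accumulates state, and `cleanup()`, which Hadoop calls once after the last `reduce()` call of the task, writes the final result.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: key/value types and the "analytics" are assumptions.
public class SummaryReducer
        extends Reducer<Text, LongWritable, NullWritable, NullWritable> {

    // Hypothetical HDFS output location, not taken from the question.
    private static final Path SUMMARY_PATH = new Path("/tmp/analysis/summary.txt");

    // State that survives across reduce() calls within this single reduce task.
    private long recordCount = 0;
    private long total = 0;

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) {
        // Only accumulate here; nothing is written to HDFS per key.
        for (LongWritable value : values) {
            recordCount++;
            total += value.get();
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Called once, after every key has been passed to reduce().
        FileSystem fs = FileSystem.get(context.getConfiguration());
        // overwrite = true, so a retried task attempt replaces a partial file.
        try (FSDataOutputStream out = fs.create(SUMMARY_PATH, true)) {
            String summary = "records=" + recordCount + "\ttotal=" + total + "\n";
            out.write(summary.getBytes(StandardCharsets.UTF_8));
        }
        // The FileSystem returned by FileSystem.get() is a shared, cached
        // instance, so it is deliberately not closed here.
    }
}
```

This only yields one global result if the job runs a single reduce task, for example with `job.setNumReduceTasks(1)`. For approach 2, the same structure should work with the `FileSystem` write replaced by a single `context.write(...)` call inside `cleanup()`, with the reducer's output types changed to match, so that the result lands in the job's normal output directory.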


Hope this helps.