
I have written a map-reduce job for the data in HBase. It contains multiple mappers and just a single reducer. The reducer takes the data supplied by the mappers and does some analytics on it. After the processing is complete for all the data in HBase, I want to write the result back to a file in HDFS through the single reducer. Presently I am able to write to HDFS every time I get a new record, but I am unable to figure out how to write the final conclusion to HDFS only at the end.

1
Do you want to export the HBase table data to an HDFS file? - SSaikia_JtheRocker
I am using a map-reduce job to read the data in parallel, but I use a single reducer to do the analytics and, at the end, write it back to HDFS. That output may contain some conclusions about the HBase data, not the tuples themselves. - user1580096
Check the answer I've posted; see if it helps. - SSaikia_JtheRocker
I don't need the HBase data on HDFS. I want to do the analytics on the data in the reducer, and the conclusion of all the results needs to be written to HDFS only after the last processing step. - user1580096
So, when you do a job.setNumReduceTasks(1);, doesn't that do the trick for you? That will force a single reducer. - SSaikia_JtheRocker

1 Answer


So, if you are trying to write a final result from a single reducer to HDFS, you can try any one of the approaches below -

  1. Use the Hadoop FileSystem API's create() method to write to HDFS from the reducer.
  2. Emit a single key and value from the reducer after the final calculation.
  3. Override the Reducer's cleanup() method and do point (1) there.

Details on 3:

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Reducer.html#cleanup-org.apache.hadoop.mapreduce.Reducer.Context-
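For illustration, here is a minimal sketch combining points (1) and (3): accumulate in memory during reduce() and write the conclusion once from cleanup(). The output path, the running-total field, and the key/value types are placeholders I've assumed, not details from the question.

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class AnalyticsReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        private long runningTotal = 0; // example aggregate built up across all keys

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // Accumulate into in-memory state instead of writing something per key.
            for (LongWritable value : values) {
                runningTotal += value.get();
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // cleanup() runs once, after the last reduce() call, so the final
            // conclusion is written to HDFS exactly one time here.
            FileSystem fs = FileSystem.get(context.getConfiguration());
            Path outputPath = new Path("/user/hadoop/final_result.txt"); // placeholder path
            try (FSDataOutputStream out = fs.create(outputPath, true)) {
                out.writeUTF("Final conclusion: total = " + runningTotal);
            }
        }
    }

With job.setNumReduceTasks(1) in the driver, this reducer sees every key, so the value written in cleanup() reflects all of the HBase data.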

Hope this helps.