I have written a map-reduce job for the data in HBase. It contains multiple mappers and just a single reducer. The reducer takes the data supplied by the mappers and does some analytics on it. After processing is complete for all the data in HBase, I want to write the result back to a file in HDFS through the single reducer. Presently I am able to write data to HDFS every time I get a new record, but I am unable to figure out how to write the final conclusion to HDFS only at the end.
1
votes
Do you want to export the HBase table data to an HDFS file?
– SSaikia_JtheRocker
I am using a MapReduce job to read the data in parallel, but I use a single reducer to do the analytics and, at the end, write the result back to HDFS. That result may contain some conclusions about the HBase data, not the tuples themselves.
– user1580096
I've posted an answer; check if it helps.
– SSaikia_JtheRocker
I don't need the HBase data on HDFS. I want to do the analytics on the data in the reducer, and the conclusion of all the results needs to be written to HDFS only after the last record has been processed.
– user1580096
So, when you do job.setNumReduceTasks(1);, doesn't that do the trick for you? That will force a single reducer.
– SSaikia_JtheRocker
1 Answer
2
votes
So, if you are trying to write a final result from a single reducer to HDFS, you can try one of the approaches below:
1. Use the Hadoop FileSystem API's create() method to write to HDFS directly from the reducer.
2. Emit a single key and value from the reducer after the final calculation and let the job's output format write it.
3. Override the reducer's cleanup() method and do point (1) there, so the write happens exactly once, after all the input has been processed.
Details on 3:
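A minimal sketch of point 3, assuming the reducer's input types are Text/Text (they depend on what your mappers emit) and using a hypothetical output path /user/output/final_result.txt; the accumulation in reduce() is a placeholder for your actual analytics:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AnalyticsReducer extends Reducer<Text, Text, NullWritable, Text> {

    // State accumulated across all reduce() calls (stand-in for the real analytics).
    private final StringBuilder conclusion = new StringBuilder();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Do the per-key analytics and update the accumulated state;
        // do NOT write to HDFS here.
        for (Text value : values) {
            conclusion.append(key).append('\t').append(value).append('\n');
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // cleanup() runs once, after the single reducer has seen all input,
        // so this is where the final conclusion is written to HDFS.
        Configuration conf = context.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        Path outFile = new Path("/user/output/final_result.txt"); // hypothetical path
        try (FSDataOutputStream out = fs.create(outFile, true)) {
            out.write(conclusion.toString().getBytes(StandardCharsets.UTF_8));
        }
    }
}

If you emit nothing from reduce(), the job's regular output directory will only contain an empty part file; the conclusion lives in the file created above.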
Hope this helps.