If I create MR job by using TableMapReduceUtil(Hbase), it seems that hbase scanner feeds data into mapper and converts data from reducer to specific hbase output format to store it in hbase table. For this reason, I expect hbase mapreduce job will take more time than native MR job. So, How definitely long does Hbase job take more than native MR?
2
votes
1 Answers
2
votes
In regards to reads going through HBase can be 2-3 times slower than native map/reduce that uses files directly.
In the recently announced HBase 0.98 they've added the capability to do map/reduce over HBase snapshots. You can see this presentation for details (slide 7 for API, slide 16 for speed comparison).
In regard to writes you can write into HFiles directly and then bulk load to HBase - however since HBase caches data and does bulk writes you can also tune it up and get comparable or better results