HBase MapReduce interaction

Question

I have a program that uses HBase and MapReduce.

My data is stored in HDFS as a single 100 GB file, and I have also loaded it into an HBase table.

Scanning the file with MapReduce takes 5 minutes, but scanning the HBase table takes 30 minutes.

How can I speed up the job when using HBase with MapReduce?

Thanks.

shazin · Accepted Answer · 2012-11-09T08:17:04

I am assuming you are running a single-node HDFS. If your 100 GB file were on a multi-node HDFS cluster, both MapReduce and Hive would be much faster.

You could try increasing the number of mappers and reducers in MapReduce to gain some performance; have a look at this post.
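To make "number of mappers" concrete: for a plain file input, the number of map tasks generally follows the number of input splits, which is roughly the file size divided by the split size. The sketch below only does that arithmetic for the 100 GB file from the question. The helper `estimate_map_tasks` and the split sizes tried are assumptions for illustration, not values taken from the asker's cluster, and real jobs may differ slightly at the last split.

```python
def estimate_map_tasks(file_size_bytes, split_size_bytes):
    """Rough number of input splits (one map task each) for a single splittable file.

    Hypothetical helper: uses ceiling division and ignores framework details
    such as how the final, partially filled split is handled.
    """
    if split_size_bytes <= 0:
        raise ValueError("split size must be positive")
    return -(-file_size_bytes // split_size_bytes)  # ceiling division on integers


GIB = 1024 ** 3
MIB = 1024 ** 2

file_size = 100 * GIB  # the 100 GB file from the question

# Assumed split sizes; the actual value depends on the cluster configuration.
for split_mib in (64, 128, 256):
    tasks = estimate_map_tasks(file_size, split_mib * MIB)
    print(f"split size {split_mib} MiB: about {tasks} map tasks")
```

A smaller split size yields more map tasks, which only helps if the cluster has enough nodes and task slots to run them in parallel. The number of reducers, by contrast, is set explicitly on the job. Note that this arithmetic applies to the file-based job; a job that reads an HBase table through `TableInputFormat` normally gets one split per table region by default, so there the region count drives the number of mappers.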

Hive is essentially a data warehousing tool built on top of HDFS, and every query is a MapReduce job underneath. So the post above should answer this problem as well.
