How to calculate number of rows in a region when HBase table is split across multiple regions

Question

How can I count the number of records in a region using the HBase shell? If there is only one region, I can scan the table and get the number of records, but if the table is split across multiple regions, is there a shell command that gives me this information? Thanks!

Lorand Bendig · Accepted Answer · 2013-09-20T14:30:53

You can list the rows in the shell for a given key range (region):

# Return only the first cell of each row, stripped of its value, so the scan stays cheap.
f_keyonly = org.apache.hadoop.hbase.filter.KeyOnlyFilter.new();
f_firstkey = org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter.new();
flist = org.apache.hadoop.hbase.filter.FilterList.new([f_keyonly, f_firstkey]);
# Scan only the key range covered by the region.
scan 'mytable', {STARTROW => 'myStart', ENDROW => 'myEnd', FILTER => flist}

where myStart and myEnd are the start and end keys of the region (check http://myhost:60030/rs-status). The shell prints the number of rows at the end of the scan output.
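To illustrate why scanning from a region's start key to its end key covers exactly that region: the start key is inclusive, the end key is exclusive, and an empty key means the range is unbounded on that side. Below is a minimal sketch in plain Ruby that models this on an in-memory list of keys. It does not talk to HBase, and the helper names (in_region?, rows_per_region) as well as the sample keys and boundaries are made up for the illustration.

```ruby
# Illustrative model only: plain Ruby, no HBase connection involved.
# A region covers the row keys in [start_key, end_key): the start key is
# inclusive, the end key is exclusive, and an empty string means "unbounded".
def in_region?(row_key, start_key, end_key)
  after_start = start_key.empty? || row_key >= start_key
  before_end  = end_key.empty? || row_key < end_key
  after_start && before_end
end

# Count how many of the given row keys fall into each region.
# `regions` is a list of [start_key, end_key] pairs (hypothetical sample data).
def rows_per_region(row_keys, regions)
  regions.map do |start_key, end_key|
    [[start_key, end_key], row_keys.count { |k| in_region?(k, start_key, end_key) }]
  end.to_h
end

row_keys = %w[apple banana cherry mango peach]
regions  = [['', 'c'], ['c', 'n'], ['n', '']]
counts   = rows_per_region(row_keys, regions)
puts counts.inspect
```

Because the boundaries of neighbouring regions do not overlap, every row is counted in exactly one region, and the per-region counts add up to the table total.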

If you only need the total number of rows, run the RowCounter job, e.g.:

hadoop jar /path/to/hbase.jar rowcounter mytable --range=myStart,myEnd

The row count is reported in the job counters, as the ROWS counter of RowCounterMapper.

On the other hand, if you need to count rows frequently, consider implementing a coprocessor, which runs on the server side.

Further discussion can be found here.
