1
votes

I'm trying to reduce the size of an HBase table and have encountered this:

http://prafull-blog.blogspot.co.uk/2012/06/how-to-calculate-record-size-of-hbase.html

which says that the rowkey is stored with every cell (column value) in the table. This seems incredibly wasteful: I have to pay careful attention to every byte in the rowkey, since each extra byte is multiplied by the total number of columns stored. I also expected every row to be stored as a single document under one key, but that is not how it works. Why is it implemented this way?

2 Answers

2
votes

No doubt, rowkey design is the most important decision in an HBase schema. Have you tried setting DATA_BLOCK_ENCODING => 'PREFIX' on the column family as a way to de-duplicate rowkey bytes on disk?
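To give a rough feel for why prefix encoding helps here, below is a toy Python sketch of the general idea: each key is stored as the number of bytes it shares with the previous key plus the remaining suffix. This is not HBase's actual encoder. The `prefix_encode` helper and the flattened `row/family:qualifier` key strings are made up for illustration, and the byte counts ignore the small length fields a real encoding would add.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(keys):
    """Return (shared_len, suffix) pairs for keys given in sorted order."""
    encoded = []
    prev = b""
    for key in keys:
        shared = common_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


# Hypothetical keys: three cells of the same row, already sorted.
keys = [b"user12345/cf:age", b"user12345/cf:email", b"user12345/cf:name"]
encoded = prefix_encode(keys)

raw_bytes = sum(len(k) for k in keys)              # key bytes without encoding
suffix_bytes = sum(len(s) for _, s in encoded)     # key bytes left after encoding
print(raw_bytes, suffix_bytes)
```

Because cells of the same row sit next to each other in sorted order, the repeated rowkey ends up in the shared prefix and only the differing tail is stored again.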

0
votes

HBase is a column-family-oriented database. Although you see a row-and-column view, the data is stored differently internally. An entity is designed to live in a single row, but storage is organized per cell: every cell is written as its own key-value entry. A column family groups related columns, and each column family is stored in its own set of files (a store) within a region, not on a separate region server.

HBase is also indexed by rowkey. Each cell carries its full key (rowkey, column family, qualifier, and timestamp), so cells are independent of one another, and queries are faster as a result.
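As a back-of-the-envelope illustration of what this costs, here is a small Python sketch that estimates the unencoded size of a row. It assumes the commonly documented KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type, plus the row, family, qualifier, and value bytes). It ignores tags, compression, block encoding, and index overhead, and the function names and sample data are hypothetical.

```python
# Fixed per-cell bytes: key len + value len + row len + family len + timestamp + key type
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Estimated bytes for one cell; the rowkey is counted once per cell."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qualifier) + len(value)


def row_size(row: bytes, family: bytes, cells: dict) -> int:
    """Estimated bytes for all cells of one row in one column family."""
    return sum(keyvalue_size(row, family, q, v) for q, v in cells.items())


# Hypothetical row with two columns in family "d".
cells = {b"name": b"Ann", b"age": b"30"}
short_key = row_size(b"user12345", b"d", cells)
long_key = row_size(b"user123456", b"d", cells)  # rowkey one byte longer

print(short_key, long_key, long_key - short_key)
```

Under these assumptions, one extra rowkey byte costs one extra byte per cell in the row, which is why short rowkeys, family names, and qualifiers matter for wide rows.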