What is the best approach: creating multiple HBase tables or multiple column families in a single HBase table?

Question

My HBase row keys are different, and I also need to aggregate the data and store it separately. For this use case, which is the better approach: creating multiple HBase tables, or multiple column families in a single HBase table?

I am refining my question.

Below is my usecase.

I am processing web logs that contain Retailer, Category, and Product clicks.

1. I store these web logs in one HBase table (Log), using a different row key for each click type and the same column family, e.g.:
- A. for Retailer -- IP | DateTime | Sid | Retailer
- B. for Category -- IP | DateTime | Sid | Retailer | Category
- C. for Product -- IP | DateTime | Sid | Retailer | Category | Product

2. From the above table I calculate daily click counts and store them in other HBase tables (Retailer_Day_cnt, Category_Day_Cnt, Product_Day_Cnt).
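
To make the key layouts and the daily rollup concrete, here is a minimal Python sketch that uses only the standard library and does not touch HBase. The function names (`log_row_key`, `daily_counts`), the `|` delimiter without spaces, the `YYYYMMDDHHMMSS` timestamp format, and the sample values are assumptions for illustration. It also assumes each click is counted only at its own level (a Product click is not also counted as a Retailer click), which the question does not specify.

```python
from collections import Counter
from datetime import datetime

SEP = "|"  # assumed delimiter, mirroring "IP | DateTime | Sid | ..."


def log_row_key(ip, ts, sid, retailer, category=None, product=None):
    """Build a Log-table row key; deeper click types append more fields."""
    parts = [ip, ts.strftime("%Y%m%d%H%M%S"), sid, retailer]
    if category is not None:
        parts.append(category)
        if product is not None:
            parts.append(product)
    return SEP.join(parts)


def daily_counts(row_keys):
    """Roll row keys up into per-day counters, one per click type."""
    counts = {"retailer": Counter(), "category": Counter(), "product": Counter()}
    for key in row_keys:
        fields = key.split(SEP)
        day = fields[1][:8]  # YYYYMMDD prefix of the timestamp field
        if len(fields) == 4:
            counts["retailer"][(day, fields[3])] += 1
        elif len(fields) == 5:
            counts["category"][(day, fields[3], fields[4])] += 1
        elif len(fields) == 6:
            counts["product"][(day, fields[3], fields[4], fields[5])] += 1
    return counts


# Hypothetical sample clicks
keys = [
    log_row_key("10.0.0.1", datetime(2015, 7, 14, 9, 0, 0), "s1", "acme"),
    log_row_key("10.0.0.2", datetime(2015, 7, 14, 9, 5, 0), "s2", "acme"),
    log_row_key("10.0.0.1", datetime(2015, 7, 14, 9, 1, 0), "s1", "acme", "shoes"),
    log_row_key("10.0.0.1", datetime(2015, 7, 15, 9, 2, 0), "s1", "acme", "shoes", "boot-42"),
]
counts = daily_counts(keys)
```

Each counter here corresponds to one of the three daily tables; the counter key (day plus the retailer, category, or product fields) would be a natural candidate for that table's row key.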

My question is: what is the best way to store the data in HBase for cases 1 and 2 above, separate HBase tables or separate column families?

Note: In case 1 I only do writes, but in case 2 I do multiple reads and writes.

Thanks in advance,
Surendra

spats spats · Accepted Answer · 2015-07-14T14:30:24

From a performance perspective, the fewer column families the better. All column families in a table are flushed at the same time, even if some of them hold very little data, which makes flushes less efficient. If your table is write-heavy, this results in many HFiles, which leads to more compactions and more GC pauses, and that can make the whole HBase cluster very slow. So avoid multiple column families unless you really need them or all the column families will hold roughly the same amount of data.
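
The effect of flushing all column families together can be illustrated with a toy model in plain Python. This is not HBase code: `simulate_flushes`, the byte sizes, and the threshold are made up for illustration, and the model only assumes what the answer describes, namely that once the combined in-memory data crosses a threshold, every family is written out as its own file. Actual flush behaviour depends on the HBase version and configuration.

```python
def simulate_flushes(writes, flush_threshold):
    """Toy model: when the combined in-memory size reaches the threshold,
    every column family is flushed to its own file, however small.

    writes: iterable of (family, nbytes) pairs.
    Returns {family: [size of each flushed file, ...]}.
    """
    memstore = {}  # bytes currently buffered per family
    files = {}     # sizes of the files written so far per family
    for family, nbytes in writes:
        memstore[family] = memstore.get(family, 0) + nbytes
        if sum(memstore.values()) >= flush_threshold:
            for fam, size in memstore.items():
                if size > 0:
                    files.setdefault(fam, []).append(size)
            memstore = {fam: 0 for fam in memstore}
    return files


# Hypothetical workload: a busy family and a nearly idle one, interleaved.
writes = []
for _ in range(40):
    writes.append(("big", 100))
    writes.append(("small", 1))

files = simulate_flushes(writes, flush_threshold=1000)
```

In this model the `small` family ends up with as many files as the `big` one, but each of its files is tiny. Many small files of this kind are what the answer warns will drive up compaction work on a write-heavy table.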

Find more details in the HBase Book.
