I'm quite new to HBase. Suppose we want to count the unique documents per day for each category.
My first idea was a schema like the following:
table name: yyyyMMdd (one table per day)
row key: category_docid
column family: not decided yet; whatever turns out to be needed later
With this layout, I think I can scan a single category by using its row key prefix to derive the start and stop keys, and then count the rows returned.
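To make the prefix idea concrete, here is a minimal sketch of how I picture it. It is not the HBase client API: a TreeSet of strings stands in for one daily table (sorted, unique row keys), and PrefixRangeSketch, stopKeyFor and countPrefix are names I made up for illustration. It assumes ASCII row keys, so that Java string order matches HBase's byte order, and it treats the start key as inclusive and the stop key as exclusive, the way a scan range does.

```java
import java.util.List;
import java.util.TreeSet;

// In-memory stand-in for one daily table; NOT the HBase client API.
// Assumption: row keys are non-empty ASCII strings of the form category_docid.
class PrefixRangeSketch {

    // Smallest key that sorts after every key starting with `prefix`:
    // the prefix with its last character incremented by one.
    static String stopKeyFor(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Count rows in [prefix, stopKeyFor(prefix)): start inclusive, stop exclusive.
    static int countPrefix(TreeSet<String> sortedRowKeys, String prefix) {
        return sortedRowKeys.subSet(prefix, stopKeyFor(prefix)).size();
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>(
                List.of("sport_1", "sports_1", "sports_2", "sportsnews_1"));
        // The prefix includes the "_" delimiter so "sportsnews" is not counted as "sports".
        System.out.println(countPrefix(table, "sports_"));
    }
}
```

One thing this makes visible: the prefix has to include the delimiter, and a category name must not itself contain the delimiter, otherwise rows from one category leak into another category's range.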
But there are several problems:
1. A scan seems heavy for a count operation, since I have to iterate over every Result and increment the counter myself.
2. The categories change continuously. It would be much better to do something like GROUP BY in SQL, but I haven't found a way to do that yet.
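For the second problem, the closest thing to GROUP BY I can think of is a single pass over the whole daily table that derives the category from each row key, so the list of categories never has to be known up front. The sketch below only simulates that on the client side with JDK collections; it is not HBase code, and GroupByCategorySketch and countPerCategory are hypothetical names. It assumes the category never contains "_", and it relies on row keys being unique, which is what makes each document count only once per category.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Client-side simulation of "GROUP BY category"; NOT the HBase client API.
// Assumption: row keys look like category_docid and the category has no "_".
class GroupByCategorySketch {

    // One pass over the row keys, counting rows per category.
    static Map<String, Long> countPerCategory(Iterable<String> rowKeys) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String rowKey : rowKeys) {
            int sep = rowKey.indexOf('_');
            if (sep < 0) {
                continue; // skip keys that do not match the assumed layout
            }
            String category = rowKey.substring(0, sep);
            counts.merge(category, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The TreeSet mimics the table: sorted keys, and the duplicate
        // "news_d1" collapses into a single row.
        TreeSet<String> table = new TreeSet<>(
                List.of("news_d1", "sports_d1", "news_d2", "news_d1"));
        System.out.println(countPerCategory(table));
    }
}
```

This still reads every row once, so it does not solve the first problem; it only avoids running one prefix scan per category.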
What do you think of this approach? Is there a better way to do this?