1
votes

I'm a beginner in HBase. I need to design my table. I want to play with the following information:

On date XX-XX-XXXX, the word 'HELLO' appears in documents 2, 3, and 4, and the weights of those documents are 12, 45, and 36. My raw data looks like: doc:D, title:'i like potatoes', weight:W, date:D

I created a table with row: word, column: date, value: doc. But I can't store multiple documents under the same date.

Can we create multiple column families for a table? What would be the best way to design the schema?

Thanks a lot

1
I found the solution: an HBase value can be a serialized ArrayList<Integer> containing the document IDs. - JohnJohnGa
If you don't need this question anymore, then close it. - Amir Raminfar
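
For anyone landing here later, the idea in the first comment above could look roughly like this. This is only a sketch using plain Java serialization; the class name `DocIdCodec` and its methods are made up for illustration, and writing the resulting bytes into an HBase cell is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a list of document IDs into a byte[] cell value and back.
class DocIdCodec {
    // Serialize the list with standard Java object serialization.
    static byte[] encode(ArrayList<Integer> docIds) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(docIds);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read the list back from the stored bytes.
    @SuppressWarnings("unchecked")
    static ArrayList<Integer> decode(byte[] value) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(value))) {
            return (ArrayList<Integer>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<Integer> ids = new ArrayList<>(List.of(2, 3, 4));
        byte[] value = encode(ids);          // bytes to store as the cell value
        System.out.println(decode(value));   // round-trips back to the same list
    }
}
```

Note that every update has to read, modify, and rewrite the whole list, so this fits best when the list per word and date stays small.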

1 Answer

0
votes

Is date the most relevant bit of information for a document? As you say, you can only store one document per date with your given schema. An alternative would be to make a compound key, like: DATE_TIME_DOCUMENT-ID. Document IDs could be a SHA-1 of the contents to ensure uniqueness. And, if you want recent documents to be easily retrievable, you could also invert the DATE-TIME measure (e.g. Long.MAX_VALUE - document timestamp), so that newer documents sort first. If you don't care about date, then documents can be stored on their ID alone.
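
Here is a rough sketch of building such a compound key, using only the JDK. The class and method names (`RowKeys`, `documentId`, `rowKey`) are hypothetical, and the layout (8-byte inverted timestamp followed by the 20-byte SHA-1) is just one assumption about how you might encode DATE_TIME_DOCUMENT-ID as bytes. It assumes non-negative millisecond timestamps.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper for building compound row keys.
class RowKeys {
    // Document ID: SHA-1 digest of the document contents (20 bytes).
    static byte[] documentId(String contents) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(contents.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Row key: (Long.MAX_VALUE - timestamp) in big-endian, then the document ID.
    // Inverting the timestamp makes newer documents sort before older ones.
    static byte[] rowKey(long timestampMillis, byte[] docId) {
        return ByteBuffer.allocate(Long.BYTES + docId.length)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .put(docId)
                .array();
    }

    public static void main(String[] args) {
        byte[] id = documentId("i like potatoes");
        byte[] older = rowKey(1_000L, id);
        byte[] newer = rowKey(2_000L, id);
        // Row keys are compared byte by byte as unsigned values,
        // so the newer key should compare as smaller (it comes first in a scan).
        System.out.println(Arrays.compareUnsigned(newer, older) < 0);
    }
}
```

With keys like this, a scan from the start of the table returns the most recent documents first, and two documents with the same timestamp still get distinct rows because their content hashes differ.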