I am looking for some help with HBase (I'm fairly new to it and trying to understand whether I can use it for my POC).
Use case: I need a historical price data table that will store data for, say, 10 different indices. One requirement is to keep an audit trail of the changes made to any attribute of a constituent, share, or instrument. Another is to find the list of instruments whose price changed by n% or more during a given month, say January 2010.
Example data (the columns below are only illustrative):
date      instrument  high   low    mid    user    ts
20130101  goog        34     33.4   33.8   system  10:30
20130101  yhoo        24     23.4   23.8   system  10:50
20130101  goog        34.1   33.3   33.8   ops     10:55
20130101  msft        134    133.4  133.8  system  11:00
20130101  msft        134    133.9  133.8  ops     11:30
20130101  goog        34.1   33.3   34.1   ops     11:30
20130101  aapl        48     48.4   47.9   system  11:30
Similar data will be available for subsequent dates. Note that within a day, any user can change one or more attribute values of an instrument (as with goog and msft), while for other instruments nothing changes at all (aapl, yhoo).
What would be the best data model for storing this data that also makes retrieval easy?
If HBase supports composite rowkeys (please help me with the syntax if it does), then I could have something like:
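The month-level n% query from the use case is, whatever the HBase schema, mostly client-side arithmetic over rows returned by a scan. The sketch below assumes that "variance of n%" means the percentage change between an instrument's first and last recorded mid price within the month; that definition, the function name `movers`, and the `(date, instrument, mid)` row shape are all hypothetical, and no HBase API is involved.

```python
from collections import defaultdict


def movers(rows, month, threshold_pct):
    """Return (instrument, pct_change) for instruments whose mid price moved
    by at least threshold_pct percent between the first and last recorded
    day of `month` ("YYYYMM").

    `rows` is a hypothetical iterable of (date "YYYYMMDD", instrument, mid),
    e.g. the latest mid per instrument per day read back from a scan.
    """
    by_instrument = defaultdict(list)
    for date, instrument, mid in rows:
        if date.startswith(month):  # keep only rows inside the month
            by_instrument[instrument].append((date, mid))

    result = []
    for instrument, points in by_instrument.items():
        points.sort()  # fixed-width YYYYMMDD strings sort chronologically
        first, last = points[0][1], points[-1][1]
        if first == 0:
            continue  # avoid dividing by zero on bad data
        change = (last - first) / first * 100
        if abs(change) >= threshold_pct:
            result.append((instrument, round(change, 2)))
    return sorted(result)
```

With made-up rows such as goog moving from 100.0 to 110.0 in January 2010, `movers(rows, "201001", 5)` would report goog with a 10% change. A different definition (for example max/min range within the month) would only change the two lines that compute `first`, `last` and `change`.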
ROW COLUMN+CELL
goog-20130101 column=cf1:h1, timestamp=1389020633920, value=34
goog-20130101 column=cf1:h2, timestamp=1389020654614, value=34.1
goog-20130101 column=cf1:h3, timestamp=1389020668338, value=34.1
goog-20130101 column=cf1:l1, timestamp=1389020633920, value=33.4
goog-20130101 column=cf1:l2, timestamp=1389020654614, value=33.3
goog-20130101 column=cf1:l3, timestamp=1389020668338, value=33.3
goog-20130101 column=cf1:u1, timestamp=1389020633920, value=system
goog-20130101 column=cf1:u2, timestamp=1389020654614, value=ops
goog-20130101 column=cf1:u3, timestamp=1389020668338, value=ops
aapl-20130101 column=cf1:h1, timestamp=1389020633920, value=48
aapl-20130101 column=cf1:l1, timestamp=1389020633920, value=48.4
aapl-20130101 column=cf1:u1, timestamp=1389020633920, value=system
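On question 1: HBase rowkeys are uninterpreted byte arrays sorted lexicographically, so there is no dedicated composite-key syntax; a key like goog-20130101 is simply built by the client by concatenating its parts. The sketch below (plain Python, not an HBase client; the `-` separator and helper names are hypothetical) shows that with a fixed-width YYYYMMDD date, one instrument's days sort together in date order, so a month becomes a start/stop range that could be passed to a scan, e.g. `scan 'prices', {STARTROW => 'goog-20100101', STOPROW => 'goog-20100201'}` in the HBase shell, where `prices` is an assumed table name.

```python
SEP = b"-"  # hypothetical separator; must not occur inside an instrument code


def make_rowkey(instrument: str, date: str) -> bytes:
    """Build an <instrument>-<YYYYMMDD> rowkey such as b"goog-20130101"."""
    if len(date) != 8 or not date.isdigit():
        # A fixed-width date keeps byte order equal to chronological order.
        raise ValueError("date must be YYYYMMDD")
    return instrument.encode("ascii") + SEP + date.encode("ascii")


def split_rowkey(rowkey: bytes) -> tuple[str, str]:
    """Recover (instrument, date) from a rowkey built by make_rowkey."""
    instrument, date = rowkey.rsplit(SEP, 1)
    return instrument.decode("ascii"), date.decode("ascii")


def month_scan_range(instrument: str, yyyymm: str) -> tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) rowkeys covering one month."""
    year, month = int(yyyymm[:4]), int(yyyymm[4:])
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    start = make_rowkey(instrument, f"{year:04d}{month:02d}01")
    stop = make_rowkey(instrument, f"{next_year:04d}{next_month:02d}01")
    return start, stop
```

Putting the instrument first favours per-instrument history queries; a date-first key would instead group all instruments of one day together, and could concentrate writes for the current day on a single region.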
1) Can such rowkeys be created? If so, how?
2) If data already exists for a rowkey (e.g. goog-20130101), how can I put new data into the same rowkey under new column names? The first write would use h1, l1, u1, the next one h2, l2, u2, and so on. Is this achievable?
3) How do I retrieve the latest data and its values (say, the high for goog on a given date)?
Alternatively, if someone has worked with similar data (tracking and storing multiple events or activities per user or object per day), please advise on a better model that suits HBase.
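On questions 2 and 3, one commonly suggested alternative to numbered qualifiers (h1, h2, ...), which would need a read before every write to find the next number, is to write to fixed qualifiers and let HBase keep each write as a new timestamped version of the cell. A plain Get returns the newest version by default, and the history can be read by requesting more versions, provided the column family is configured to retain them (for example `create 'prices', {NAME => 'cf1', VERSIONS => 100}` in the shell; table name and count are assumptions). The Python below is only an in-memory model of that behaviour, with hypothetical qualifier names h, l, m and u, not HBase client code.

```python
from collections import defaultdict


class VersionedRow:
    """In-memory model of one HBase row whose cells keep several
    timestamped versions (simplified: no TTL, no deletes)."""

    def __init__(self, max_versions: int = 100):
        self.max_versions = max_versions
        # qualifier -> list of (timestamp_ms, value), newest first
        self.cells = defaultdict(list)

    def put(self, values: dict, ts: int) -> None:
        """Write every qualifier in `values` at one timestamp, like a single
        Put carrying several columns."""
        for qualifier, value in values.items():
            # A write at an identical timestamp replaces the earlier value.
            versions = [v for v in self.cells[qualifier] if v[0] != ts]
            versions.append((ts, value))
            versions.sort(key=lambda v: v[0], reverse=True)
            # Versions beyond the configured cap are dropped, oldest first.
            self.cells[qualifier] = versions[: self.max_versions]

    def get_latest(self, qualifier: str):
        """Newest value of a cell, or None if it was never written."""
        versions = self.cells.get(qualifier)
        return versions[0][1] if versions else None

    def history(self, qualifier: str):
        """All retained (timestamp, value) versions, newest first."""
        return list(self.cells.get(qualifier, []))
```

Replaying the three goog updates with the timestamps from the listing above leaves `get_latest("m")` at 34.1 and `history("l")` holding all three lows. Because versions above the cap are eventually discarded (and recent HBase releases default to keeping a single version), a permanent audit trail needs a generous VERSIONS setting or a design that puts the timestamp into the qualifier or rowkey instead.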
Thanks in advance for your help.