I have a requirement, where I want to store the following:
- Mac Address // PKEY
- TimeStamp // PKEY
- LocationID
- ownerName
- Signal Strength
The insertion logic is as follows:
- Store the above statistics for each active device (MacAddress) once every hour at each location (LocationID)
- The entries are created at end of each hour, so the primary key will always be MAC+TimeStamp
There are no updates, only insertions
The queries which can be performed are as follows:
- Give me all the entries for last 'N' hours Where MacAddress = "...."
- Give me all the entries for last 'N' hours Where LocationID IN (locID1, locID2, ..);
Needless to say, there are billions of entries, and I want to use either HBASE or Cassandra. I've tried to explore, and it seems that Cassandra may not be correct choice.
The reasons for that is if I have the following in cassandra:
< < RowKey > MacAddress:TimeStamp > >
+ LocationID
+ OwnerName
+ Signal Strength
Both the queries will scan the whole database, right? Even if I add an index on LocationID, that is only going to help in the second query to some extent, because there is no index on timestamp (I believe that seaching on timestamp is not fast, as the MacAddress:TimeStamp composite Key would not allow us to search only on timestamp, and instead, a full scan would happen, is that correct?).
I'm stuck here big time, and any insights would really help, if we should opt HBase or Cassandra.