I'm looking to create a table to store the URL parameters of visitors to a number of websites. For each site, I receive a dictionary of parameters and values per user_id-session_id pair, each with a timestamp.
In summary: each site has users (who have sessions), and each session has parameters whose values are recorded at particular times.
I'm going to want to query this data like so...
For each website, given a date range, get the customer_id of every customer with a certain parameter value.
P.S. I don't readily have a list of user IDs/session IDs, so I can't use those as row keys without doing a 'get_range()' query, which may be expensive.
Attempted solutions/thoughts...
I figured that dynamic columns would be a good idea here as we don't really know all the different sets of parameters and I'd rather not make an attempt at hard coding them in. That gives me the name of the parameter as the column header and the value of the cells as the parameter value. That's all well and good but it leaves me with composite row keys of (site, date, customer_id, session_id).
I'm all for big composite row keys, but I don't think I could query the data for ranges of dates this way (I would love to be corrected). If it is possible, how can I do it?
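To make the concern concrete, here is a rough pure-Python model of that layout. It is not Pycassa code: the `rows` dict, the sample values and both helper functions are made up for illustration. It assumes a hash-based partitioner, so rows are not stored in row-key order and a lookup needs the complete key.

```python
from datetime import date

# Hypothetical in-memory model of the layout: one row per
# (site, date, customer_id, session_id), one dynamic column per parameter.
rows = {
    ("a.com", date(2013, 1, 1), "c1", "s1"): {"utm_source": "google"},
    ("a.com", date(2013, 1, 2), "c2", "s9"): {"utm_source": "bing", "ref": "mail"},
    ("a.com", date(2013, 1, 9), "c3", "s4"): {"utm_source": "google"},
}

def get(site, day, customer_id, session_id):
    """Point lookup: cheap, but needs every component of the row key."""
    return rows.get((site, day, customer_id, session_id))

def customers_with(site, start, end, param, value):
    """Date-range query: there is no key prefix to seek on, so every row is visited."""
    return {
        customer_id
        for (row_site, day, customer_id, _), columns in rows.items()
        if row_site == site and start <= day <= end and columns.get(param) == value
    }
```

The point lookup works only when I already know the customer and session, which is exactly the information I don't have up front; the range query ends up touching every row.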
I also thought about using composite (site, datetime) column names, since that would let me filter on those properties nicely. The row keys would then be composite (param, param value) keys, with a set of user_ids as the column values. Actually, that sounds like my best bet at the moment, but it still seems fairly messy.
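A minimal sketch of how I picture this second layout, again as a plain in-memory model rather than real Pycassa calls. The names `index`, `record` and `users_with` are hypothetical, and I have folded the user id into the column name instead of storing a set as the value, which is a simplification on my part. The sorted list stands in for the way a composite comparator keeps columns ordered within a row.

```python
import bisect
from collections import defaultdict
from datetime import datetime

# Row key: (param, value). Column name: (site, timestamp, user_id), kept sorted.
index = defaultdict(list)

def record(site, ts, user_id, params):
    """Add one column per (param, value) pair seen for this visit."""
    for param, value in params.items():
        bisect.insort(index[(param, value)], (site, ts, user_id))

def users_with(site, param, value, start, end):
    """Return user ids on `site` with param == value and start <= ts < end."""
    columns = index.get((param, value), [])
    # A (site, ts) prefix sorts before any (site, ts, user_id) with the same ts,
    # so these two bisects bound a contiguous slice of the row.
    lo = bisect.bisect_left(columns, (site, start))
    hi = bisect.bisect_left(columns, (site, end))
    return {user_id for _, _, user_id in columns[lo:hi]}
```

With this shape the query becomes a single-row slice over a column range, at the cost of writing one column per parameter for every visit.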
If anybody has got any ideas, I'd love to hear them!
Thanks,
Matt
P.P.S. I'm using Pycassa, CQL3 and Cassandra 1.2, so I have all the tools those platforms offer available to me.