I'm pretty new to Cassandra and I'm trying to design a model for time series data. My current proposal is this:
    CREATE TABLE myproject.variables (
        nearest_10_minutes timestamp,
        variable_type text,
        value double,
        variable_timestamp timestamp,
        PRIMARY KEY ((variable_type, nearest_10_minutes), variable_timestamp)
    )
    WITH CLUSTERING ORDER BY (variable_timestamp ASC);
The variable_timestamp is the actual time when the value was sensed. The nearest_10_minutes is the same timestamp rounded to the nearest 10 minutes. For example, if variable_timestamp is 19/11/2013 13:13:19.562, nearest_10_minutes is 19/11/2013 13:10:00.000.
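A minimal Python sketch of computing nearest_10_minutes on the client before writing a row. It assumes that "rounded" means truncated down to the start of the 10-minute window (the example above is consistent with either reading) and that all timestamps are in a single time zone; the names BUCKET and bucket_start are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical bucket width matching the nearest_10_minutes column.
BUCKET = timedelta(minutes=10)


def bucket_start(ts: datetime, width: timedelta = BUCKET) -> datetime:
    """Truncate ts down to the start of its bucket.

    Assumes width divides a day evenly, so buckets are aligned to midnight.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # timedelta // timedelta is an int: the number of whole buckets since midnight.
    whole_buckets = (ts - midnight) // width
    return midnight + whole_buckets * width


if __name__ == "__main__":
    sensed = datetime(2013, 11, 19, 13, 13, 19, 562000)
    print(bucket_start(sensed))
```

If true round-to-nearest is intended instead, calling bucket_start(ts + BUCKET / 2) would move values in the second half of a window into the next bucket.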
I could remove variable_type from the partition key and put it into a secondary index instead, but I'm not sure whether that helps in my case.
The issue is that I'm not sure how to order the data properly. If I run select * from myproject.variables (just for testing purposes), I get something like this (only the timestamps are shown):
Tue Nov 19 13:19:52 CET 2013
Tue Nov 19 13:19:55 CET 2013
Tue Nov 19 13:40:04 CET 2013
Tue Nov 19 13:40:14 CET 2013
Tue Nov 19 13:40:29 CET 2013
...
Tue Nov 19 13:49:58 CET 2013
Tue Nov 19 13:49:59 CET 2013
...
Tue Nov 19 14:30:00 CET 2013
Tue Nov 19 14:30:01 CET 2013
Now, I'm not clear on whether I should get some default ordering or not. If I run select * from myproject.variables order by variable_timestamp asc, I get an error stating that I may only use ORDER BY if the partition key is restricted with EQ or IN. And IN can only be used on the second component of the partition key, not the first one.
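Because ORDER BY only applies once the partition key is restricted, one way to read a time range under this schema is to enumerate every nearest_10_minutes value in the range and query each partition with EQ on both partition-key columns. The Python sketch below only does the enumeration; it assumes buckets are truncated down to 10-minute boundaries, the helper names are hypothetical, and the query text is an illustration that the sketch never sends to a cluster.

```python
from datetime import datetime, timedelta
from typing import List

BUCKET = timedelta(minutes=10)

# Illustrative per-partition query; a driver would bind the two placeholders.
PER_BUCKET_CQL = (
    "SELECT variable_timestamp, value FROM myproject.variables "
    "WHERE variable_type = ? AND nearest_10_minutes = ? "
    "ORDER BY variable_timestamp ASC"
)


def bucket_start(ts: datetime, width: timedelta = BUCKET) -> datetime:
    """Truncate ts down to the start of its midnight-aligned bucket."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // width) * width


def buckets_between(start: datetime, end: datetime,
                    width: timedelta = BUCKET) -> List[datetime]:
    """Return every bucket start that overlaps [start, end], in ascending order."""
    buckets = []
    current = bucket_start(start, width)
    while current <= end:
        buckets.append(current)
        current += width
    return buckets


if __name__ == "__main__":
    for b in buckets_between(datetime(2013, 11, 19, 13, 19, 52),
                             datetime(2013, 11, 19, 13, 40, 29)):
        print(b)
```

Since the buckets cover disjoint, ascending time windows and each partition is already sorted by variable_timestamp, concatenating the per-bucket results in this order gives time-ordered rows for a single variable_type. The same list could instead be bound to an IN on nearest_10_minutes, the component the question says IN is allowed on.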
All in all, I'm a bit confused: how can I model this so that I can select and order my data?
------------------------------Answer:------------------------------------
In a way, all the current answers by jorgebg and BryceAtNetwork23, together with the comments from Mikhail Stepura, showed me what I consider the right path. I wanted to keep the partitioning as random as possible, but predictable enough that I can run ordered queries and use the IN keyword (so I can put multiple partition keys in one query). So I decided to use a custom partition key built by concatenating variable_type:timestamp_rounded_by_the_hour. I know this leaks some of the storage logic into the client, but it is fairly easy to recreate the set of partition keys in code at query time.
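A minimal Python sketch of recreating the set of partition keys for a query under the variable_type:timestamp_rounded_by_the_hour scheme. The answer does not say how the rounded timestamp is rendered, so the YYYYMMDDHH format, the function name partition_keys, and the variable types in the usage line are all assumptions; the resulting list is what would be bound to an IN restriction on the custom partition key.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

HOUR = timedelta(hours=1)
# Assumed rendering of the hour-rounded timestamp; the answer does not specify one.
HOUR_FORMAT = "%Y%m%d%H"


def partition_keys(variable_types: Iterable[str], start: datetime,
                   end: datetime) -> List[str]:
    """Build every 'variable_type:hour' key whose hour overlaps [start, end].

    Keys are grouped by variable type, then ordered by hour.
    """
    first_hour = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    for variable_type in variable_types:
        hour = first_hour
        while hour <= end:
            keys.append(f"{variable_type}:{hour.strftime(HOUR_FORMAT)}")
            hour += HOUR
    return keys


if __name__ == "__main__":
    # "temperature" and "humidity" are hypothetical variable types.
    print(partition_keys(["temperature", "humidity"],
                         datetime(2013, 11, 19, 13, 19, 52),
                         datetime(2013, 11, 19, 14, 30, 1)))
```

Keeping the key format and the one-hour granularity in a single helper like this limits the client-side leakage of storage logic to one place.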
The answer I chose was the one that contributed the most.