Cassandra select query for timeseries data

Question

I have a table like:

CREATE TABLE mydata (
my_key TEXT,
log_time TIMESTAMP,
value1 DOUBLE,
PRIMARY KEY (my_key, log_time)
);

We get timeseries data where we insert 100K values for 100K keys (1 value for each key each minute) so new values are added every minute.

I would like to run the following range query:

Select * from mydata where my_key='1' and (log_time >='starttime' and log_time <='endtime') and (value1 > 100 and value1 < 200)

With current table design, it is possible as value1 is not part of the partition/clustering key. However we can add indexing on value1 to make it work or make it part of key.

My questions are:

Is it a good idea to add value1 as part of key?
Is it a good idea to add indexing on value1 as the insertions are too frequent (1 minute)?
If both are not good options, how can we model this requirement for range queries?

Thomas Arnaud Thomas Arnaud · Accepted Answer · 2016-10-21T13:17:20

Is it a good idea to add value1 as part of key?

If you do that, then you will have to know the possible values of value1 to request data. If value1 was a kind of enumeration it could be possible (with in cql keyword) but as a double I think it is not a good solution.

Is it a good idea to add indexing on value1 as the insertions are too frequent (1 minute)?

I don't know any reason why you shouldn't.

Cassandra select query for timeseries data

1 Answers