I am trying to model time series data with many sensors (> 50k) with cassandra. As I would like to do filtering on multiple sensors at the same time, I thought using the following (wide row) schema might be suitable:
CREATE TABLE data(
time timestamp,
session_id int,
sensor text,
value float,
PRIMARY KEY((time, session_id), sensor)
);
If every sensor value was a column in an RDBMS, my query would ideally look like:
SELECT * FROM data WHERE sensor_1 > 10 AND sensor_2 < 2;
Translated to my cassandra schema, I assumed the query might look like:
SELECT * FROM data
WHERE
sensor = 'sensor_1' AND
value > 10 AND
sensor = 'sensor_2' AND
value < 2;
I now have two problems:
- cassandra tells me that I can filter on the sensor column only once:
sensor cannot be restricted by more than one relation if it includes an Equal
- Obviously, the filter on value doesn't make sense at the moment. I wouldn't know how to express the relationship between sensor and value in the query in order to filter multiple columns in the same (wide) row.
I do know that a solution to the first question would be to use CQL's IN
clause. This however doesn't solve the second problem.
Is this scenario even suitable for cassandra?
Many thanks in advance.