1
votes

I am trying to model time series data with many sensors (> 50k) with cassandra. As I would like to do filtering on multiple sensors at the same time, I thought using the following (wide row) schema might be suitable:

CREATE TABLE data(
 time timestamp, 
 session_id int, 
 sensor text, 
 value float, 
 PRIMARY KEY((time, session_id), sensor)
);

If every sensor value was a column in an RDBMS, my query would ideally look like:

SELECT * FROM data WHERE sensor_1 > 10 AND sensor_2 < 2;

Translated to my cassandra schema, I assumed the query might look like:

SELECT * FROM data 
WHERE 
  sensor = 'sensor_1' AND 
  value > 10 AND 
  sensor = 'sensor_2' AND 
  value < 2;

I now have two problems:

  1. cassandra tells me that I can filter on the sensor column only once:

sensor cannot be restricted by more than one relation if it includes an Equal

  1. Obviously, the filter on value doesn't make sense at the moment. I wouldn't know how to express the relationship between sensor and value in the query in order to filter multiple columns in the same (wide) row.

I do know that a solution to the first question would be to use CQL's IN clause. This however doesn't solve the second problem.

Is this scenario even suitable for cassandra?

Many thanks in advance.

1

1 Answers

1
votes

You could try to use IN clause here. So your query would be like this:

SELECT * FROM data 
WHERE time = <time> and session_id = <session id> 
    AND sensor IN ('sensor_1', 'sensor_2') 
    AND value > 10 AND value < 2