1
votes

I'm new to Cassandra and I'm trying to define a data model that fits my requirements.

I have a sensor that collects one value every millisecond and I have to store those data in Cassandra. The queries that I want to perform are:

1) Give me all the sensor values from - to these timestamp values

2) Tell me when this range of values was recorded

I'm not sure if there exist a common schema that can satisfy both queries because I want to perform range queries on both values. For the first query I should use something like:

CREATE TABLE foo (
value text,
timestamp timestamp,
PRIMARY KEY (value, timestamp));

but then for the second query I need the opposite since I can't do range queries on the partition key without using a token that restricts the timestamp:

CREATE TABLE foo (
value text,
timestamp timestamp,
PRIMARY KEY (timestamp, value));

So do I need two tables for this? Or there exist another way? Thanks

PS: I need to be as fast as possible while reading

1

1 Answers

2
votes

I have a sensor that collects one value every millisecond and I have to store those data in Cassandra.

The main problem I see here, is that you're going to run into Cassandra's limit of 2 billion col values per partition fairly quickly. DataStax's Patrick McFadin has a good example for weather station data (Getting Started with Time Series Data Modeling) that seems to fit here. If I apply it to your model, it looks something like this:

CREATE TABLE fooByTime (
    sensor_id text,
    day text,
    timestamp timestamp,
    value text,
PRIMARY KEY ((sensor_id,day),timestamp)
);

This will partition on both sensor_id and day, while sorting rows within the partition by timestamp. So you could query like:

> SELECT * FROM fooByTime WHERE sensor_id='5' AND day='20151002' 
  AND timestamp > '2015-10-02 00:00:00' AND timestamp < '2015-10-02 19:00:00';

 sensor_id | day      | timestamp                | value
-----------+----------+--------------------------+-------
         5 | 20151002 | 2015-10-02 13:39:22-0500 |    24
         5 | 20151002 | 2015-10-02 13:49:22-0500 |    23

And yes, the way to model in Cassandra, is to have one table for each query pattern. So your second table where you want to range query on value might look something like this:

CREATE TABLE fooByValues (
    sensor_id text,
    day text,
    timestamp timestamp,
    value text,
PRIMARY KEY ((sensor_id,day),value)
);

And that would support queries like:

> SELECT * FROm foobyvalues WHERE sensor_id='5' 
  AND day='20151002' AND value > '20' AND value < '25';

 sensor_id | day      | value | timestamp
-----------+----------+-------+--------------------------
         5 | 20151002 |    22 | 2015-10-02 14:49:22-0500
         5 | 20151002 |    23 | 2015-10-02 13:49:22-0500
         5 | 20151002 |    24 | 2015-10-02 13:39:22-0500