Modelling Cassandra Many-to-Many Relation for Sensor Data

Question

I am building an application with Cassandra as the data store which captures data from a large number of sensors and allows different monitoring components to monitor those sensors.

For example, a server room might have a temperature sensor and 10 different server monitoring components might receive values from that one sensor. Likewise, a monitoring component will receive data from multiple sensors.

My (very simplified) conceptual schema looks something like:

I need to run the following queries:

Historical values for an individual sensor
The latest value of every attribute on a monitoring component

and it's the second one that I'm having a problem with.

When a Measurement arrives I only know the sensor ID, timestamp and value. How can I model a table that allows me to keep the current value for every attribute on a Monitor?

I tried the following table:

CREATE TABLE monitor_subscriptions (
    sensor_id uuid,
    monitor_id uuid,
    attribute text, # e.g. 'Temperature'
    timestamp timestamp,
    value double,
    PRIMARY KEY (sensor_id, monitor_id, attribute)
);

What I'm attempting to do is update the timestamp/value of every monitor that is subscribed to that sensor but then obviously the following query doesn't work because I'm not specifying monitor_id or attribute:

UPDATE monitor_subscriptions
SET timestamp = ?, value = ?
WHERE sensor_id = ?;

At the point I receive the new measurement though, I only know the sensor_id, timestamp and value.

for many to many relationship you have to add one extra table and must connect both tables to this newly created table to work properly — Ghayel

shutty shutty · Accepted Answer · 2015-11-26T10:37:03

I guess that you might revisit your monitor_subscriptions table to be:

append-only, not updating the latest value, but inserting a new one every time
split into two different tables optimal for your specific queries.

For example:

create table sensor_data (
  sensor_id uuid,
  timestamp timestamp,
  value double,
  primary key (sensor_id, timestamp)
) with clustering order by (timestamp desc);

This table is used for storing raw sensor readings, you can effectively query it for latest data for specific sensor. If you plan to insert a lot of sensor readings (like every second), you may want to add current day to clustering key to deal with possible compaction issues later.

And the monitor table may be looking like this:

create table monitor_subscriptions (
  monitor_id uuid,
  sensor_id uuid,
  attribute text,
  primary key (monitor_id, attribute, sensor_id)
)

This table can be used to query for all attributes for the monitor or all sensors for these attributes. So to query latest value for each attribute, you:

Query monitor_subscriptions for attribute->sensor mapping (best case: 1 disk read)
Query sensor_data for each sensor (best case: N disk reads, where N = number of sensors).

Modelling Cassandra Many-to-Many Relation for Sensor Data

1 Answers