I need to store sensor readings in Cassandra (version 2). There are n sensors, each of which can send up to m different values of different types (e.g. Float, Bool, String). Later, the values will be queried mostly by time range, so a typical query could be "give me all readings from 2016-05-01 09:00 to 2016-05-15 13:00". There may be additional filters by sensor ID or type, but the time range will always be the main criterion. (So a query could be "give me all data for sensors 1 and 5 from 2016-05-05", but most likely not "give me all data for sensors 1 and 5" without a time restriction.)
For more detailed queries, it is OK if all the data (restricted by time and possibly sensor ID) has to be scanned. For example, for the query "give me all data for sensor 5 from 2016-05-05 where the float value of a reading is greater than 1000", it is fine if Cassandra has to scan all the values of sensor 5 from 2016-05-05.
I have read a lot of blog posts and questions about data modelling (e.g. [1] [2] [3] [4] [5] [6]), but some of them are years old and I am not sure whether they still describe the right way to do it.
My main questions are:
- What data type should I use for the timestamp? (It needs millisecond resolution.)
- How do I define the keys? (E.g. do I need an hourly partition key like some examples use? If yes, can Cassandra combine the results for more than one hour, or do I need to do that manually?)
- How do I add the sensor ID so that it can also be queried efficiently?
The sensor data will always be inserted in timestamp order: existing data is never changed, and no reading with a timestamp lower than the current maximum will ever be added.
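To make the hourly-key question concrete, here is a minimal Python sketch of what I assume "combining manually" would mean on the client side: listing every hourly bucket that a time range touches, so that each bucket could then be queried separately. The function names `hour_buckets` and `to_epoch_millis` and the `YYYY-MM-DD-HH` bucket format are my own assumptions for illustration, not anything taken from Cassandra or from the linked posts.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    # Integer division on timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def hour_buckets(start, end):
    """Return the hourly bucket keys ('YYYY-MM-DD-HH') covering [start, end].

    The bucket format is a hypothetical choice for this sketch.
    """
    if end < start:
        raise ValueError("end must not be before start")
    # Round the start down to the beginning of its hour.
    current = start.replace(minute=0, second=0, microsecond=0)
    buckets = []
    while current <= end:
        buckets.append(current.strftime("%Y-%m-%d-%H"))
        current += timedelta(hours=1)
    return buckets


if __name__ == "__main__":
    start = datetime(2016, 5, 1, 9, 0, tzinfo=timezone.utc)
    end = datetime(2016, 5, 15, 13, 0, tzinfo=timezone.utc)
    keys = hour_buckets(start, end)
    print(len(keys), keys[0], keys[-1])
    print(to_epoch_millis(start))
```

If this is really what is required, the example query from 2016-05-01 09:00 to 2016-05-15 13:00 would touch several hundred hourly buckets, which is why I am asking whether a coarser key (or a different layout altogether) is the better choice.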