We want to use Apache Cassandra for storing big data gathered from realm time sensor data. We have developed an IOT platform capable to handle 1 million events per second. We want to persist them in Cassandra.
Our table looks like :
Sensor_data_by_date
Realm text K
Bucket int K
dateTimeReceived timestamp Clustering column
sensor_id text
Message_id text
Sensor_name text
Query we are interested in is :
Give me all results for all sensor data for “realm-a” for a dateTime range say “5th may” to “12th may” order by “dateTimeReceived”.
Solution : Since our platform can handle upto 1 million events per second, when I even try to include DATE + HOUR as partition key, it will still increase the maximum recommended size by Casssandra. So we decided to keep bucket along with realm as partition key.
Problem : Now say when we have a wide range of date range as mentioned (5th may to 13th may), we will have multiple buckets to lookup from. We also need to support ordering. When we have this in place, I need to use “IN clause” for buckets say :
where realm=realm-a and bucket in (1,2,3,4)
and dateTimeReceived>…
and dateTimeReceived <… order by dateTimeReceived
This would complain that IN clause and order by cant work together with pagination. I need to have pagination as well…..
Can you please help me how to achieve this functionality?
Help will be much appreciated.