I am in the process of learning Cassandra as an alternative to SQL databases for one of the projects I am working for, that involves Big Data.
For the purpose of learning, I've been watching the videos offered by DataStax, more specifically DS220 which covers modeling data in Cassandra.
While watching one of the videos in the course series I was introduced to the concept of splitting partitions to manage partition size.
My current understanding is that Cassandra has a max logical capacity of 2B entries per partition, but a suggested max of a couple 100s MB per partition.
I'm currently dealing with large amounts of real-time financial data that I must store (time series), meaning I can easily fill out GBs worth of data in a day.
The video course talks about introducing an additional partition key in order to split a partition with the purpose or reducing the size per partition requirement. The video pointed out to using either a time based key or an arbitrary "bucket" key that gets incremented when the number of manageable rows has been reached.
With that in mind, this led me to the following problem: given that partition keys are only used as equality criteria (ie. point to the partition to find records), how do I find all the records that end up being spread across multiple partitions without having to specify either the bucket or timestamp key?
For example, I may receive 1M records in a single day, which would likely go over the 100-500Mb partition limit, so I wouldn't be able to set a partition on a per date basis, that means that my daily data would be broken down into hourly partitions, or alternatively, into "bucketed" partitions (for balanced partition sizes). This means that all my daily data would be spread across multiple partitions splits.
Given this scenario, how do I go about querying for all records for a given day? (additional clustering keys could include a symbol for which I want to have the results for, or I want all the records for that specific day)
Any help would be greatly appreciated.
Thank you.