I'm relatively new to Cassandra and have to evaluate different NoSQL solutions for a monitoring tool. A single datum is only about 100 bytes, but there are really a lot of them: we collect about 15 million records per day. So I'm currently testing with 900 million records (about 15 GB as an SQL insert script).
My first question is: does Cassandra fit my needs? I need to do range queries (on the date the records were created) and sum up some of the columns, grouped by "secondary index" values stored in the datum.
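For concreteness, here is a small Python sketch of the kind of aggregation I mean (the field names `ts`, `host` and `bytes` are made up for illustration; they are not my real schema):

```python
from datetime import datetime

# Hypothetical records -- field names are invented for this example.
records = [
    {"ts": datetime(2011, 5, 1, 10, 0), "host": "web1", "bytes": 120},
    {"ts": datetime(2011, 5, 1, 11, 0), "host": "web2", "bytes": 300},
    {"ts": datetime(2011, 5, 2, 9, 30), "host": "web1", "bytes": 80},
    {"ts": datetime(2011, 5, 3, 8, 0),  "host": "web2", "bytes": 50},
]

def sum_by_group(records, start, end, group_field, value_field):
    """Range-filter on the creation timestamp, then sum one column per group."""
    totals = {}
    for r in records:
        if start <= r["ts"] < end:
            totals[r[group_field]] = totals.get(r[group_field], 0) + r[value_field]
    return totals

print(sum_by_group(records, datetime(2011, 5, 1), datetime(2011, 5, 3),
                   "host", "bytes"))
# {'web1': 200, 'web2': 300}
```

That is the whole workload, basically: a time-range filter followed by a grouped sum. The question is whether Cassandra can do this efficiently at 15 million records a day, or whether the grouping would have to happen client-side like above.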
I already tried MongoDB, but its really poor MapReduce did a crappy job... I also read about HBase, but the enormous amount of configuration it needs makes me hope there could be a solution with Cassandra...
My second question is: how could I store my data to access it in the ways mentioned above? I already thought of a super column family where the key is the date (as a long, milliseconds since 1970) and the columns are the data points taken at that time. But if I use the RandomPartitioner, I can't do fast range queries over the row keys (as far as I know), and if I use the OrderPreservingPartitioner, the data won't be spread evenly over my cluster (currently consisting of two nodes).
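One workaround I've been considering is bucketing by day: one row per day, one column per record timestamp, and a range query enumerates the day buckets between start and end (a multiget by known keys, which works under the RandomPartitioner) and slices the columns inside each row. A minimal Python sketch of the idea, with a plain dict standing in for the column family:

```python
from datetime import datetime, timedelta

# "rows" stands in for a Cassandra column family:
# row key = midnight of the day (epoch millis), columns sorted by timestamp.
rows = {}

def day_key(ts):
    """Row key: midnight of the record's day, as milliseconds since 1970."""
    day = datetime(ts.year, ts.month, ts.day)
    return int(day.timestamp() * 1000)

def insert(ts, datum):
    rows.setdefault(day_key(ts), {})[ts] = datum

def range_query(start, end):
    """Enumerate day buckets between start and end, then slice columns.

    Because every bucket key is computable from the date, we never need a
    key-range scan -- so the RandomPartitioner can still spread the rows
    over the cluster.
    """
    out = []
    day = datetime(start.year, start.month, start.day)
    while day < end:
        for ts, datum in sorted(rows.get(day_key(day), {}).items()):
            if start <= ts < end:
                out.append(datum)
        day += timedelta(days=1)
    return out

insert(datetime(2011, 5, 1, 10), "a")
insert(datetime(2011, 5, 1, 23), "b")
insert(datetime(2011, 5, 2, 1), "c")
print(range_query(datetime(2011, 5, 1, 12), datetime(2011, 5, 2, 12)))
# ['b', 'c']
```

Is this kind of bucketing a sensible Cassandra design, or is there a better-established pattern for time-series range queries? (I'm unsure whether one row per day would get too wide at 15 million records a day, too.)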
I hope I gave you all the necessary information... Thank you for your help!
andy