I'm relatively new to Cassandra and have to evaluate different NoSQL solutions for a monitoring tool. A single datum is only about 100 bytes, but there are really a lot of them: we collect about 15 million records per day. So I'm currently testing with 900 million records (about 15 GB as an SQL insert script).
My first question is: does Cassandra fit my needs? I need to do range queries (on the date the records were created) and sum up some of the columns, grouped by "secondary index" values stored in the datum.
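For concreteness, here is a small Python sketch of the kind of aggregation I mean (the field names `ts`, `host` and `bytes` are made up for illustration; they are not my real schema):

```python
from datetime import datetime

# Hypothetical records -- field names are invented for this example.
records = [
    {"ts": datetime(2011, 5, 1, 10, 0), "host": "web1", "bytes": 120},
    {"ts": datetime(2011, 5, 1, 11, 0), "host": "web2", "bytes": 300},
    {"ts": datetime(2011, 5, 2, 9, 30), "host": "web1", "bytes": 80},
    {"ts": datetime(2011, 5, 3, 8, 0),  "host": "web2", "bytes": 50},
]

def sum_by_group(records, start, end, group_field, value_field):
    """Range-filter on the creation timestamp, then sum one column per group."""
    totals = {}
    for r in records:
        if start <= r["ts"] < end:
            totals[r[group_field]] = totals.get(r[group_field], 0) + r[value_field]
    return totals

print(sum_by_group(records, datetime(2011, 5, 1), datetime(2011, 5, 3),
                   "host", "bytes"))
# {'web1': 200, 'web2': 300}
```

That is the whole workload, basically: a time-range filter followed by a grouped sum. The question is whether Cassandra can do this efficiently at 15 million records a day, or whether the grouping would have to happen client-side like above.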
I already tried MongoDB, but its really poor MapReduce did a crappy job... I also read about HBase, but the enormous amount of configuration it needs makes me hope there could be a solution with Cassandra...
My second question is: how could I store my data to access it in the ways mentioned above? I already thought of a super column family where the key is the date (as a long, milliseconds since 1970) and the columns are the data points taken at that time. But if I use the RandomPartitioner, I can't do fast range queries over the row keys (as far as I know), and if I use the OrderPreservingPartitioner, the data won't be spread evenly over my cluster (currently consisting of two nodes).
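One workaround I've been considering is bucketing by day: one row per day, one column per record timestamp, and a range query enumerates the day buckets between start and end (a multiget by known keys, which works under the RandomPartitioner) and slices the columns inside each row. A minimal Python sketch of the idea, with a plain dict standing in for the column family:

```python
from datetime import datetime, timedelta

# "rows" stands in for a Cassandra column family:
# row key = midnight of the day (epoch millis), columns sorted by timestamp.
rows = {}

def day_key(ts):
    """Row key: midnight of the record's day, as milliseconds since 1970."""
    day = datetime(ts.year, ts.month, ts.day)
    return int(day.timestamp() * 1000)

def insert(ts, datum):
    rows.setdefault(day_key(ts), {})[ts] = datum

def range_query(start, end):
    """Enumerate day buckets between start and end, then slice columns.

    Because every bucket key is computable from the date, we never need a
    key-range scan -- so the RandomPartitioner can still spread the rows
    over the cluster.
    """
    out = []
    day = datetime(start.year, start.month, start.day)
    while day < end:
        for ts, datum in sorted(rows.get(day_key(day), {}).items()):
            if start <= ts < end:
                out.append(datum)
        day += timedelta(days=1)
    return out

insert(datetime(2011, 5, 1, 10), "a")
insert(datetime(2011, 5, 1, 23), "b")
insert(datetime(2011, 5, 2, 1), "c")
print(range_query(datetime(2011, 5, 1, 12), datetime(2011, 5, 2, 12)))
# ['b', 'c']
```

Is this kind of bucketing a sensible Cassandra design, or is there a better-established pattern for time-series range queries? (I'm unsure whether one row per day would get too wide at 15 million records a day, too.)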
I hope I gave you all the necessary information... Thank you for your help!
andy