0 votes

I am using Cassandra for saving logs, and on the client side I want to show all the logs for a given day, for example.

Of course, for one day there can be thousands of log records, so I need to use paging.

I saw that paging is not really "native" in Cassandra, and we need to use some "tricks", like saving the last retrieved record and looking for more records after that one.

My idea is to use a UUID and a date as a compound primary key, and to order the column family by date, so that I can pass a UUID and a date and Cassandra will give me the records after that record, and so on. Roughly, I have something like the sketch below in mind.
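This is only a sketch of the idea; the keyspace, table, and column names are placeholders, and the code assumes the DataStax Java driver (2.0), just for illustration:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class LogPagingIdea {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect(); // assumes a keyspace "logs" already exists

            // One partition per day, rows ordered by a timeuuid
            // (the timeuuid gives me both the timestamp and the unique id). Run once.
            session.execute(
                "CREATE TABLE logs.log_by_day ("
              + "  day text,"            // e.g. '2014-06-17'
              + "  log_time timeuuid,"
              + "  message text,"
              + "  PRIMARY KEY (day, log_time))");

            // "Give me the next records after the last one I already showed":
            ResultSet rs = session.execute(
                "SELECT log_time, message FROM logs.log_by_day "
              + "WHERE day = '2014-06-17' "
              + "AND log_time > 99e0f650-f636-11e3-a3ac-0800200c9a66 " // placeholder: last timeuuid the client has
              + "LIMIT 50");
            for (Row row : rs) {
                System.out.println(row.getUUID("log_time") + " " + row.getString("message"));
            }

            cluster.close();
        }
    }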

Does anyone know if this is a good idea in terms of performance? Is it good to have a UUID and a date as a compound key? Or is there maybe a better solution for this?

Thank you!


1 Answer

1 vote

As far as I can tell, your choice of a primary key based on an id and a date should help you retrieve all the logs for one day. What you probably need to validate is that:

  1. each log entry is not a huge value
  2. you won't have more than 2 billion log entries per day (in that case you'll probably need to change the primary key to use a sub-day interval; see the sketch after this list)
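If one day does turn out to be too large for a single partition, a sub-day bucket in the partition key could look roughly like this. This is a sketch only; the keyspace, table, and column names are mine, not from your question:

    import com.datastax.driver.core.Session;

    public class LogSchema {
        // Variant where the partition key also includes an hour bucket, so a
        // single partition never has to hold a whole day of log entries.
        static void createHourBucketedTable(Session session) {
            session.execute(
                "CREATE TABLE logs.log_by_hour ("
              + "  day text,"      // e.g. '2014-06-17'
              + "  hour int,"      // 0-23; pick whatever sub-day interval fits your volume
              + "  log_time timeuuid,"
              + "  message text,"
              + "  PRIMARY KEY ((day, hour), log_time))");
        }
    }

Queries then have to supply both day and hour (or loop over the hour buckets), which is the usual trade-off with this kind of bucketing.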

As regards pagination, if you are using Cassandra 2.0 this should work (there were some corner-case issues with automatic paging until, IIRC, 2.0.9 though). The blog post "Improvements on the driver side with Cassandra 2.0" should give you an idea of how pagination worked in Cassandra 1.2 and how it was improved in 2.0.
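For example, with the DataStax Java driver 2.0, automatic paging is driven by the fetch size, and iterating the result set pulls the next pages from the server transparently. A minimal sketch, assuming a table like the hypothetical one above (names are placeholders):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class AutoPagingExample {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect();

            // Ask the driver to fetch 100 rows at a time; it requests the next
            // page from the server transparently as the iteration advances.
            Statement stmt = new SimpleStatement(
                "SELECT log_time, message FROM logs.log_by_day WHERE day = '2014-06-17'");
            stmt.setFetchSize(100);

            ResultSet rs = session.execute(stmt);
            for (Row row : rs) {
                System.out.println(row.getUUID("log_time") + " " + row.getString("message"));
            }

            cluster.close();
        }
    }

The fetch size only controls how many rows the driver asks for per round trip; the iteration itself looks the same whether there is one page or many.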