
I have a Cassandra column family where I am storing a large number (hundreds of thousands) of events per month, with a timestamp (“Ymdhisu”) as the row key. It has multiple columns capturing some data for each event. I tried retrieving event data for a specific time range. For example, for the month of January, I used the following CQL queries:

a) Query between range Jan 1 - Jan 15, 2013

select count(*) from Test where Key > 20130101070100000000 and Key < 20130115070100000000 limit 100000;

Bad Request: Start key's md5 sorts after end key's md5. This is not allowed; you probably should not specify end key at all, under RandomPartitioner

b) Query between range Jan 1 - Jan 10, 2013

select count(*) from Test where Key > 20130101070100000000 and Key < 20130110070100000000 limit 100000;

count - 73264

c) Query between range Jan 1 - Jan 2, 2013

select count(*) from Test where Key > 20130101070100000000 and Key < 20130102070100000000 limit 100000;

count - 78328

It appears as though the range search simply is not working: the two-day range in (c) returns a higher count than the ten-day range in (b). The schema of my column family is:

Create column family Test with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type AND compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64};

To extract data, what are the suggestions? Do I need to redefine my schema with key validation class as TimeUUID type? Is there any other way to query efficiently without changing the schema? I am dealing with at least 100-200K rows of data monthly in this column family. If this schema does not work for this purpose, what would be an appropriate Cassandra schema to store and retrieve the kind of data described here?

1 Answer

You can create secondary indexes on columns such as "Date" and "Month", and store each event's date and month in those columns along with its other data. When querying, you can then fetch all rows for a specified month or day with an equality match on the indexed column.
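As a rough sketch of how the "Date" and "Month" values could be derived, here is some Python that slices them out of a "Ymdhisu" row key and builds one equality query per day in a range. The helper names (index_values, days_between, queries_for_range) and the value formats ("Ymd" and "Ym" strings) are my own assumptions, and the generated CQL is only illustrative; it presumes the indexes on "Date" and "Month" already exist.

```python
from datetime import date, timedelta


def index_values(row_key: str) -> dict:
    """Derive the values for the (assumed) Date and Month columns
    from a 20-character 'Ymdhisu' row key."""
    return {"Date": row_key[:8], "Month": row_key[:6]}


def days_between(start: date, end: date) -> list:
    """Return every day from start to end (inclusive) as 'Ymd' strings."""
    span = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(span + 1)]


def queries_for_range(start: date, end: date) -> list:
    """Build one illustrative equality query per day on the indexed Date column."""
    return [
        f"select count(*) from Test where Date = '{day}' limit 100000;"
        for day in days_between(start, end)
    ]


if __name__ == "__main__":
    print(index_values("20130101070100000000"))
    for query in queries_for_range(date(2013, 1, 1), date(2013, 1, 3)):
        print(query)
```

Summing the per-day counts gives the total for the range, and a whole month can be fetched with a single equality match on "Month" instead.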

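I don't think a range query on keys will work here. Under RandomPartitioner, rows are ordered by the MD5 hash of the key rather than by the key itself, which is what the error in (a) is telling you, so a key range does not correspond to a time range. Perhaps it would work if you changed your partitioner from RandomPartitioner to ByteOrderedPartitioner?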
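To see why key order and storage order diverge, here is a small Python sketch that approximates the RandomPartitioner token as the absolute value of the key's MD5 digest read as a signed big-endian integer. That is my understanding of how the token is computed, so treat it as an assumption rather than the exact implementation; the function names are made up for illustration.

```python
import hashlib


def random_partitioner_token(row_key: str) -> int:
    """Approximate a RandomPartitioner token: abs(MD5 digest as a signed integer).
    Assumes the key is stored as its UTF-8 bytes (key_validation_class=UTF8Type)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


def key_order_vs_token_order(keys):
    """Return the keys sorted by key text, and the same keys sorted by token."""
    by_key = sorted(keys)
    by_token = sorted(keys, key=random_partitioner_token)
    return by_key, by_token


# One 'Ymdhisu' key per day for Jan 1 - Jan 15, 2013.
keys = [f"201301{day:02d}070100000000" for day in range(1, 16)]
by_key, by_token = key_order_vs_token_order(keys)

if __name__ == "__main__":
    print("same order:", by_key == by_token)
    for key in by_token:
        print(random_partitioner_token(key), key)
```

The second listing shows the days interleaved in no chronological order, which is why a "Key > x and Key < y" query returns an arbitrary slice of rows, or is rejected outright when the start key's token happens to sort after the end key's.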