I have a question on wide rows, clustering, manual indexes etc... I am hoping someone can assist here. CQL version is 3 and Cassandra is 2.0.1;
Let's say, I have CF 'products'
id timeuuid
location varchar
shopname varchar
expiry timestamp
count int
PRIMARY KEY (id)
I want to be able to select products at specific location ordered by expiry. Therfore create register like:
CF 'id_register_by_loc_expy'
location varchar
expiry timestamp
id timeuuid
PRIMARY KEY (location,expiry,id)
and want to select products at specific shopname ordered by expiry. Then create:
CF 'id_register_by_shopname_expy'
shopname vachar
expiry timestamp
id timeuuid
PRIMARY KEY (shopname,expiry,id)
This is so I can do efficient queries/slicing as follows:
1.select id from id_reg_by_loc_expy where location = 'x'; // [naturally ordered by expiry]
2.select id from id_reg_by_loc_expy where location = 'x' and expiry > 't1' and expiry < 't2';
3.select * from products where id = 'id';
and;
4.select id from id_reg_by_shop_exp where shopname = 'y'; // [naturally ordered by expiry]
and;
5.select id from id_reg_by_shop_count where shopname = 'y'; // [naturally ordered by count]
etc..
What if the clustering key needs to change and I need to reorder the entries on these particular rows in the register(s).
Issues I have are:
reinserting with a new expiry (or count) results in new Primary Key therefore doesn't update my old entry.
I can't "update .. set expiry = 'x2' where ..." since expiry is part of primary key.
Inserting with new primary key then deleting old one is a bad option due to the tombstoning limitation.
Things I have tried are:
CF 'id_reg_by_loc_expy'
location varchar expiry timestamp id timeuuid otherSecondaryIndex varchar PRIMARY KEY (location,id)
But;
a. this does not take advantage of Cassandra's stored sorted functionality. I expect to have many products in each row, and want to avoid needing to search the entire row; and
b. it turns out that I can't actually do a query like the following anyway:
(i)select id from id_reg_... where location='x' order by dtg asc;
Bad Request: ORDER BY with 2ndary indexes is not supported.
Or
(ii)select id from id_reg_... where location='x' and expiry > 't1' and expiry < 't2';
Bad Request: No indexed columns present in by-columns clause with Equal operator
Although I 'can' do this:
(iii)select id from id_reg_... where location='x' and otherSecIndex='y' and expiry > 't1' and expiry < 't2';
** note that this requires me to force 'allow filtering' and seems poor design to include another secondary index simply to allow this query.. i.e. a query of which I'm less interested in than the 'order by' query anyway.
2. Using timeuuid in lieu of timestamp for the expiry. Even if this comes to work which I can't find a way, it doesn't help my 'ordering by count' intentions.
Am I missing something fundamental here? Is the answer that I need to go ahead with all the tombstone mitigation techniques? or do some of the ordering in my application?
Cheers, Tim