2
votes

I create a table in Cassandra for monitoring insert from an application. My partition key is an int composed by year+month+day, my clustering key a timestamp and after that my username and some others fields.

I would like to display the last 5 inserts but it's seems that the partition key go before the "order by desc".

How can I get the correct result ? Normaly clustering key induces the order so why I get this result? (Thank in advance)


Informations :

Query : select tsp_insert, txt_name from ks_myKeyspace.myTable limit 5;

Result :

 idt_day  | tsp_insert               | txt_name  
----------+--------------------------+----------
 20161028 | 2016-10-28 15:21:09+0000 | Jean
 20161028 | 2016-10-28 15:21:01+0000 | Michel
 20161028 | 2016-10-28 15:20:44+0000 | Quentin
 20161031 | 2016-10-31 09:24:32+0000 | Jacquie
 20161031 | 2016-10-31 09:23:32+0000 | Gabriel

Wanted :

 idt_day  | tsp_insert               | txt_name  
----------+--------------------------+----------
 20161031 | 2016-10-31 09:24:32+0000 | Jacquie
 20161031 | 2016-10-31 09:23:32+0000 | Gabriel
 20161028 | 2016-10-28 15:21:09+0000 | Jean
 20161028 | 2016-10-28 15:21:01+0000 | Michel
 20161028 | 2016-10-28 15:20:44+0000 | Quentin

My table :

CREATE TABLE ks_myKeyspace.myTable(
idt_day int,
tsp_insert timestamp,
txt_name  text, ...
PRIMARY KEY (idt_day, tsp_insert)) WITH CLUSTERING ORDER BY (tsp_insert DESC);
2

2 Answers

4
votes

Ultimately, you are seeing the current order because you are not using a WHERE clause. You can see what's going on if you use the token function on your partition key:

aploetz@cqlsh:stackoverflow> SELECT idt_day,tsp_insert,token(idt_day),txt_name FROM mytable ;

 idt_day  | tsp_insert                      | system.token(idt_day) | txt_name
----------+---------------------------------+-----------------------+----------
 20161028 | 2016-10-28 15:21:09.000000+0000 |    810871225231161248 |     Jean
 20161028 | 2016-10-28 15:21:01.000000+0000 |    810871225231161248 |   Michel
 20161028 | 2016-10-28 15:20:44.000000+0000 |    810871225231161248 |  Quentin
 20161031 | 2016-10-31 09:24:32.000000+0000 |   5928478420752051351 |  Jacquie
 20161031 | 2016-10-31 09:23:32.000000+0000 |   5928478420752051351 |  Gabriel

(5 rows)

Results in Cassandra CQL will always come back in order of the hashed token value of the partition key (which you can see by using token). Within the partition keys, your CLUSTERING ORDER will be enforced.

That's key to understand... Result set ordering in Cassandra can only be enforced within a partition key. You have no control over the order that the partition keys come back in.

In short, use a WHERE clause on your idt_day and you'll see the order you expect.

0
votes

It seems to me that you are getting the whole thing wrong. Partition keys are not used for ordering data, they are used only to know the location of your data in the cluster, specifically the node. Moreover, the order really matters inside a partition only...

Your query results really are unpredictable. Depending on which node is faster to answer (assuming a cluster and not a single node), you can get every time a different result. You should try to avoid selecting without partition restrictions, they don't scale.

You can however change your queries and perform one select per day, then you'd query for ordered data (your clustering key) in an ordered manner ( you manually chose the order of the days in your queries). And as a side note it would be faster because you could query multiple partitions in parallel.