Why are results not returned in "ORDER BY DESC" order in Cassandra?

Question

Hello, everybody! I created a keyspace and a table in Cassandra:


    CREATE KEYSPACE monitoring WITH replication = {
         'class': 'SimpleStrategy',
         'replication_factor': '1'
    };


    CREATE TABLE monitoring.data (
        number text,
        day timestamp,
        last_day timestamp static,
        ids text static,
        PRIMARY KEY (number, day)
    ) WITH CLUSTERING ORDER BY (day DESC);

Then I inserted data:


    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-05-26', '2017-05-26', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-10-26', '2017-10-26', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-05-01', '2017-05-01', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456AA901', '2017-05-01', '2017-05-01', '["A","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456BB901', '2017-05-01', '2017-05-01', '["B","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456CC901', '2017-05-01', '2017-05-01', '["C","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('123456DD901', '2017-05-01', '2017-05-01', '["D","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2017-05-23', '2017-05-23', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('12345678901', '2018-05-26', '2018-05-26', '["1","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('23456789012', '2017-04-01', '2017-04-01', '["6","2","11"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('34567890123', '2017-03-28', '2017-03-28', '["1","5","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('45678901234', '2017-04-03', '2017-04-03', '["12","2","3"]');
    INSERT INTO monitoring.data (number, day, last_day, ids) VALUES ('56789012345', '2018-01-26', '2018-01-26', '["3","2","1"]');

Next, I ran this query:


    select distinct number,last_day,ids from monitoring.data WHERE number in ('12345678901','56789012345','45678901234');

Why does Cassandra return number 45678901234 between 12345678901 and 56789012345?


     number      | last_day                 | ids
    -------------+--------------------------+----------------
     12345678901 | 2018-05-25 21:00:00+0000 |  ["1","2","3"]
     45678901234 | 2017-04-02 21:00:00+0000 | ["12","2","3"]
     56789012345 | 2018-01-25 21:00:00+0000 |  ["3","2","1"]

How do I get the right answer? Is the replication factor important in this situation? Later I will use LIMIT 10...

last_day or day? If you want by last_day, then this one should be clustering column. — Horia

Aaron · Accepted Answer · 2018-02-05T14:00:47

Simply put, number is your partition key, and you can only enforce a sort order at the clustering key level. When you filter on the partition key with anything other than an equals condition (such as IN), the order of your results cannot be relied upon. If you were to remove the IN clause, the rows would come back ordered by the token (hash) of their partition keys. If I adjust your query to apply the token() function to number, the order of the results makes more sense:

    aploetz@cqlsh:stackoverflow> select distinct number,token(number),last_day,ids
        FROM data;

     number      | system.token(number) | last_day                        | ids
    -------------+----------------------+---------------------------------+----------------
     123456BB901 | -7512323826965212800 | 2017-05-01 05:00:00.000000+0000 |  ["B","2","3"]
     123456DD901 | -5242683095224762575 | 2017-05-01 05:00:00.000000+0000 |  ["D","2","3"]
     23456789012 | -2843835925329100734 | 2017-04-01 05:00:00.000000+0000 | ["6","2","11"]
     123456CC901 |   970122905143661162 | 2017-05-01 05:00:00.000000+0000 |  ["C","2","3"]
     45678901234 |  2207499658550692669 | 2017-04-03 05:00:00.000000+0000 | ["12","2","3"]
     12345678901 |  3063849707784841171 | 2018-05-26 05:00:00.000000+0000 |  ["1","2","3"]
     123456AA901 |  4307148681570630627 | 2017-05-01 05:00:00.000000+0000 |  ["A","2","3"]
     56789012345 |  5304329977670805052 | 2018-01-26 06:00:00.000000+0000 |  ["3","2","1"]
     34567890123 |  6079361129233417517 | 2017-03-28 05:00:00.000000+0000 |  ["1","5","3"]

    (9 rows)

The bottom line is that you cannot enforce a sort order without an equals condition on the partition key.

Unfortunately, what you are trying to do really isn't a strength of Cassandra. To get the answer you want, you'll have to look for similarities in the results that you expect, and then design a query table to support it. But if you're only talking about 10 rows, it might just be easier to sort your results on the application side.
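For the application-side option, a minimal sketch in Python might look like the following. It assumes the desired order is newest last_day first (the question does not state this explicitly) and that each row has already been fetched and converted to a dict. The newest_first helper and the hard-coded rows are hypothetical stand-ins for whatever your driver returns.

```python
from datetime import datetime, timezone


def newest_first(rows, limit=10):
    """Return at most `limit` rows sorted by last_day, newest first.

    Assumption: each row is a dict with a timezone-aware `last_day` value.
    """
    return sorted(rows, key=lambda row: row["last_day"], reverse=True)[:limit]


# Hypothetical stand-in for the three partitions matched by the IN query,
# listed in the order Cassandra happened to return them.
rows = [
    {"number": "12345678901",
     "last_day": datetime(2018, 5, 26, tzinfo=timezone.utc),
     "ids": '["1","2","3"]'},
    {"number": "45678901234",
     "last_day": datetime(2017, 4, 3, tzinfo=timezone.utc),
     "ids": '["12","2","3"]'},
    {"number": "56789012345",
     "last_day": datetime(2018, 1, 26, tzinfo=timezone.utc),
     "ids": '["3","2","1"]'},
]

for row in newest_first(rows):
    print(row["number"], row["last_day"].date())
```

Because the sort happens after the rows are fetched, the LIMIT should be applied here rather than in the CQL query; otherwise Cassandra may truncate the result before the rows you actually want are seen.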
