
Reading the DataStax blog, I was testing slice queries, even though the blog was written for Cassandra 2 and we are on Cassandra 3.

So I define a test table as below.

CREATE TABLE foo(
  part_key TEXT,
  start_ts INT,
  end_ts INT,
  PRIMARY KEY(part_key, start_ts, end_ts)
) WITH CLUSTERING ORDER BY (start_ts ASC, end_ts ASC);

Some fixtures are inserted in this table.

INSERT INTO foo(part_key, start_ts, end_ts) VALUES ('A', 1, 5);
INSERT INTO foo(part_key, start_ts, end_ts) VALUES ('A', 2, 3);
INSERT INTO foo(part_key, start_ts, end_ts) VALUES ('B', 4, 7);
INSERT INTO foo(part_key, start_ts, end_ts) VALUES ('B', 9, 13);

INSERT INTO foo(part_key, start_ts, end_ts) VALUES ('B', 1, 2);

INSERT INTO foo(part_key, start_ts, end_ts)
VALUES ('B', 9, 9999); -- 9999 = symbolic value for no end

I check the B partition first:

> SELECT * FROM foo WHERE part_key='B';

 part_key | start_ts | end_ts
----------+----------+--------
        B |        1 |      2
        B |        4 |      7
        B |        9 |     13
        B |        9 |   9999

(4 rows)

We cannot execute the "natural" query SELECT * FROM foo WHERE part_key='B' AND start_ts <= 7 AND end_ts >= 7, as it raises the following error.

InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering column "end_ts" cannot be restricted (preceding column "start_ts" is restricted by a non-EQ relation)"

In the DataStax blog there is a query like the one below.

SELECT * FROM numberOfRequests
    WHERE cluster = 'cluster1'
    AND date = '2015-06-05'
    AND datacenter = 'US_WEST_COAST'
    AND (hour, minute) >= (12, 30) AND (hour) < (14);

So I try this workaround with the following slice query, using start_ts >= -9999 as an always-true condition that allows specifying the inequality on end_ts at the same time.

SELECT * FROM foo WHERE part_key='B'
AND (start_ts, end_ts) >= (-9999, 7) AND start_ts <= 7; -- -9999 = min_value

It executes without any warning and gives the results below.

 part_key | start_ts | end_ts
----------+----------+--------
        B |        1 |      2
        B |        4 |      7

(2 rows)

At this point, the first row (with start_ts = 1) should not be there in my view, as end_ts >= 7 is not true for it. Since start_ts >= -9999 is always true, it seems the first condition (start_ts, end_ts) >= (-9999, 7) is simply ignored: the results are the same as for the query SELECT * FROM foo WHERE part_key='B' AND start_ts <= 7.
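The observed result can be reproduced outside Cassandra: Python tuples compare lexicographically, which mirrors how Cassandra compares clustering-column tuples (assuming the default ASC clustering order used here). A minimal sketch:

```python
# Rows of partition 'B' as (start_ts, end_ts) tuples, in clustering order.
rows = [(1, 2), (4, 7), (9, 13), (9, 9999)]

# The workaround query: (start_ts, end_ts) >= (-9999, 7) AND start_ts <= 7.
# Tuples compare lexicographically, so (1, 2) >= (-9999, 7) is True
# because 1 > -9999; end_ts is never even looked at.
matches = [r for r in rows if r >= (-9999, 7) and r[0] <= 7]
print(matches)  # [(1, 2), (4, 7)] -- same rows cqlsh returns
```

This shows the tuple bound is not ignored; it just never eliminates any row, because every start_ts in the partition is greater than -9999.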

Reading the blog example, I thought that (at least with Cassandra 2.2) (start_ts, end_ts) >= (-9999, 7) AND start_ts <= 7 would mean start_ts >= -9999 AND end_ts >= 7 AND start_ts <= 7, similarly to (start_ts, end_ts) = (4, 7), which means start_ts = 4 AND end_ts = 7, as below.

SELECT * FROM foo WHERE part_key='B' AND (start_ts, end_ts) = (4, 7);

 part_key | start_ts | end_ts
----------+----------+--------
        B |        4 |      7

(1 rows)

How is such a tuple inequality really interpreted? Is there any way to rewrite it so it "works"?

That doesn't really answer the question, or at least it indicates that Cassandra's results are not consistent. - cchantep

4 Answers

1 vote

What you're hitting here is comparison by tuple: because you've combined your two columns into a tuple, you get a slightly different comparison behavior than you expect. It's correct; it's just not what you expect.

(start_ts, end_ts) >= (-9999, 7)

That does not mean start_ts >= -9999 AND end_ts >= 7; it means the left (inclusive) bound is the tuple (-9999, 7). There may be a tuple (-9998, 1) which is tuple-wise greater than (-9999, 7), even though end_ts == 1 is less than 7.
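This can be checked directly in Python, whose built-in tuple comparison is also lexicographic:

```python
# Lexicographic comparison: the first element decides unless it ties.
assert (-9998, 1) > (-9999, 7)   # -9998 > -9999, so the second element is ignored
assert (1, 2) >= (-9999, 7)      # same reason: 1 > -9999
assert (-9999, 6) < (-9999, 7)   # tie on the first element, so compare the second
print("tuple comparison is lexicographic")
```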

1 vote

Let's insert some more records:

INSERT INTO foo (part_key, start_ts, end_ts) VALUES ('B', 1, 7);
INSERT INTO foo (part_key, start_ts, end_ts) VALUES ('B', 4, 8);
INSERT INTO foo (part_key, start_ts, end_ts) VALUES ('B', 9, 7);

Now we have this data for part_key = 'B':

cqlsh:test> SELECT * FROM foo WHERE part_key='B' ;

 part_key | start_ts | end_ts
----------+----------+--------
        B |        1 |      2
        B |        1 |      7
        B |        4 |      7
        B |        4 |      8
        B |        9 |      7
        B |        9 |     13
        B |        9 |   9999

Now let's query on this data:

cqlsh:test> SELECT * FROM foo WHERE part_key='B' AND (start_ts, end_ts) >= (1, 4) AND (start_ts, end_ts) <= (9, 7);

 part_key | start_ts | end_ts
----------+----------+--------
        B |        1 |      7
        B |        4 |      7
        B |        4 |      8
        B |        9 |      7

It seems the result is incorrect. But it's not; our understanding is.

Cassandra stores your data sorted by the clustering columns (start_ts, end_ts): first by start_ts, then, within each start_ts, by end_ts. When we query with (start_ts, end_ts) >= (1, 4) AND (start_ts, end_ts) <= (9, 7), Cassandra treats (start_ts, end_ts) as a single composite value and returns the contiguous slice between (1, 4) and (9, 7):

 part_key | start_ts | end_ts
----------+----------+--------
        B |        1 |      2
-------------------------------> start range
        B |        1 |      7
        B |        4 |      7
        B |        4 |      8
        B |        9 |      7
-------------------------------> end range       
        B |        9 |     13
        B |        9 |   9999
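The slice above can be reproduced with lexicographic tuple comparison in Python, which behaves the same way as Cassandra's composite clustering comparison for this ASC-ordered table:

```python
# All rows of partition 'B', in clustering order.
rows = [(1, 2), (1, 7), (4, 7), (4, 8), (9, 7), (9, 13), (9, 9999)]

# The composite slice (start_ts, end_ts) BETWEEN (1, 4) AND (9, 7):
# one contiguous range of the sorted rows, not per-column filters.
slice_ = [r for r in rows if (1, 4) <= r <= (9, 7)]
print(slice_)  # [(1, 7), (4, 7), (4, 8), (9, 7)]
```

Note how (1, 2) falls before the start of the range (2 < 4 on the tied first element) and (9, 13) falls after its end, exactly as the markers in the table show.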

Hope you understand.

-1 votes

One possible approach is to use filtering on your second column. Unfortunately, filtering on clustering columns is only supported since Cassandra 3.6. If your version is older than that but is a 3.0.x release, you can solve your problem by converting your second clustering column into a regular one, in which case the following query gives you the results you expect:

SELECT * FROM foo WHERE part_key='B' AND start_ts <= 7 AND end_ts >= 7 ALLOW FILTERING;

Now, I do not know your data and its cardinality, so filtering might not be a good option.

Another option would be to change your data model. There are different ways to model the data so that a query fulfilling your need can run efficiently.

-2 votes

It's all about the ORDER of the tuples, not the VALUEs of the individual columns. In simple words, it is not about comparing the values but about comparing the positions of the tuples. Once Cassandra orders the tuples according to your clustering order, it just compares those positions. For example, in the original partition B: (start_ts, end_ts) >= (1, 1) AND (start_ts, end_ts) <= (7, 7)

The first part of this logical AND, (start_ts, end_ts) >= (1, 1), includes every tuple at or after (1, 1) in clustering order: (1,2), (4,7), (9,13), (9,9999)

The second part, (start_ts, end_ts) <= (7, 7), includes every tuple at or before (7, 7) in clustering order: (1,2), (4,7)

Due to this design, you will always need further checks in application code. For example, if your table has start and end date columns, you have clustered it by StartDate ASC, EndDate ASC, and you are looking for all rows within a FromDate-ToDate range, you will always have to filter the rows further in code. In the WHERE clause you would have: (StartDate, EndDate) >= (FromDate, FromDate) AND (StartDate, EndDate) <= (ToDate, ToDate).

This will NOT be equivalent to the SQL condition: StartDate>=FromDate AND EndDate<=ToDate
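A quick Python check (tuples compare lexicographically, mirroring Cassandra's clustering comparison) makes the non-equivalence concrete. The row values here are hypothetical, chosen only to show the two conditions disagreeing:

```python
# Hypothetical row: StartDate=3, EndDate=9999; query range FromDate=1, ToDate=7.
start, end = 3, 9999
from_d, to_d = 1, 7

# The Cassandra tuple-range condition: one lexicographic range.
tuple_cond = (from_d, from_d) <= (start, end) <= (to_d, to_d)

# The per-column SQL condition the range is often mistaken for.
sql_cond = start >= from_d and end <= to_d

print(tuple_cond, sql_cond)  # True False -- the tuple range keeps the row,
                             # the per-column condition rejects it (9999 > 7)
```

Rows like this one are why the post-query filtering step in application code cannot be skipped.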

Hope this helps.