2
votes

I have an alerts table. I want to query on it using the IN operator on 2 columns and using the greater than operator on one column. I have tried the below things with no luck. Can someone please tell me the DB design to make the query work? My environment details: [cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]

With 'type' in partition key:

CREATE TABLE alerts (
    serialNumber text,
    time bigint,
    type text,
    time2 int,
    status text,
    parentId int,
    PRIMARY KEY ((serialNumber,type), time)
 ) WITH CLUSTERING ORDER BY (time DESC);

cqlsh:testdb> select * from alerts WHERE serialNumber IN ( '1','2') AND type IN ( '1','2','3' ) AND time > 1;
code=2200 [Invalid query] message="Partition KEY part serialNumber cannot be restricted by IN relation (only the last part of the partition key can)"

With 'type' in clustering key:

CREATE TABLE alerts (
    serialNumber text,
    time bigint,
    type text,
    time2 int,
    status text,
    parentId int,
    PRIMARY KEY (serialNumber, type, time)
) WITH CLUSTERING ORDER BY (type ASC,time DESC);

cqlsh:testdb> select * from alerts WHERE serialnumber IN ( '1','2') AND type IN ( 'a','b') and time > 1;
code=2200 [Invalid query] message="Clustering column "type" cannot be restricted by an IN relation"

With index on type:

CREATE TABLE alerts (
    serialNumber text,
    time bigint,
    type text,
    time2 int,
    status text,
    parentId int,
    PRIMARY KEY (serialNumber, time)
) WITH CLUSTERING ORDER BY (time DESC);
CREATE INDEX alertsTypeIndex ON alerts(type);

select * from alerts WHERE serialnumber IN ( '1','2') and time > 1 AND type IN ( 'a','b');
code=2200 [Invalid query] message="IN predicates on non-primary-key columns (type) is not yet supported"

select * from alerts WHERE serialnumber IN ( '1','2') and time > 1 AND type = 'a';
code=2200 [Invalid query] message="Select on indexed columns and with IN clause for the PRIMARY KEY are not supported"
2

2 Answers

2
votes

Which version are you on? Your second example worked for me in 2.2.1:

Connected to VaporTrails at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.1 | CQL spec 3.3.0 | Native protocol v4]
Use HELP for help.
aploetz@cqlsh> use stackoverflow ;
aploetz@cqlsh:stackoverflow> CREATE TABLE alerts (
                 ...     serialNumber text,
                 ...     time bigint,
                 ...     type text,
                 ...     time2 int,
                 ...     status text,
                 ...     parentId int,
                 ...     PRIMARY KEY (serialNumber, type, time)
                 ... ) WITH CLUSTERING ORDER BY (type ASC,time DESC);
aploetz@cqlsh:stackoverflow> INSERT INTO alerts (serialnumber , time,type,time2, status, parentid) VALUES ('1',0,'1',1,'1',1);
aploetz@cqlsh:stackoverflow> INSERT INTO alerts (serialnumber , time,type,time2, status, parentid) VALUES ('2',2,'2',2,'2',2);
aploetz@cqlsh:stackoverflow> INSERT INTO alerts (serialnumber , time,type,time2, status, parentid) VALUES ('3',3,'3',3,'3',3);
aploetz@cqlsh:stackoverflow> select * from alerts WHERE serialNumber IN ( '1','2') AND type IN ( '1','2','3' ) AND time > 1;

 serialnumber | type | time | parentid | status | time2
--------------+------+------+----------+--------+-------
            2 |    2 |    2 |        2 |      2 |     2

(1 rows)

Even if you can get it to work, I'm going to recommend against it. Using the IN keyword on a partition key is known as the "multi-key" query anti-pattern. DataStax has a blurb in their documentation discussing why this is bad. Essentially, this approach doesn't scale because many nodes must be queried to satisfy these types of queries. You should find another way to model your alerts so that you do not need to use an IN relation.

0
votes

I've just got a similar problem, where I wanted to use in operator on a Cassandra column, but wanted to make sure it wouldn't be bad as described by @Aaron.

Looking at the docs there are plenty of explanation about it: https://docs.datastax.com/en/archived/cql/3.1/cql/cql_reference/select_r.html#reference_ds_d35_v2q_xj__selectIN

TLDR:

The IN condition is recommended on the last column of the partition key only if you query all preceding columns of key for equality

CREATE TABLE parts (part_type text, part_name text, part_num int, part_year text, serial_num text, PRIMARY KEY ((part_type, part_name), part_num, part_year));

SELECT * FROM parts WHERE part_type='alloy' AND part_name='hubcap' AND part_num=1249 AND part_year IN ('2010', '2015');

For more examples and problems, head to the link above.