I'm working with a system that uses a secondary index in cassandra, along with a composite primary key, e.g.
CREATE TABLE table (
a bigint,
b bigint,
c bigint,
PRIMARY KEY (a, b, c)
) WITH CLUSTERING ORDER BY (b ASC, c ASC)
CREATE INDEX secondary_index ON table (c);
One of the operations in the application using the table is to fetch a number of rows (typically tens) specifying the partition key and the secondary index key. Currently, it performs one query for each (partition key, secondary key) pair, in parallel, which works fine, e.g.:
select * from table where a = ? and c = ?;
However, I've noticed that the system's workload is such that most of the time, there is significant overlap in the partition keys across the requested rows, sometimes more than half of them have the same partition key. So, I thought that it might be more efficient to perform one query per partition key, with an IN clause on the secondary key, reducing the number of overall queries to single digits in most cases, and reducing read query overhead on the cluster.
However, at least executed from cqlsh, this does not seem to be allowed:
select * from table where a = ? and c in (...);
InvalidRequest: Error from server: code=2200 [Invalid query] message="PRIMARY KEY column "c" cannot be restricted as preceding column "b" is not restricted"
Is this just not allowed, and I'll have to continue making individual queries? Is there some reason it wouldn't actually be more efficient? Or is this just a limitation of CQL, and IN queries cannot use the secondary index? Perhaps there is an issue because the secondary index key is also in the primary key, and Cassandra attempts to use that instead of the secondary index?
allow filtering
– Alex Ott