23
votes

I am using Cassandra for the first time in a web app and I got a query problem. Here is my tab :

CREATE TABLE vote (
    doodle_id uuid,
    user_id uuid,
    schedule_id uuid,
    vote int,
    PRIMARY KEY ((doodle_id), user_id, schedule_id)
);

On every request, I indicate my partition key, doodle_id. For example I can make without any problems :

select * from vote where doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 and user_id = 97a7378a-e1bb-4586-ada1-177016405142;

But on the last request I made :

select * from vote where doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 and schedule_id = c37df0ad-f61d-463e-bdcc-a97586bea633;

I got the following error :

Bad Request: PRIMARY KEY column "schedule_id" cannot be restricted (preceding column "user_id" is either not restricted or by a non-EQ relation)

I'm new with Cassandra, but correct me if I'm wrong, in a composite primary key, the first part is the PARTITION KEY which is mandatory to allow Cassandra to know where to look for data. Then the others parts are CLUSTERING KEY to sort data.

But I still don't get why my first request is working and not the second one ?

If anyone could help it will be a great pleasure.

2

2 Answers

19
votes

In Cassandra, you should design your data model to suit your queries. Therefore the proper way to support your second query (queries by doodle_id and schedule_id, but not necessarilly with user_id), is to create a new table to handle that specific query. This table will be pretty much the same, except the PRIMARY KEY will be slightly different:

CREATE TABLE votebydoodleandschedule (
    doodle_id uuid,
    user_id uuid,
    schedule_id uuid,
    vote int,
    PRIMARY KEY ((doodle_id), schedule_id, user_id)
);

Now this query will work:

SELECT * FROM votebydoodleandschedule 
WHERE doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 
AND schedule_id = c37df0ad-f61d-463e-bdcc-a97586bea633;

This gets you around having to specify ALLOW FILTERING. Relying on ALLOW FILTERING is never a good idea, and is certainly not something that you should do in a production cluster.

5
votes

The clustering key is also used to find the columns within a given partition. With your model, you'll be able to query by:

  • doodle_id
  • doodle_id/user_id
  • doodle_id/user_id/schedule_id
  • user_id using ALLOW FILTERING
  • user_id/schedule_id using ALLOW FILTERING

You can see your primary key as a file path doodle_id#123/user_id#456/schedule_id#789 where all data is stored in the deepest folder (ie schedule_id#789). When you're querying you have to indicate the subfolder/subtree from where you start searching.

Your 2nd query doesn't work because of how columns are organized within partition. Cassandra can not get a continuous slice of columns in the partition because they are interleaved.

You should invert the primary key order (doodle_id, schedule_id, user_id) to be able to run your query.