1
votes

There is one thing I don't understand in this article: http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling

Exercise: we want to get all users by groupname.

Solution:

CREATE TABLE groups (
    groupname text,
    username text,
    email text,
    age int,
    PRIMARY KEY (groupname, username)
);

SELECT * FROM groups WHERE groupname = 'footballers';

But to find all users in a group we could set PRIMARY KEY (groupname), and that works as well.

Why is a clustering key (username) needed in this case? I know that when username is the clustering key we can use it in a WHERE clause. But when finding users only by groupname, is there any difference between PRIMARY KEY (groupname) and PRIMARY KEY (groupname, username) in terms of query efficiency?


2 Answers

3
votes

Clustering keys provide multiple benefits: query flexibility, result set ordering (within a partition), and extended uniqueness.

But to find all users in a group we could set PRIMARY KEY (groupname)

Try it. Create a new table using only groupname as your PRIMARY KEY, and then try to insert multiple usernames for each group. You will find that there is only ever one row per group, and that the username column (along with email and age) is overwritten by each new user inserted into that group.
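A minimal sketch of that experiment, assuming a hypothetical table name (groups_single_key) and made-up sample users. Because groupname alone is the primary key, the second INSERT acts as an upsert on the same row:

```cql
-- Hypothetical table: same columns as the article's table,
-- but groupname is the entire primary key.
CREATE TABLE groups_single_key (
    groupname text,
    username text,
    email text,
    age int,
    PRIMARY KEY (groupname)
);

-- Both inserts target the same primary key ('footballers'),
-- so the second one overwrites the first instead of adding a row.
INSERT INTO groups_single_key (groupname, username, email, age)
VALUES ('footballers', 'alice', 'alice@example.com', 30);

INSERT INTO groups_single_key (groupname, username, email, age)
VALUES ('footballers', 'bob', 'bob@example.com', 25);

-- Should return a single row holding the values of the later insert.
SELECT * FROM groups_single_key WHERE groupname = 'footballers';
```

No error is raised by the second INSERT, which is why this data loss is easy to miss.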

But when finding users only by groupname, is there any difference between PRIMARY KEY (groupname) and PRIMARY KEY (groupname, username) in terms of query efficiency?

If PRIMARY KEY (groupname) performs faster, the most likely reason is that only a single row can ever be returned.

In this case, defining username as a clustering key provides:

  1. The ability to sort by username within a group.

  2. The ability to query for a specific username within a group.

  3. The ability to add multiple usernames within a group.
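The three points above can be sketched against the groups table from the question. The sample users and email addresses are made up for illustration:

```cql
-- (3) Each (groupname, username) pair is a distinct row,
--     so one group can hold many users.
INSERT INTO groups (groupname, username, email, age)
VALUES ('footballers', 'carol', 'carol@example.com', 28);
INSERT INTO groups (groupname, username, email, age)
VALUES ('footballers', 'alice', 'alice@example.com', 30);
INSERT INTO groups (groupname, username, email, age)
VALUES ('footballers', 'bob', 'bob@example.com', 25);

-- (1) Rows within the partition come back sorted by username
--     (ascending by default); ORDER BY can reverse that order.
SELECT * FROM groups WHERE groupname = 'footballers';
SELECT * FROM groups WHERE groupname = 'footballers' ORDER BY username DESC;

-- (2) The clustering key can be used to fetch one specific user in a group.
SELECT * FROM groups WHERE groupname = 'footballers' AND username = 'bob';
```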

1
votes

You don't need the clustering key if you want to query by groupname.

If you add a clustering key (username in this example), rows will be ordered by username within each groupname.
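That order is ascending unless the table says otherwise. As a sketch, a hypothetical variant of the table (groups_desc is an assumed name) could declare a descending order at creation time:

```cql
-- Hypothetical variant: rows in each group are stored,
-- and returned by default, in descending username order.
CREATE TABLE groups_desc (
    groupname text,
    username text,
    email text,
    age int,
    PRIMARY KEY (groupname, username)
) WITH CLUSTERING ORDER BY (username DESC);
```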