3
votes

hi i am new to Cassandra. I have little bit confusion in DB design in below scenario.

Currently i have 3 table : Post, User, PostLike.

Post : store post info

User : store user info

PostLIke :

CREATE TABLE PostLike (
    like_time timestamp
    post_id bigint,
    user_id bigint,
    PRIMARY KEY (like_time,post_id,user_id)
);

like_time : used to store post order by post like time. cassandra provide this in OrderPreservingPartitioner

requirement is:

  1. All Users Id which like a given post order by like_time and got them using : select * from PostLike where post_id = ?

  2. All posts liked by a user select * from PostLike where user_id = ? : it gave error

[Invalid query] message="PRIMARY KEY column "post_id" cannot be restricted (preceding column "ColumnDefinition{name=user_id, type=org.apache.cassandra.db.marshal.LongType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)"

pls suggest what i need to do here :

  1. need to Use MySQL with Cassandra for these relation

    OR

Create 2 separate table in cassandra

CREATE TABLE PostLike (
    like_time timestamp
    post_id bigint,
    PRIMARY KEY (like_date,post_id)
);

CREATE TABLE UserLike (
    like_time timestamp
    user_id bigint,
    PRIMARY KEY (like_date,user_id)
);

or any other solution. Please help.

2
Go with option #2. Denormalization is the new black.ethrbunny
thanks for reply . also above error resolve if i add indexNavrattan Yadav

2 Answers

2
votes

Below Error resolve if i create index.

CREATE INDEX post_id_PostLike_indx ON post_like (post_id);
CREATE INDEX user_id_PostLike_indx ON post_like (user_id);

[Invalid query] message="PRIMARY KEY column "post_id" cannot be restricted (preceding column "ColumnDefinition{name=user_id, type=org.apache.cassandra.db.marshal.LongType, kind=CLUSTERING_COLUMN, componentIndex=0, indexName=null, indexType=null}" is either not restricted or by a non-EQ relation)"

2
votes

First of all, you are getting that error because you are specifying the second part of the primary key, without specifying the first part. When querying in Cassandra by a compound primary key, you cannot skip parts of the key. You can leave parts off of the end of the key (as in, just query by the partitioning key (see below), but it won't work if you try to skip parts of the key.

Next, secondary indexes do not work the same in Cassandra as they do in MySQL. In Cassandra, they are provided for convenience, and not for performance. The cardinality of post_id and user_id will likely be too high to be efficient. Especially in a large cluster with millions of rows, secondary index query performance will drop-off significantly on a high-cardinality secondary index.

The proper way to solve this, is to use your second option (as etherbunny recommended), but with a re-order of your primary keys.

CREATE TABLE PostLike (
    like_time timestamp
    post_id bigint,
    PRIMARY KEY (post_id,like_date)
);

CREATE TABLE UserLike (
    like_time timestamp
    user_id bigint,
    PRIMARY KEY (user_id,like_date)
);

The first key in a Cassandra primary key is known as the partitioning key. This key will determine which token range your row will be stored in.

The remaining keys in a Cassandra primary key are known as clustering columns. The clustering columns help to determine the on-disk sort order within a partitioning key.

That last part is important, as it (clustering order, as well as the ORDER BY keyword) behaves very differently from MySQL or any RDBMS. This way, if you SELECT * FROM user_like WHERE user_id=34574398 ORDER BY like_date you should see the likes for that user_id ordered by like_date. In fact, even without the ORDER BY clause, they should still be sorted by like_date. However, if you were to SELECT * FROM user_like ORDER BY like_date, your data would not sort in the expected order, because ordering only works when a partitioning key is specified.