2
votes

I want to use Cassandra as a DB to store messages. In my model, messages are aggregated by channel. The three most important fields of a message are:

  1. channel_id
  2. created_by
  3. message_id (unique)

The main read/fetch API is: get messages by channel, sorted by created_by. In addition, I have low-volume message updates by channel_id + message_id.

So my question is about the primary key definition. If I define it as (channel_id, created_by), will I be able to do an UPDATE with a WHERE clause like channel_id=X AND message_id=XX, even though message_id is not in the primary key (I do give the query the partition key)?

And if not, if I define the primary key as (channel_id, created_by, message_id), will I be able to do the read with a WHERE clause that uses only one clustering column (channel_id, created_by), and do the update using the WHERE clause channel_id + message_id?

Thanks


2 Answers

1
votes

define it (channel_id,created_by) will I be able to do an UPDATE with WHERE clause like channel_id=X and message_id=XX

No. All primary key components are required for a write operation in Cassandra. First, you will have to provide created_by. Second, message_id is not part of the key, so it will have to be removed from the WHERE clause.
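
To make this concrete, here is a minimal CQL sketch of the two-component key. The table name messages_two_key, the body column, and the column types are assumptions for illustration; none of them come from the question.

```cql
-- Hypothetical table; only channel_id, created_by and message_id come from the question.
CREATE TABLE IF NOT EXISTS messages_two_key (
    channel_id text,
    created_by text,
    message_id uuid,   -- a regular column here, not part of the key
    body       text,   -- assumed payload column
    PRIMARY KEY (channel_id, created_by)
);

-- Accepted: every primary key component is restricted with '='.
UPDATE messages_two_key SET body = 'edited'
WHERE channel_id = '1' AND created_by = 'Aaron';

-- Rejected: created_by is missing, and message_id is not a key column.
-- UPDATE messages_two_key SET body = 'edited'
-- WHERE channel_id = '1' AND message_id = 123e4567-e89b-12d3-a456-426614174000;
```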

And if not, if I will define the primary key like this (channel_id,created_by, message_id) will I be able to do the read with WHERE clause with only 1 clustering column (channel_id,created_by)

Yes, this will work:

SELECT * FROM messages WHERE channel_id='1' AND created_by='Aaron';

This works because you have provided the first two primary key components, in order, without skipping any. Cassandra can easily find the node containing the partition for channel_id, and then scan down to the rows starting with created_by.
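
As a rough sketch of which reads the three-component key allows, assuming a hypothetical table named messages_three_key with an assumed body column and assumed column types:

```cql
-- Hypothetical table; channel_id is the partition key,
-- created_by and message_id are clustering columns, in that order.
CREATE TABLE IF NOT EXISTS messages_three_key (
    channel_id text,
    created_by text,
    message_id uuid,
    body       text,   -- assumed payload column
    PRIMARY KEY (channel_id, created_by, message_id)
);

-- Accepted: partition key only; rows come back ordered by created_by, then message_id.
SELECT * FROM messages_three_key WHERE channel_id = '1';

-- Accepted: partition key plus the first clustering column.
SELECT * FROM messages_three_key WHERE channel_id = '1' AND created_by = 'Aaron';

-- Rejected without ALLOW FILTERING: created_by is skipped.
-- SELECT * FROM messages_three_key
-- WHERE channel_id = '1' AND message_id = 123e4567-e89b-12d3-a456-426614174000;
```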

and do the update using the WHERE clause channel_id + message_id?

No. Again, you would need to provide created_by for the write to succeed.
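
A sketch of an update that would be accepted with the three-component key, again assuming a hypothetical messages_three_key table with an assumed body column. The caller has to carry created_by along with message_id:

```cql
-- Hypothetical table, defined here so the example stands alone.
CREATE TABLE IF NOT EXISTS messages_three_key (
    channel_id text,
    created_by text,
    message_id uuid,
    body       text,   -- assumed payload column
    PRIMARY KEY (channel_id, created_by, message_id)
);

-- Accepted: the partition key and both clustering columns are supplied.
UPDATE messages_three_key SET body = 'edited'
WHERE channel_id = '1'
  AND created_by = 'Aaron'
  AND message_id = 123e4567-e89b-12d3-a456-426614174000;
```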

0
votes

Choosing the primary key is one of the most important parts of Cassandra data modeling. You need to understand how the table will be queried. I am not sure I can help with only the information provided above, but I will give it a try.

Your requirements:

  1. Sort by created_by.
  2. Update with channel_id + message_id

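Try using channel_id + message_id as the partition key and created_by as the clustering key. Having message_id in the primary key will also help ensure uniqueness. Keep in mind that a read then needs both channel_id and message_id to target a partition, and that an UPDATE still has to supply created_by, because clustering columns are part of the primary key.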
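
For illustration, this layout could be sketched as follows. The table name messages_composite_pk, the body column, and the column types are assumptions, not something from the question.

```cql
-- Hypothetical table; (channel_id, message_id) is a composite partition key,
-- created_by is the only clustering column.
CREATE TABLE IF NOT EXISTS messages_composite_pk (
    channel_id text,
    message_id uuid,
    created_by text,
    body       text,   -- assumed payload column
    PRIMARY KEY ((channel_id, message_id), created_by)
);

-- Accepted: the full partition key is supplied.
SELECT * FROM messages_composite_pk
WHERE channel_id = '1' AND message_id = 123e4567-e89b-12d3-a456-426614174000;

-- Rejected without ALLOW FILTERING: only part of the partition key is supplied.
-- SELECT * FROM messages_composite_pk WHERE channel_id = '1';
```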

I recently found the DS220 course on data modeling at https://academy.datastax.com/. It is awesome.