
I need advice to correctly design the table in Cassandra. I need to get a sorted list of all the books. Sorting is performed by the date of the last update. Each time a particular book is purchased, the number_of_buyers column is updated. Also, I need to update the value of the updated_at column. The problem is the updated_at column is the clustering key which is the part of the primary key. We can't update values in columns that are part of the primary key.

create table books (
   book_id uuid,
   created_at timestamp,
   updated_at timestamp,
   book_name varchar,
   book_author varchar,
   number_of_buyers int,
   primary key (book_id, updated_at)
) with clustering order by (updated_at desc);
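For example, this is the write I would like to perform after a purchase (a sketch with a made-up book_id), and Cassandra rejects it because updated_at is part of the primary key:

-- Rejected: PRIMARY KEY columns cannot appear in the SET clause
UPDATE books
   SET number_of_buyers = 53,
       updated_at = toTimestamp(now())
 WHERE book_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;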

Another example:

create table chat_rooms (
   chat_room_id uuid,
   created_at timestamp,
   updated_at timestamp,
   last_message_content varchar,
   last_message_author varchar,
   unread_messages_number int,
   primary key (chat_room_id, updated_at)
) with clustering order by (updated_at desc);

Each chat room has the latest message. This information is always changing. In cases of change, I want to put the chat room at the top of the list. Classic behavior in many messengers.

You need to rethink your approach - sorting in Cassandra happens only inside a partition, not between partitions, so the current model won't work for you. The real question you need to ask is: do you need this sorted list for all books, or only for some subset? Like books sold during the day, or something like that. – Alex Ott
In my case, I need a list of all books. If any information about a particular book is updated, I need to put it at the top of the list. To achieve this behavior, I have to recreate the entry in the table - that is, delete the record and create it again with the updated data. I agree with you that there are problems with the design of the model. How would you redo the model for this request? – Nurzhan Nogerbek
What would the query for that be - pull all books? Do you really need Cassandra for that task? All books could easily be handled by MySQL or PostgreSQL... – Alex Ott
Try to abstract. Please check my post again. In place of books, there may be chat rooms. Each chat room has a latest message, and this information is always changing. In case of a change, I want to put the chat room at the top of the list - classic behavior in many messengers. I think Cassandra is not redundant here. Plus, you can get not all chat rooms, but only the last 100. Do you have any ideas now? – Nurzhan Nogerbek

1 Answer


So for sure, you are going to need to partition on something different. The trick is going to be finding the right balance of query flexibility (your obvious need here) while avoiding unbounded partition growth.

For the books table, is it possible to partition on something like category? You know, like horror, fantasy, graphic novel, non-fiction, instructional, etc.?

CREATE TABLE book_events (
   book_id uuid,
   created_at timestamp,
   updated_at timestamp,
   book_name varchar,
   book_author varchar,
   number_of_buyers int,
   category text,
   PRIMARY KEY (category, book_name, updated_at, book_id)
) WITH CLUSTERING ORDER BY (book_name ASC, updated_at DESC, book_id ASC);

For the PRIMARY KEY definition, we can partition on category, and then cluster on book_name and updated_at, with book_id on the end (for uniqueness). Then, INSERT a new row for each sale event. On the query (after inserting a few rows), use the MAX aggregation on updated_at while using the GROUP BY clause on book_name.
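For instance, a sale event write might look something like this (a sketch - the uuid and values are made up, matching the sample data below):

INSERT INTO book_events (book_id, updated_at, book_name, book_author, number_of_buyers, category)
VALUES (uuid(), toTimestamp(now()), 'Mastering Apache Cassandra 3.x',
        'Aaron Ploetz, Teja Malepati', 52, 'Computers & Technology');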

SELECT book_name, book_author, number_of_buyers, MAX(updated_at) FROM book_events
 WHERE category='Computers & Technology' GROUP BY book_name;

 book_name                       | book_author                                                | number_of_buyers | system.max(updated_at)
---------------------------------+------------------------------------------------------------+------------------+---------------------------------
  Mastering Apache Cassandra 3.x |                                Aaron Ploetz, Teja Malepati |               52 | 2020-10-05 14:29:33.134000+0000
 Seven NoSQL Databases in a Week | Aaron Ploetz, Devram Kandhare, Brian Wu, Sudarshan Kadambi |              163 | 2020-10-05 14:29:33.142000+0000

(2 rows)

The only other consideration is what to do with the obsoleted sale rows. You could delete them as you go, depending on the write frequency, of course. A cleaner approach would be to consider the cadence of sales and apply a TTL.
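For example, if sale events older than 30 days are no longer interesting, each write could carry a TTL (the 30-day value here is just an illustration):

INSERT INTO book_events (book_id, updated_at, book_name, book_author, number_of_buyers, category)
VALUES (uuid(), toTimestamp(now()), 'Seven NoSQL Databases in a Week',
        'Aaron Ploetz, Devram Kandhare, Brian Wu, Sudarshan Kadambi', 163, 'Computers & Technology')
USING TTL 2592000;  -- 30 days in seconds; obsolete sale rows expire on their own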

This solution is definitely not complete as-is, but I hope it leads you in the proper direction.