0
votes

My project serves both real-time data and past data. It works like feed, so it shows real-time data through socket and past data(if scroll down) through REST api. To get real-time data efficiently, I set date as partition key and time as clustering key. For real-time service, I think this data structure is well modeled. But I also have to get limited number of recent datas(like pagination), which should able to show whole data if requested. To serve data like recent 0~20 / 20~40 / 40~60 through REST api calls, my data-serving server has to remember what it showed before to load next 20 datas continuously, as bookmark. If it was SQL I would use IDs or page&offset things but I cannot do that with Cassandra. So I tried:

SELECT * FROM examples WHERE date<='DATEMARK' AND create_at < 'TIMEMARK' AND entities CONTAINS 'something' limit 20 ALLOW FILTERING;

But since date is the partition key, I cannot use comparison operation >, <. The past data could be created very far from now.

Can I satisfy my real-time+past requirements with Cassandra? I wonder if I have to make another DB for accessing past data.

2

2 Answers

1
votes

Yes you can, but you must change your mindset and think like NoSQL patterns, in this scenarios you can save your data in duplicate manner and save your data in other table with another partition key and cluster column that satisfies your needs.

1
votes

we have been using Cassandra extensively for showing real-time + past data. I request you not to use allow filtering option in Cassandra as it's not a good practice. Try to make your schema properly such that you need not required to jump the columns. Suppose you have a schema:

Created_date | Created_time | user_id | Country | Name | Activity

In this schema, you are considering Created_date,created_time,user_id, country as a primary key but you want the user_id of a particular country. In this case, even though you have considered Country column as a primary key you can't query like:

"SELECT * from table where Created_date='2020-02-14' and Country ='india' allow filtering ";

If your query in this pattern you will lose data in your resultset and will get errors when working with big data. Or you'll be using the allow filtering option which is not suggested. So, you need to change the structure of your schema.

Created_date | Country | City | Created_time | user_id | Name | Activity

"SELECT * from table where created_date='2020-02-14' and country='india'"; Using this structure will give you a very consistent result and you will never face any errors. Suppose you want to get all the data for the last seven days. In that case use loop and traverse through the results of each day and store it into some data structure. Hope you understand.