Think of using posts_by_user as metadata for a post's information. This would allow you to house user_id, post_id, message_text, etc, but you would abstract the view_count, likes_count, and comments_count into a counter table. This would allow you to fetch either a post's metadata or counters as long as you had the post_id, but you would only have to update the counter_record once.
DSE Counter Documentation:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
However,
The article below is a really good starting point in relation to data modeling for Cassandra. Namely, there are a few things to take into consideration when answering this question, many of which will depend on the internals of your system and how your queries are structured.
The first two rules are stated as:
Rule 1: Spread Data Evenly Around the Cluster
Rule 2: Minimize the Number of Partitions Read
Taking a moment to consider the "user_timeline" table.
user_id and created_on as a COMPOUND KEY* - This would be ideal if
- You wanted to query for posts by a certain user and with the assumption that you would have a decent number of users. This would
distribute records evenly, and your queries would only be hitting a
partition at a time.
user_id and a hash_prefix as a COMPOUND KEY* - This would be ideal
if
- You had a small number of users with a large number of posts, which would allow your data to be evenly spread across the
cluster. However you run the risk of having to query across
multiple partitions.
follower_id and created_on as a COMPOUND KEY* - This would be ideal
if
- You wanted to query for posts being followed by a certain follower. The records would be distributed and you would minimize
queries across partitions
These were 3 examples for 1 table, and the point I wanted to convey is to design your tables around the queries you want to execute. Also don't be afraid to duplicate your data across multiple tables that are setup to handle various queries, this is the way Cassandra was meant to be modeled. Take a bit to read the article below and watch the DataStax Academy Data Modeling Course, to familiarize yourself with the nuances. I also included an example schema below to cover the basic counter schema I was pointing out earlier.
* The reason for the compound key is due to the fact that your PRIMARY KEY has to be unique, otherwise an INSERT with an existing PRIMARY KEY will become an UPDATE.
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
https://academy.datastax.com/courses
CREATE TABLE IF NOT EXISTS social_media.posts_by_user (
user_id uuid,
post_id uuid,
message_text text,
created_on timestamp,
deleted boolean,
user_full_name text,
PRIMARY KEY ((user_id, created_on))
);
CREATE TABLE IF NOT EXISTS social_media.user_timeline (
follower_id uuid,
post_id uuid,
user_id uuid,
location_name text,
user_full_name text,
created_on timestamp,
PRIMARY KEY ((user_id, created_on))
);
CREATE TABLE IF NOT EXISTS social_media.post_counts (
likes_count counter,
view_count counter,
comments_count counter,
post_id uuid,
PRIMARY KEY (post_id)
);