I am implementing a web application similar to Twitter. I need to implement a 'retweet' action, and one tweet can be retweeted by the same person multiple times.
I have a basic 'tweets' table with the following columns:
Tweets: tweet_id | tweet_text | tweet_date_created | tweet_user_id
(where tweet_id is the primary key for tweets, tweet_text contains the tweet text, tweet_date_created is the DateTime when the tweet was created, and tweet_user_id is the foreign key to the users table and identifies the user who created the tweet)
Now I am wondering how I should implement the retweet action in my database.
Option 1
Should I create a new join table, which would look like this:
Retweets: tweet_id | user_id | retweet_date_retweeted
(Where tweet_id is a foreign key to the tweets table, user_id is a foreign key to the users table and identifies the user who retweeted the tweet, and retweet_date_retweeted is a DateTime which specifies when the retweet was made.)
pros: There will be no empty columns. When a user retweets, a new row is created in the retweets table.
cons: Querying will be more difficult. It will need to join two tables and somehow sort the results by two dates (when the entry is an original tweet, sort it by tweet_date_created; when it is a retweet, sort it by retweet_date_retweeted).
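To make the concern concrete, here is a rough sketch of how I imagine the Option 1 timeline query. It assumes SQLite through Python's sqlite3 module; the sample rows and the actor_id, sort_date and is_retweet aliases are made up for illustration and are not part of my real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE tweets (
    tweet_id INTEGER PRIMARY KEY,
    tweet_text TEXT NOT NULL,
    tweet_date_created TEXT NOT NULL,
    tweet_user_id INTEGER NOT NULL REFERENCES users(user_id)
);
-- No unique key on (tweet_id, user_id): one user may retweet a tweet many times.
CREATE TABLE retweets (
    tweet_id INTEGER NOT NULL REFERENCES tweets(tweet_id),
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    retweet_date_retweeted TEXT NOT NULL
);
-- Hypothetical sample data.
INSERT INTO users VALUES (1), (2);
INSERT INTO tweets VALUES (1, 'hello', '2024-01-01 10:00:00', 1);
INSERT INTO tweets VALUES (2, 'world', '2024-01-03 10:00:00', 2);
INSERT INTO retweets VALUES (1, 2, '2024-01-02 10:00:00');
INSERT INTO retweets VALUES (1, 2, '2024-01-04 10:00:00');
""")

# Original tweets and retweets are combined with UNION ALL, each branch
# exposing its own date under the shared sort_date column.
timeline = conn.execute("""
SELECT tweet_id,
       tweet_text,
       tweet_user_id      AS actor_id,
       tweet_date_created AS sort_date,
       0                  AS is_retweet
FROM tweets
UNION ALL
SELECT t.tweet_id,
       t.tweet_text,
       r.user_id,
       r.retweet_date_retweeted,
       1
FROM retweets r
JOIN tweets t ON t.tweet_id = r.tweet_id
ORDER BY sort_date DESC
""").fetchall()

for row in timeline:
    print(row)
```

So the two dates can be merged into one sort column, but every timeline read pays for the UNION ALL plus the join.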
Option 2
Or should I implement it in the tweets table as a parent_id column? It would then look like this:
Tweets: tweet_id | tweet_text | tweet_date_created | tweet_user_id | parent_id
(Where all the columns remain the same and parent_id is a foreign key to the same tweets table. When a tweet is created, parent_id remains empty. When a tweet is retweeted, parent_id contains the original tweet's id, tweet_user_id contains the user who performed the retweet, tweet_date_created contains the DateTime when the retweet was made, and tweet_text remains empty, because we will not let users change the original tweet when retweeting.)
pros: Querying process is much more elegant, as I do not have to join two tables.
cons: There will be empty cells every time a tweet is retweeted. So if I have 1 000 tweets in my database and each of them is retweeted 5 times, there will be 5 000 rows with an empty tweet_text in my tweets table (6 000 rows in total).
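For comparison, a rough sketch of Option 2 under the same assumptions (SQLite through Python's sqlite3 module, made-up sample rows, and a hypothetical shown_text alias). Sorting needs only tweet_date_created, although as far as I can tell a self-join is still needed if the timeline should display the original text of a retweet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE tweets (
    tweet_id INTEGER PRIMARY KEY,
    tweet_text TEXT,                               -- NULL for retweets
    tweet_date_created TEXT NOT NULL,
    tweet_user_id INTEGER NOT NULL REFERENCES users(user_id),
    parent_id INTEGER REFERENCES tweets(tweet_id)  -- NULL for original tweets
);
-- Hypothetical sample data: tweet 1 is retweeted twice by user 2.
INSERT INTO users VALUES (1), (2);
INSERT INTO tweets VALUES (1, 'hello', '2024-01-01 10:00:00', 1, NULL);
INSERT INTO tweets VALUES (2, NULL,    '2024-01-02 10:00:00', 2, 1);
INSERT INTO tweets VALUES (3, 'world', '2024-01-03 10:00:00', 2, NULL);
INSERT INTO tweets VALUES (4, NULL,    '2024-01-04 10:00:00', 2, 1);
""")

# One table, one date column to sort by. The LEFT JOIN back onto tweets
# only fetches the parent's text for rows that are retweets.
timeline = conn.execute("""
SELECT t.tweet_id,
       COALESCE(p.tweet_text, t.tweet_text) AS shown_text,
       t.tweet_user_id,
       t.tweet_date_created,
       t.parent_id
FROM tweets t
LEFT JOIN tweets p ON p.tweet_id = t.parent_id
ORDER BY t.tweet_date_created DESC
""").fetchall()

for row in timeline:
    print(row)
```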
Which is the more efficient way? Is it better to have empty cells or to have cleaner queries?