Context
I am building a forum and investigating modeling the data with DynamoDB and adjacency lists. Some top-level entities (like users) might have multiple types of relationships with other top-level entities (like comments).
Requirements
For example, let's say we want to be able to do the following:
- Users can like comments
- Users can follow comments
- Comments can display users that like it
- Comments can display users that follow it
- User profiles can show comments they like
- User profiles can show comments they follow
So, we essentially have a many-to-many relationship (user <=> comment) where each pair can be linked by more than one relationship type (like or follow).
Note: This example is deliberately stripped down. In practice there will be many more relationships to model, so I'm trying to think of something extensible here.
Baseline
The following top-level data would likely be common in any adjacency list representation:
First_id (Partition Key)  Second_id (Sort Key)  Data
------------------------  --------------------  ------------
User-Harry                User-Harry            User data
User-Ron                  User-Ron              User data
User-Hermione             User-Hermione         User data
Comment-A                 Comment-A             Comment data
Comment-B                 Comment-B             Comment data
Comment-C                 Comment-C             Comment data
Furthermore, for each table below, there would be an equivalent Global Secondary Index with the partition and sort keys swapped.
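To make the baseline concrete, here is a rough sketch of how the table and its inverted index might be described. It is only the parameter dictionary one could pass to boto3's `create_table`; the table name `Forum`, the index name `InvertedIndex`, and on-demand billing are placeholders I made up for illustration, not settled choices.

```python
# Hypothetical table definition. "Forum" and "InvertedIndex" are placeholder names.
TABLE_DEFINITION = {
    "TableName": "Forum",
    "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    "AttributeDefinitions": [
        {"AttributeName": "First_id", "AttributeType": "S"},
        {"AttributeName": "Second_id", "AttributeType": "S"},
    ],
    # Base table: First_id is the partition key, Second_id is the sort key.
    "KeySchema": [
        {"AttributeName": "First_id", "KeyType": "HASH"},
        {"AttributeName": "Second_id", "KeyType": "RANGE"},
    ],
    # Global Secondary Index with the partition and sort keys swapped.
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "InvertedIndex",
            "KeySchema": [
                {"AttributeName": "Second_id", "KeyType": "HASH"},
                {"AttributeName": "First_id", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
}

# With boto3 installed and credentials configured, this could be used as:
#   boto3.client("dynamodb").create_table(**TABLE_DEFINITION)
```

The index simply reuses the same two attributes in the opposite roles, which is what lets "comments for a user" be queried as cheaply as "users for a comment".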
Example Data
This is what I would like to model in DynamoDB:
- Harry likes comment A
- Harry likes comment B
- Harry follows comment A
- Ron likes comment B
- Hermione follows comment C
Option 1
Use a third attribute to define the type of relationship:
First_id (Partition Key)  Second_id (Sort Key)  Data
------------------------  --------------------  ---------
Comment-A                 User-Harry            "LIKES"
Comment-B                 User-Harry            "LIKES"
Comment-A                 User-Harry            "FOLLOWS"
Comment-B                 User-Ron              "LIKES"
Comment-C                 User-Hermione         "FOLLOWS"
The downside to this approach is that query results contain items you may not care about. For example, if you want to query all the users that like a given comment, you also have to process all the users that follow that comment. Likewise, if you want to query all the comments that a user likes, you also have to process all the comments that the user follows.
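To illustrate the downside, here is a small in-memory sketch (plain Python, no real DynamoDB calls) that treats the Option 1 table as a list of rows exactly as drawn above. The helper names `query_by_partition` and `users_with_relation` are made up for this example. One caveat I should flag: in a real table, two items cannot share the same partition and sort key, so the two Comment-A / User-Harry rows would need some way to be told apart; the sketch ignores that.

```python
# In-memory stand-in for the Option 1 table (not a real DynamoDB query).
OPTION_1_ROWS = [
    ("Comment-A", "User-Harry", "LIKES"),
    ("Comment-B", "User-Harry", "LIKES"),
    ("Comment-A", "User-Harry", "FOLLOWS"),
    ("Comment-B", "User-Ron", "LIKES"),
    ("Comment-C", "User-Hermione", "FOLLOWS"),
]


def query_by_partition(rows, first_id):
    """Mimic a Query on the partition key: every row under that key is read."""
    return [row for row in rows if row[0] == first_id]


def users_with_relation(rows, comment_id, relation):
    """Return (matching users, number of rows read) for one comment."""
    rows_read = query_by_partition(rows, comment_id)
    users = [second_id for _, second_id, data in rows_read if data == relation]
    return users, len(rows_read)


users, rows_read = users_with_relation(OPTION_1_ROWS, "Comment-A", "LIKES")
print(users, rows_read)  # ['User-Harry'] 2
```

Two rows are read to return one result; the "FOLLOWS" row is fetched and then thrown away, and that waste grows with every relationship type added.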
Option 2
Modify the keys to represent the relationship:
First_id (Partition Key)  Second_id (Sort Key)
------------------------  --------------------
LikeComment-A             LikeUser-Harry
LikeComment-B             LikeUser-Harry
FollowComment-A           FollowUser-Harry
LikeComment-B             LikeUser-Ron
FollowComment-C           FollowUser-Hermione
This makes it efficient to query independently:
- Comment likes
- Comment follows
- User likes
- User follows
The downside is that the same top-level entity now has multiple keys, which might make things complex as more relationships are added.
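As a rough sketch of how Option 2 could be kept manageable, the key construction can be centralized in a couple of helpers so the prefix convention lives in one place. This is again an in-memory simulation; `comment_key`, `user_key`, and the two dictionaries (standing in for the table and its swapped-key index) are hypothetical names for illustration.

```python
from collections import defaultdict


def comment_key(relation, comment):
    """Build the partition key, e.g. ("Like", "A") -> "LikeComment-A"."""
    return f"{relation}Comment-{comment}"


def user_key(relation, user):
    """Build the sort key, e.g. ("Like", "Harry") -> "LikeUser-Harry"."""
    return f"{relation}User-{user}"


# (relation, comment, user) triples matching the Option 2 table above.
EDGES = [
    ("Like", "A", "Harry"),
    ("Like", "B", "Harry"),
    ("Follow", "A", "Harry"),
    ("Like", "B", "Ron"),
    ("Follow", "C", "Hermione"),
]

table = defaultdict(list)     # First_id -> Second_ids (base table)
inverted = defaultdict(list)  # Second_id -> First_ids (stand-in for the GSI)
for relation, comment, user in EDGES:
    pk, sk = comment_key(relation, comment), user_key(relation, user)
    table[pk].append(sk)
    inverted[sk].append(pk)

comment_b_likes = table[comment_key("Like", "B")]
harry_likes = inverted[user_key("Like", "Harry")]
harry_follows = inverted[user_key("Follow", "Harry")]
print(comment_b_likes)  # ['LikeUser-Harry', 'LikeUser-Ron']
print(harry_likes)      # ['LikeComment-A', 'LikeComment-B']
print(harry_follows)    # ['FollowComment-A']
```

Each lookup touches only the rows for one relationship type, but every new relationship adds another key prefix per entity, which is the complexity I'm worried about.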
Option 3
Skip adjacency lists altogether and use separate tables, maybe one for Users, one for Likes, and one for Follows.
Option 4
Traditional relational database. While I'm not planning on going this route because this is a personal project and I want to explore DynamoDB, if this is the right way to think about things, I'd love to hear why.
Conclusion
Thanks for reading this far! If there is anything I can do to simplify the question or clarify anything, please let me know :)
I've looked at the AWS best practices and this many-to-many SO post, and neither appears to address a many-to-many relationship with multiple relationship types, so any resources or guidance would be greatly appreciated.