I am rather new to DynamoDB and currently we are thinking about migrating an existing project to a serverless application using DynamoDB where we want to adapt the following setup from a RDMS database:
Tables:
- Projects (ProjectID)
- Files (FileID, ProjectID, Filename)
- Tags (FileID, Tag)
We want to make a query with DynamoDB to fetch all Files for a specific Project (by ProjectID) with one or multiple Tags (by Tag). In an RDMS this query would be simple with something like:
SELECT * FROM Files JOIN Tags ON Tags.FileID = Files.FileID WHERE Files.ProjectID = ?PROJECT AND Tags.Tag = ?TAG_1 OR ?TAG_2 ...
At the moment, we have the following DynamoDB setup (but it can still be changed):
- Projects (ProjectID [HashKey], ...)
- Files (ProjectID [HashKey], FileID [RangeKey], ...)
Please also consider that the number of project entries is huge (between 1000 - 30000) and also the number of files for each project (is between 50 and 100.000) and the query should be really fast.
How can this be achieved using DynamoDB-query, best without using filter expressions since they are applied after data selection? It would be perfect if the table Files could have a StringSet Tags as column but I guess that this cannot be used for an efficient DynamoDB-query (so without using DynamoDB-scan) since DynamoDB-indices can only be of type String, Binary and Number and not of type StringSet? Is this maybe an applicable use case for the Global Secondary Index (GSI)?