4
votes

I am rather new to DynamoDB and currently we are thinking about migrating an existing project to a serverless application using DynamoDB where we want to adapt the following setup from a RDMS database:

Tables:

  • Projects (ProjectID)
  • Files (FileID, ProjectID, Filename)
  • Tags (FileID, Tag)

We want to make a query with DynamoDB to fetch all Files for a specific Project (by ProjectID) with one or multiple Tags (by Tag). In an RDMS this query would be simple with something like:

SELECT * FROM Files JOIN Tags ON Tags.FileID = Files.FileID WHERE Files.ProjectID = ?PROJECT AND Tags.Tag = ?TAG_1 OR ?TAG_2 ...

At the moment, we have the following DynamoDB setup (but it can still be changed):

  • Projects (ProjectID [HashKey], ...)
  • Files (ProjectID [HashKey], FileID [RangeKey], ...)

Please also consider that the number of project entries is huge (between 1000 - 30000) and also the number of files for each project (is between 50 and 100.000) and the query should be really fast.

How can this be achieved using DynamoDB-query, best without using filter expressions since they are applied after data selection? It would be perfect if the table Files could have a StringSet Tags as column but I guess that this cannot be used for an efficient DynamoDB-query (so without using DynamoDB-scan) since DynamoDB-indices can only be of type String, Binary and Number and not of type StringSet? Is this maybe an applicable use case for the Global Secondary Index (GSI)?

2

2 Answers

4
votes

A bit late, just saw this question referenced from another one.

I guess you've went and solved it something like this?

DynamoDB tables

  • Projects (ProjectID [HashKey], ...)
  • Files (ProjectID [HashKey], FileID [RangeKey], ...)
  • Tags (Tag [HashKey], FileID [RangeKey], ProjectID [LSI Sort Key])

On the FileTags, you need the FileID to make the primary key unique, but you can add the ProjectID as a sort key for a Local Secondary Index, so you can search on Tag + ProjectID.

It's some sort of Data Denormalization, but that's what it takes to go NoSQL :-( . E.g. if your File would be switched to another Project, you'll need to update the ProjectID not only on the File, but also on all the Tags.

3
votes

The question is almost three years old, but it's still in the Google result. So if anybody else lands here, maybe the following page from DynamoDB docs can help. Just found it myself and didn't try it yet, but it looks promising. Seems to be newer than the other replies here and shows a nice approach to solve the problem.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html