1
votes

I have DynamoDB table with the following structure:

  • SN (String - Primary Partition Key)
  • Label (String)
  • Product (String)
  • Tags (StringSet)

Here is some sample data:

SN          Label                        Product     Tags
XXXX-123    Acme Cheese System One       Z100        {"Acme Cheese", "Z100"}
XXXX-456    Acme Cheese System Two       X22         {"Acme Cheese", "X22"}
XXXX-789    Bob's Burritios System One   Z100        {"Bob's Burritios", "Z100"}

I need to be able to query against specific tags and return all items that contain the specified tag. For example, I might write a query to find all items tagged "Acme Cheese" or "Z100". I would want to retrieve all items that contained this tag.

I cannot use a Global Secondary Index, as DynamoDB does not support indexing StringSet fields. According to this question: DynamoDB - Global Secondary Index on set items, it is recommend to structure my data using a one-to-many table. I have read the documentation here DynamoDB Docs: OneToMany, but it still was not clear to me on how to create such a structure.

My question is: How do I create such a table? What would my table schema and keys look like?

1

1 Answers

1
votes

I had a similar problem and I solved this by separating the main table into multiple tables.

Following would be the schema

SN_Master:

SN - Hash
Label
Product

SN_Details:

TagName - Hash 
SN - Range

So now whenever you insert any row you have to insert in sn_master with main information and tag related information in sn_details

So basically now your data will look like:

SN          Label                        Product    
XXXX-123    Acme Cheese System One       Z100       
XXXX-456    Acme Cheese System Two       X22         
XXXX-789    Bob's Burritios System One   Z100       


TagName        SN
Acme Cheese    xxxx-123
Z100           xxxx-123
Acme Cheese    xxxx-456

Now you can query sn_details table to retrieve all the products of particular tags

CONS: If you have billions of record one particular tag then your performance will be affected as the Hash is not distributed properly.