
I am trying to model my data. As you can see, the partition key is the user's email. In the global secondary index (GSI), the partition key is "US", which stands for "User". To get all of the enabled users, I just query the GSI where GSI1PK = "US" and GSI1SK begins with "Enabled".
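As a rough sketch of that access pattern, here are the keyword arguments for a boto3 low-level client `query` call in Python. The table name `Users` and the index name `GSI1` are assumptions; the key names GSI1PK and GSI1SK come from the question.

```python
def enabled_users_query(table_name: str, index_name: str = "GSI1") -> dict:
    """Build kwargs for boto3's DynamoDB client `query` call.

    Assumes a hypothetical index named "GSI1" whose string keys are GSI1PK/GSI1SK.
    """
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # Every user shares GSI1PK = "US"; the sort key narrows to enabled users.
        "KeyConditionExpression": "GSI1PK = :pk AND begins_with(GSI1SK, :status)",
        "ExpressionAttributeValues": {
            ":pk": {"S": "US"},
            ":status": {"S": "Enabled"},
        },
    }
```

With credentials configured, this would be used as `boto3.client("dynamodb").query(**enabled_users_query("Users"))`. Note that every one of these reads lands on the single "US" partition of the index.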

My concern is that every user in the app would have the same GSI1PK. Will this be a problem? Can a GSI partition key suffer from "hot partitions"? I have been searching and cannot find a clear answer. One answer here on Stack Overflow says it will be a problem, but other sources say it will not, so I am a bit confused.

What would be the best way to structure the data in my table so I can access all of the users without causing hot partition issues?

[Image: DynamoDB table]

Answer

Placing a potentially large item collection in a single partition will likely lead to a hot partition. Ideally, your partition keys distribute data evenly across partitions. However, it is not always clear how to achieve this.

You might consider splitting your large partition into smaller ones on write (known as write sharding) and recombining them when reading. For example, when creating GSI1PK, you could add a randomly generated integer between 1 and 4 to the partition key:
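A minimal Python sketch of the write side. The `#` separator, the `PK` attribute holding the email, and GSI1SK holding just the status are hypothetical naming choices, not taken from the original screenshots.

```python
import random

SHARD_COUNT = 4  # assumed number of shards, matching the 1-4 range above


def sharded_gsi1pk(prefix: str = "US", shard_count: int = SHARD_COUNT) -> str:
    """Return a write-sharded GSI partition key such as "US#3"."""
    # random.randint is inclusive on both ends, so this yields 1..shard_count.
    return f"{prefix}#{random.randint(1, shard_count)}"


def user_item(email: str, status: str) -> dict:
    """Build a hypothetical user item in low-level attribute-value format."""
    return {
        "PK": {"S": email},               # base-table partition key: the email
        "GSI1PK": {"S": sharded_gsi1pk()},  # spread users over US#1..US#4
        "GSI1SK": {"S": status},          # e.g. "Enabled" or "Disabled"
    }
```

The shard is chosen once, when the item is written, so a given user stays in the same GSI partition until the item is rewritten.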

[Image: table items with the sharded GSI partition key]

Your GSI would then look like this:

[Image: the GSI, with users spread across the sharded partition keys]

Now your user data is more evenly distributed across partitions. When reading users from your table, you would query every shard and combine the results. These queries can run in parallel for faster performance.
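One way to sketch the read side in Python: issue one query per shard on a thread pool and merge the results. It assumes a boto3 low-level client (boto3's documentation describes clients, unlike resources, as generally safe to share across threads), the hypothetical `GSI1` index, and shard keys `US#1` through `US#4`.

```python
from concurrent.futures import ThreadPoolExecutor


def query_shard(client, table_name: str, shard_pk: str) -> list:
    """Query one GSI shard for enabled users, following pagination."""
    items = []
    kwargs = {
        "TableName": table_name,
        "IndexName": "GSI1",  # hypothetical index name
        "KeyConditionExpression": "GSI1PK = :pk AND begins_with(GSI1SK, :status)",
        "ExpressionAttributeValues": {
            ":pk": {"S": shard_pk},
            ":status": {"S": "Enabled"},
        },
    }
    while True:
        page = client.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next page where the previous one stopped.
        kwargs["ExclusiveStartKey"] = last_key


def query_all_shards(client, table_name: str, shard_count: int = 4) -> list:
    """Fan out one query per shard in parallel and merge the results."""
    shard_keys = [f"US#{n}" for n in range(1, shard_count + 1)]
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        results = pool.map(lambda pk: query_shard(client, table_name, pk), shard_keys)
        # pool.map yields results in shard order, not sorted across shards.
        return [item for shard_items in results for item in shard_items]
```

Each shard follows its own `LastEvaluatedKey`, so a large shard does not stop the others from being fetched; if you need a global ordering, sort the merged list afterwards.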

In this example, I chose a random number to write-shard the data into separate partitions. However, your data may lend itself to a more natural division (e.g., by country, enabled status, or time zone). What I want to highlight is that your strategy for distributing data across partitions can be separate from the data model you use to support your application's access patterns.
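For a natural division, the shard suffix can come from an item attribute instead of `random`. A small sketch, assuming a hypothetical ISO country-code attribute on each user:

```python
def country_gsi1pk(country_code: str) -> str:
    """Partition users by a natural attribute (here, a country code)."""
    return f"US#{country_code.upper()}"


def shard_keys_for(countries: list) -> list:
    """Every GSI partition a full read must touch, one per distinct country."""
    return sorted({country_gsi1pk(c) for c in countries})
```

The trade-off is that these keys are only as even as the data itself: if most users live in one country, that partition becomes hot again, and a read of all users has to know every country value in use.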