Dynamo DB Data model

Question

I am designing a DynamoDB table, I have following attributes:

uniqueID | TimeStamp | Type | Content | flag

I need to get sorted list of all rows based on timestamp having flag set to true.

uniqueID is system generated ID.
TimeStamp is system time while populating table.
Number of distinct Type will be less than 10.
flag: true/false

I can think of following 3 approaches:

To make uniqueID as partition key for the table, And create Global Secondary Index as flag & TimeStamp, Partition and Sort keys respectively. Now I can query Global Secondary index with hash as flag and get sorted items on TimeStamp.
But the problem here is, as value of flag will be true and false only, and no of rows having flag set to false is relatively very less compared to true, there will be only 2 partitions. This loses all scaling characteristics of DynamoDB.
One another alternative is making Type as Partition key and TimeStamp as sort key for Global Secondary Index. This is better. But while querying I can't select all types of Type as DynamoDB requires Hash key in Query parameter. So I need to query this GSI multiple times to get data for all types of Type hash key.
Scan the table (Scan Operation): Scan returns all data with flag set to true without requirement of hash key but It won't give me sorted results on creationTime.

After analyzing use case, I think approach 1 is the best for now.

Could you please suggest any other approach better that this.

Thanks in advance!

Szymon Szymon · Accepted Answer · 2015-11-15T15:11:37

Any partition key that is based on flag or TypeOfInfo will be bad as there are only few possible values (2 and 10 respectively) and the way your data goes into partitions will be skewed. You need to use something that provides a good distribution and in your case the base candidate for the partition key of the table is uniqueId.

The problem is that when you want to get the results based on flag, especially when flag is true, you will get a lot of records, possibly big majority. So scaling of DynamoDB won't give you much anyway if you need to get back most records.

You can try to create a GSI with flag as the partition key and timestamp as the range key. This is not an ideal set of keys but covers what you need. Having a good key for the table means that you can later easily switch to another solution (e.g. scanning and not using the GSI). Keep in mind that if you want to avoid querying the table when using the GSI, you will have to project those attributes you want to return into the GSI.

So summing up, I think you can choose between the GSI and scanning:

Scanning can be slower (test it) but won't require additional data storage
GSI can be faster (test it) but will require additional data storage.

Dynamo DB Data model

1 Answers