1
votes

I have a dynamodb table has a partition key as ID and sort key timestamp. I need to query the table to find the latest 100 rows based on timestamp column. When I check the query API, I must specify the KeyConditions or KeyConditionExpression. I don't want to query partition key in this case. How can I do that?

The scan operation doesn't seem support sort the result.

One solution I can think of is to add an attribute on the table, let say status and assign the same value for every rows in this table. Then create a GSI on the table with the partition key as status and timestamp as sort key. Then I can query the table with partition key is equal to OK and sort the result based on timestamp. However, this solution doesn't work well in terms of scale. The GSI will not be scaled well since every row in the table has the same value for status.

So what is the best solution to my problem? Shouldn't I consider using dynamodb?

2

2 Answers

2
votes

Your approach is valid. And if you keep the dummy value in your GSI's partition key small (i.e. one byte) it won't have much of an impact performance-wise. Regarding scaling issues with all items having the same partition key, DynamoDB splits partitions by sort key if the collection size grows bigger than 10 GB.

Please also have a look at this StackOverflow question that is similar to yours.

2
votes

Sorted elements are sorted within a partition. You would need to have all results on the same partition.

But obviously you don't want just one partition, you are back to an SQL database. The uncomfortable and uneasy way this is done in DynamoDB is using Streams. When you have new elements or updates, you can check if those elements are in the top N positions. If they are then replace that value, for example, say you have people with money:

PK            Attributes:  
#Entity#21    name=Fred.      money=5,000     
#Entity#22    name=Bob.       money=10,000     
#Entity#23    name=Smith.     money=1,000     
...

Then we can keep track of the top 10 richest people:

PK              SORT               Attributes:      
#Money#Highest    1               id=#Entity#22    value=10,000
#Money#Highest    2               id=#Entity#102   value=9,000
...

Then when you want the richest people, you do a query with PK=#Money#Highest. You might also copy more attributes in depending on your query. This is pretty much the go, if you want to calculate the top something across partitions you setup streams and do it yourself. Note that though these totals will be out of date by some seconds depending on your stream settings. You stream Lambda would be something like:

const handler = (event, context, callback) => {
    event.Records.forEach((ev, i) => {
        if (ev.eventName === "INSERT" || ev.eventName === "UPDATE" || ) {
          // TODO
        }
      }
    }

Pretty annoying I know! But this is the weird way this stuff is implemented. But it is extremely quick as you are only ever retrieving pre-calculated values. And that is the whole way which you work with Dynamo, Storage is cheap, compute expensive, optimize compute, and duplicate data as you need because hey it's cheap anyway.