1
votes

Say I got some kind of forum/discussion platform, and have a dynamodb table that stores posts. The hash key would be some unique identifier, and the range key might be the timestamp the post was created at.

I now want to retrieve the 10 latest posts. However, I know that the scan operation is very expensive, and thus want to avoid using it. Is there any way for me to query the 10 latest posts WITHOUT having to perform a scan?

I'm not sure how.

1

1 Answers

1
votes

You should use an GSI with an attribute like 'yyyy-mm-dd' as the hash key, and timestamp as the range key. Since it's possible that there are less than 10 comments in a single day, you may need more than one query, but that's fairly easy to achieve.

Pseudocode:

currentDate = today
while (results.size < 10)
    results.add(queryDynamoDB(currentDate))
    currentDate.subtract(1 day)

When querying the index, the query request should have

scanIndexForward = false
limit = 10

That way it will return up to 10 results which are the most recent timestamp for a given day. You will only have to query more than one day if your forum is not very active, or a new day has just started. (Querying more than one day will not have a significant extra cost in RCU, just additional latency.)

Edit: Just to be clear, this is a GSI that I am proposing. The YYYY-MM-DD field is a field that exists just for the GSI. The main table should use some sort of unique (not time based) identifier as the hash key.