4 votes

I want to get all items from the last 24 hours. I've already done some Google searches, and it seems this isn't easy to do with DynamoDB.

I was wondering whether it is possible to create a secondary index with a common hash key and a timestamp field as the sort key. Then I could query on the timestamp with the condition timestamp > (DateTime.Now - 24h). Can somebody comment on whether this is a viable approach, or suggest another idea?
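For reference, this is roughly what I have in mind (just a sketch with boto3; the table name 'Events', the index name 'by-timestamp', its constant hash attribute 'gsi_pk', and the numeric epoch attribute 'created_at' are placeholders, not my real schema):

```python
import time

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Events')                     # placeholder table name

cutoff = int(time.time()) - 24 * 3600                # 24 hours ago, epoch seconds

# Query the GSI: constant hash key, timestamp sort key greater than the cutoff.
resp = table.query(
    IndexName='by-timestamp',
    KeyConditionExpression=Key('gsi_pk').eq('ALL') & Key('created_at').gt(cutoff),
)
recent_items = resp['Items']                         # pagination omitted for brevity
```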

Actually, I only need items in my table that are no older than 24 hours. So another idea would be to clean up the table every hour and delete all items older than 24 hours. Is this possible?
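If that is possible, the hourly clean-up could look something like this (again just a sketch, reusing the hypothetical index above; 'id' stands for whatever my table's primary key is):

```python
import time

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Events')                     # placeholder table name
cutoff = int(time.time()) - 24 * 3600

# Find everything older than 24 hours via the same hypothetical index ...
old = table.query(
    IndexName='by-timestamp',
    KeyConditionExpression=Key('gsi_pk').eq('ALL') & Key('created_at').lt(cutoff),
)

# ... and delete each item by its base-table key (pagination omitted).
with table.batch_writer() as batch:
    for item in old['Items']:
        batch.delete_item(Key={'id': item['id']})
```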


EDIT: Another idea would be to create a secondary index with the date as hash key and the time as sort key, and then execute two queries: the first for date = 'today' and the second for date = 'yesterday' with a condition on the time. But how should I store the date and time, as strings or as integers? Would this be better than my idea above?


1 Answer

2 votes

Your ideas with indexes are generally in the right direction. You're also right that there is no way in DynamoDB to order the items returned by a scan (which is what you would need in order to get the items you want in the absence of an index).

So on to the options:

  1. You could, as you suggested, create a GSI whose partition key is, let's say, the date value and whose sort key is the timestamp. Then, with two queries, you can always get the items of the most recent 24 hours (you can also put hourly values in the partition key and make 24 queries instead of 2). A sketch of the two-query variant follows after this list.

  2. Another option, which might be even better than GSIs, would be to rotate your table every N hours (where N might be 12 hours, 24 hours, or some other value that makes sense given the volume of data you have). This solution gives you a clean way to trim old data and to optimize for uneven access patterns: the older tables will probably need very low write capacity, and in some cases you may be able to get away with low read capacity as well. This method does require your readers and writers to be aware of the multiple tables, but depending on the volume of data you're working with, it can really be worth considering. A sketch of the rotation scheme also follows below.
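For option 1, here is a minimal sketch of the two queries in Python/boto3. The table name 'Events', the index name 'date-time-index', and the attributes 'event_date'/'event_time' are placeholders, not anything from your schema; storing the date as a 'YYYY-MM-DD' string and the time as a zero-padded 'HH:MM:SS' string keeps the lexicographic sort order in line with chronological order.

```python
from datetime import datetime, timedelta, timezone

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Events')          # placeholder table name


def query_all(key_condition):
    """Run one query against the GSI, following pagination."""
    items, kwargs = [], {'IndexName': 'date-time-index',
                         'KeyConditionExpression': key_condition}
    while True:
        resp = table.query(**kwargs)
        items.extend(resp['Items'])
        if 'LastEvaluatedKey' not in resp:
            return items
        kwargs['ExclusiveStartKey'] = resp['LastEvaluatedKey']


def items_last_24h():
    now = datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)

    # Everything written today is by definition within the last 24 hours.
    items = query_all(Key('event_date').eq(now.strftime('%Y-%m-%d')))

    # From yesterday we only want items at or after the cutoff's time of day.
    items += query_all(
        Key('event_date').eq(cutoff.strftime('%Y-%m-%d'))
        & Key('event_time').gte(cutoff.strftime('%H:%M:%S'))
    )
    return items
```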
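For option 2, a sketch of how the rotation could be wired up (again boto3; the 'events_&lt;period&gt;' naming scheme, the 12-hour period, and the 'created_at' epoch attribute are assumptions for illustration, not a prescribed layout):

```python
import time

import boto3
from boto3.dynamodb.conditions import Attr

ROTATION_HOURS = 12                       # the "N hours" from above
dynamodb = boto3.resource('dynamodb')


def table_name(epoch_seconds):
    """Name of the table covering the rotation period containing the given time."""
    period = int(epoch_seconds) // (ROTATION_HOURS * 3600)
    return f'events_{period}'


def put_event(item):
    # Writes always go to the current period's table.
    dynamodb.Table(table_name(time.time())).put_item(Item=item)


def items_last_24h():
    now = int(time.time())
    cutoff = now - 24 * 3600
    items = []
    # The 24-hour window spans the current table plus however many previous
    # ones are needed to cover it; items before the cutoff are filtered out.
    current = now // (ROTATION_HOURS * 3600)
    oldest = cutoff // (ROTATION_HOURS * 3600)
    for period in range(oldest, current + 1):
        resp = dynamodb.Table(f'events_{period}').scan(
            FilterExpression=Attr('created_at').gte(cutoff))   # pagination omitted
        items.extend(resp['Items'])
    return items
```

A nice side effect of this layout is that expired data can be dropped by deleting whole tables for old periods instead of deleting items one by one.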