0
votes

I have a flow in which we have to persist some items in DynamoDB for a specific time. After the items have expired, we have to call some other services to notify them that the data has expired.

I was thinking about two solutions:

1) Move the expiry check to Java logic: retrieve the DynamoDB data in batches, check in Java which items have expired, then delete those items in batches and notify the other services.

There are some limitations:

BatchGetItem lets you retrieve at most 100 items per call. BatchWriteItem lets you delete at most 25 items per call.
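To illustrate working within the 25-item limit, here is a minimal sketch that splits a list of keys into chunks small enough for one BatchWriteItem call each. The class and method names (`DeleteBatcher`, `partition`) are hypothetical, and the actual SDK call that sends each chunk is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class DeleteBatcher {
    // BatchWriteItem accepts at most 25 put/delete requests per call.
    static final int MAX_BATCH_WRITE_ITEMS = 25;

    // Splits keys into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }
}
```

Each returned chunk would then become one BatchWriteItem request; any unprocessed items reported in the response would need to be retried.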

2) Move the expiry check to the DB logic: query DynamoDB to find which items have expired (and delete them), and return their IDs to the client so that we can notify the other services.

Again, there are some limitations:

The result set from a Query is limited to 1 MB per call.
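The 1 MB limit applies per page, not to the whole result: a Query or Scan response that was cut off carries a LastEvaluatedKey, which is passed back as ExclusiveStartKey on the next call. A rough sketch of that loop follows. `Page` and `fetchPage` are hypothetical stand-ins for the SDK response and call, and the key is simplified to a `String`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class PagedScan {
    // Hypothetical stand-in for one Query/Scan response page.
    static final class Page {
        final List<String> expiredIds;   // items that passed the filter
        final String lastEvaluatedKey;   // null when there are no more pages

        Page(List<String> expiredIds, String lastEvaluatedKey) {
            this.expiredIds = expiredIds;
            this.lastEvaluatedKey = lastEvaluatedKey;
        }
    }

    // Calls fetchPage repeatedly, feeding each page's key into the next call.
    static List<String> collectAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String startKey = null; // null means "start from the beginning"
        do {
            Page page = fetchPage.apply(startKey);
            // A page can be empty after filtering and still have more pages behind it.
            all.addAll(page.expiredIds);
            startKey = page.lastEvaluatedKey;
        } while (startKey != null);
        return all;
    }
}
```

The loop stops on a missing key rather than on an empty page, because the filter is applied after the 1 MB of data has been read.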

For both solutions, there will be a job that runs periodically, or we will use an AWS Lambda that is triggered periodically and calls an endpoint in our app, which deletes the expired items from the DB and notifies the other services.

My question is whether DynamoDB is suitable for my case, or whether I should use a relational DB such as MySQL that doesn't have these kinds of limitations. What do you think? Thanks!

1
Beyond what other people answered, please note that if you want an application to read all the data (to check for expired data), BatchGetItem is the wrong API to use. Instead, you should use Scan, which allows you to page through the entire data or just specific attributes of each item. You can also use Scan in parallel from many nodes. - Nadav Har'El
Ok, so you're saying that I should use Scan with a filter on the expiration. The only disadvantage I saw is that Scan reads at most 1 MB of data per call, and that limit applies before the filter expression is applied. - brebDev

1 Answer

3
votes

Have you considered using the DynamoDB TTL feature? It lets you designate a time-based attribute in your table, which DynamoDB uses to automatically delete items once the time value has passed.

This requires no expiry-checking code on your part and avoids the polling, querying, and batching limitations. You will need to populate a TTL attribute, but you may already have that information if you are rolling your own expiration logic.
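The TTL attribute has to be a Number holding a Unix epoch timestamp in seconds (not milliseconds). A small sketch of computing that value with `java.time` follows; the class and method names are made up for illustration, and writing the value to the item is left out.

```java
import java.time.Duration;
import java.time.Instant;

final class TtlValues {
    // Returns the expiry time as Unix epoch seconds, the unit DynamoDB TTL expects.
    static long expiresAtEpochSeconds(Instant createdAt, Duration retention) {
        return createdAt.plus(retention).getEpochSecond();
    }
}
```

For example, `TtlValues.expiresAtEpochSeconds(Instant.now(), Duration.ofDays(7))` gives a value to store for an item that should live for a week. Keep in mind that TTL deletion runs in the background and is not immediate, so an expired item can remain readable for some time after its timestamp has passed.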

If other services need to be notified when a TTL event occurs, you can create a Lambda that processes the table's DynamoDB stream and takes action when a TTL delete event occurs.
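In the stream, a TTL deletion shows up as a `REMOVE` record whose `userIdentity` has type `Service` and principal `dynamodb.amazonaws.com`, which distinguishes it from a delete issued by your own code. Below is a minimal sketch of that check. To stay dependency-free it assumes the record has already been parsed into a plain `Map` mirroring the stream record JSON; a real Lambda handler would more likely use the typed event classes from the AWS Lambda Java libraries. `TtlDeleteFilter` is a hypothetical name.

```java
import java.util.Map;

final class TtlDeleteFilter {
    // True only for REMOVE records that were produced by the TTL service itself.
    static boolean isTtlDelete(Map<String, Object> record) {
        if (!"REMOVE".equals(record.get("eventName"))) {
            return false;
        }
        Object identity = record.get("userIdentity");
        if (!(identity instanceof Map)) {
            return false; // user-initiated deletes carry no userIdentity
        }
        Map<?, ?> id = (Map<?, ?>) identity;
        return "Service".equals(id.get("type"))
                && "dynamodb.amazonaws.com".equals(id.get("principalId"));
    }
}
```

The handler would call the notification logic only for records where this returns `true`, reading the deleted item's attributes from the record's old image if the stream is configured to include it.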