
I have a DynamoDB table from which I want to delete a large number of items. I found a Stack Overflow answer to a similar question where you scan the whole table to collect all the relevant items and then delete them in batches, but in my case there are too many items to fit in memory.

What are the possible solutions in such a case?

  1. Scan the table using LastEvaluatedKey and on each iteration delete 'x' items (say 25 or 100). This would only require one scan, but is this a valid solution? Does deleting item(s) have any effect on the LastEvaluatedKey for the next iteration?
  2. Scan multiple times and delete 'x' items each time without using LastEvaluatedKey. This would require many full-table scans. It is definitely a valid solution, but I want to avoid it.

1 Answer


Option 1 should work.

You pass the LastEvaluatedKey returned to you as ExclusiveStartKey.

Note the word "Exclusive": the scan starts with the next key greater than the value passed in as ExclusiveStartKey. That also means deleting the items you've already read has no effect on the next iteration; the key you pass back still marks the same position in the key space.
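For example, a delete loop along those lines might look like this in Python with boto3 (the table name my-table, the single key attribute pk, and the page size are assumptions for illustration; BatchWriteItem accepts at most 25 delete requests per call):

```python
import boto3

TABLE_NAME = "my-table"   # hypothetical table name
KEY_NAMES = ["pk"]        # the table's primary key attribute(s)

client = boto3.client("dynamodb")

exclusive_start_key = None
while True:
    scan_kwargs = {
        "TableName": TABLE_NAME,
        "ProjectionExpression": "pk",  # fetch only the key attributes
        "Limit": 25,                   # BatchWriteItem takes at most 25 requests
    }
    if exclusive_start_key:
        scan_kwargs["ExclusiveStartKey"] = exclusive_start_key

    page = client.scan(**scan_kwargs)
    items = page.get("Items", [])

    if items:
        # Delete this page of items in one batch.
        response = client.batch_write_item(
            RequestItems={
                TABLE_NAME: [
                    {"DeleteRequest": {"Key": {k: item[k] for k in KEY_NAMES}}}
                    for item in items
                ]
            }
        )
        # Real code should retry response.get("UnprocessedItems") with backoff,
        # since BatchWriteItem can leave some requests unprocessed when throttled.

    exclusive_start_key = page.get("LastEvaluatedKey")
    if not exclusive_start_key:
        break  # no more pages
```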

You could even use this option in a parallel scan. From the docs:

In a parallel scan, a Scan request that includes ExclusiveStartKey must specify the same segment whose previous Scan returned the corresponding value of LastEvaluatedKey.
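A minimal sketch of what that looks like, assuming the same table and key attribute as above (the function name and the worker layout are just for illustration); each worker owns one segment and must keep feeding that segment's own LastEvaluatedKey back in:

```python
import boto3

def scan_segment_page(segment, total_segments, exclusive_start_key=None):
    client = boto3.client("dynamodb")
    kwargs = {
        "TableName": "my-table",
        "ProjectionExpression": "pk",
        "Limit": 25,
        "Segment": segment,               # which slice of the table this worker owns
        "TotalSegments": total_segments,  # how many workers in total
    }
    if exclusive_start_key:
        # Must come from a previous Scan of the *same* segment.
        kwargs["ExclusiveStartKey"] = exclusive_start_key
    return client.scan(**kwargs)

# e.g. worker 2 of 4 pages through its own slice of the table:
# page = scan_segment_page(2, 4)
# page = scan_segment_page(2, 4, page.get("LastEvaluatedKey"))
```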

Going forward, consider setting up a Time to Live (TTL) attribute on the table.

This will allow DynamoDB to automatically delete items for you. The best part is that those deletes cost you nothing, unlike your current plan, where you pay to read each item and again to delete it.
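A rough sketch of enabling TTL with boto3 (the attribute name "ttl" is an assumption; any numeric attribute holding an epoch-seconds expiry works):

```python
import time
import boto3

client = boto3.client("dynamodb")

# Turn TTL on for the table, keyed off an attribute named "ttl".
client.update_time_to_live(
    TableName="my-table",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "ttl"},
)

# Items written with an epoch-seconds expiry in that attribute are deleted
# by DynamoDB automatically at no write cost (deletion typically happens
# within a few days of the expiry time, not at the exact second).
client.put_item(
    TableName="my-table",
    Item={
        "pk": {"S": "example-id"},
        "ttl": {"N": str(int(time.time()) + 7 * 24 * 3600)},  # expire in ~7 days
    },
)
```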