2
votes

I want to archive data from DynamoDB to S3. I came across two solutions:

  1. DynamoDB -> TTL DynamoDB + DynamoDB Stream -> Lambda -> Kinesis Firehose -> S3
  2. DynamoDB -> TTL DynamoDB + DynamoDB Stream -> Lambda -> S3

Which option is better and why? What are the advantages of using Kinesis Firehose and the disadvantages of not using Kinesis Firehose?

1
Firehose can batch your records into fewer, bigger files. – luk2302
Have you considered using the Export to S3 option? – Jason Wadsworth
I want to move the data automatically when it is marked for deletion by TTL. I want to know the best approach for this case. – Reyan Chougle
You can't tell if something is deleted because of TTL vs. just deleted. At best you can guess it was TTL if the TTL is in the past, but because of the delay on the TTL (up to 48 hours, I believe) you could get a delete that wasn't from the TTL even if the TTL time has passed. If you just want to capture deletes in general, the stream processing will work. – Jason Wadsworth
This is not correct. You can absolutely tell when an item has been deleted by TTL. You need to look for the userIdentity.type value. Take a look here: docs.aws.amazon.com/amazondynamodb/latest/developerguide/… – Kirk

1 Answer

0
votes

Using Firehose gives you more options for configuring how the data lands in S3. For example, since S3 bills per API call, you could consolidate ten item deletes from the DynamoDB Stream into a single write to S3, saving money. Firehose can also apply minor transformations to the records before delivery, if that is something you need.
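For illustration, here's a minimal sketch of option 1's Lambda: it forwards the old images of deleted items from the stream to a Firehose delivery stream, which then buffers and batches them into S3. The delivery stream name is a placeholder, and it assumes the table's stream includes old images:

    import json
    import boto3

    firehose = boto3.client("firehose")
    DELIVERY_STREAM = "my-archive-stream"  # placeholder; use your delivery stream name

    def handler(event, context):
        # Collect the old images of deleted items. This assumes the stream view
        # type is OLD_IMAGE or NEW_AND_OLD_IMAGES, so OldImage is present.
        records = [
            {"Data": (json.dumps(r["dynamodb"]["OldImage"]) + "\n").encode("utf-8")}
            for r in event["Records"]
            if r["eventName"] == "REMOVE"
        ]
        if records:
            # PutRecordBatch accepts up to 500 records per call; Firehose buffers
            # them and writes consolidated objects to S3.
            firehose.put_record_batch(
                DeliveryStreamName=DELIVERY_STREAM,
                Records=records,
            )

Note that PutRecordBatch can partially fail, so a production version would check FailedPutCount in the response and retry the failed records.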

Also, when writing your Lambda function, make sure you grab just the TTL deletes by filtering on a userIdentity.type value of Service. TTL deletes are performed by the DynamoDB service itself, so their stream records carry a userIdentity with type Service and principalId dynamodb.amazonaws.com; normal, user-initiated deletes don't include that service identity. If you filter this way, you'll only get the TTL deletes, not normal deletes.
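A minimal sketch of that filter, assuming the Lambda receives the standard DynamoDB Streams event shape:

    def is_ttl_delete(record):
        # TTL deletes are performed by the DynamoDB service, so the stream record
        # carries a userIdentity of type "Service" with the DynamoDB principal;
        # user-initiated deletes don't include this service identity.
        identity = record.get("userIdentity") or {}
        return (
            record.get("eventName") == "REMOVE"
            and identity.get("type") == "Service"
            and identity.get("principalId") == "dynamodb.amazonaws.com"
        )

You would apply this check to each record in event["Records"] before archiving, so manual deletes never end up in your S3 archive.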