I'm new to AWS, and I'm working on archiving data from DynamoDB to S3. This is my solution, and the pipeline is already working:
DynamoDB -> DynamoDB TTL + DynamoDB Stream -> Lambda -> Kinesis Firehose -> S3
But I found that the files in S3 have different numbers of JSON objects. Some files have 7 JSON objects, some have 6 or 4. I do ETL in the Lambda, so only REMOVE records are saved to S3, and each item has been unmarshalled.
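For context, here is a minimal sketch of the kind of ETL my Lambda does (names are illustrative; the `unmarshall` helper is a simplified stand-in for boto3's `TypeDeserializer`, and the Firehose call is shown only as a comment):

```python
import json

def unmarshall(av):
    """Convert a DynamoDB AttributeValue dict (e.g. {"S": "x"}) to a plain
    Python value. Simplified subset of what boto3's TypeDeserializer does."""
    (t, v), = av.items()
    if t == "S":
        return v
    if t == "N":
        return int(v) if v.lstrip("-").isdigit() else float(v)
    if t == "BOOL":
        return v
    if t == "NULL":
        return None
    if t == "L":
        return [unmarshall(x) for x in v]
    if t == "M":
        return {k: unmarshall(x) for k, x in v.items()}
    raise ValueError(f"unsupported attribute type: {t}")

def handler(event, context=None):
    """Keep only REMOVE records (TTL expiries arrive as REMOVE) and
    unmarshall the deleted item's OldImage into plain JSON lines."""
    lines = []
    for record in event.get("Records", []):
        if record.get("eventName") != "REMOVE":
            continue  # skip INSERT / MODIFY events
        image = record["dynamodb"].get("OldImage", {})
        item = {k: unmarshall(v) for k, v in image.items()}
        lines.append(json.dumps(item))
        # In the real function, each line is forwarded to Firehose, e.g.:
        # firehose.put_record(DeliveryStreamName=STREAM_NAME,
        #                     Record={"Data": line + "\n"})
    return lines
```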
I expected one JSON object per file, since the TTL value is different for each item, and I assumed the Lambda would deliver each item immediately when it is deleted by TTL.
Is this because Kinesis Firehose batches the items? (That is, it waits for some time, collecting more items, before saving them to a file.) Or is there another reason? Could I estimate how many files it will save if a new item is deleted every 5 minutes?
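My back-of-envelope attempt at that estimate, assuming Firehose flushes on whichever buffering hint is hit first (`IntervalInSeconds` or `SizeInMBs`), that the size threshold is never reached with such small records, and that TTL deletions really arrive exactly every 5 minutes (which I'm not sure is true):

```python
import math

# Assumed Firehose BufferingHints.IntervalInSeconds (the common default is 300).
buffer_interval_s = 300
# Idealised arrival rate: one TTL deletion every 5 minutes.
item_interval_s = 5 * 60

# Each buffer window of 300 s would then hold this many items...
items_per_file = max(1, buffer_interval_s // item_interval_s)
# ...so over an hour Firehose would write roughly this many files.
files_per_hour = math.ceil(3600 / (items_per_file * item_interval_s))

print(items_per_file, files_per_hour)  # → 1 12
```

If that reasoning is right, the uneven 4/6/7-object files would mean the deletions do not actually arrive on a regular 5-minute schedule, but I'd like confirmation.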
Thank you in advance.