
I want to improve my current application. I am using Redis via ElastiCache on AWS to store some user data from my website.

This solution is not scalable, and I want to scale it using Amazon Kinesis Data Firehose for auto-scaling streaming delivery, AWS Lambda to transform my input data, an S3 bucket to store it, and Amazon Athena to query it.
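In case it matters, the Lambda transform I have in mind is roughly like this sketch (the event/response shape follows the Firehose data-transformation contract; the CSV fields are just placeholders for my user data):

```python
import base64
import json

def lambda_handler(event, context):
    """Sketch of a Firehose transformation Lambda: JSON records in, CSV lines out."""
    output = []
    for record in event['records']:
        # Firehose delivers each record base64-encoded.
        payload = json.loads(base64.b64decode(record['data']))

        # Flatten the JSON payload into one CSV line (placeholder fields).
        csv_line = '{},{},{}\n'.format(
            payload.get('user_id', ''),
            payload.get('event_type', ''),
            payload.get('timestamp', ''),
        )

        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            'data': base64.b64encode(csv_line.encode('utf-8')).decode('utf-8'),
        })

    return {'records': output}
```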

I have been googling for several days, but I really don't understand how Amazon Kinesis Data Firehose stores the data in S3.

Does Firehose store the data as a single file for each record it processes, or is there a way to append the data to the same CSV, or to group the data into different CSVs?


1 Answer


Amazon Kinesis Data Firehose will group data into a file based on:

  • Size of data (e.g. 5 MB)
  • Duration (e.g. every 5 minutes)

Whichever limit is reached first will trigger delivery of the buffered data to Amazon S3.

Therefore, if you need near-realtime reporting, go for a short duration. Otherwise, go for larger files.
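For example, here is a sketch (using boto3; the stream name, role ARN, bucket ARN, and prefix are placeholders) of how those two limits are set as buffering hints when creating the delivery stream:

```python
import boto3

firehose = boto3.client('firehose')

# BufferingHints controls the two limits described above: size (MB) and
# interval (seconds). Whichever is reached first flushes a new object to S3.
firehose.create_delivery_stream(
    DeliveryStreamName='my-user-data-stream',
    DeliveryStreamType='DirectPut',
    ExtendedS3DestinationConfiguration={
        'RoleARN': 'arn:aws:iam::123456789012:role/firehose-delivery-role',
        'BucketARN': 'arn:aws:s3:::my-user-data-bucket',
        'Prefix': 'user-data/',
        'BufferingHints': {
            'SizeInMBs': 5,            # flush once ~5 MB has been buffered...
            'IntervalInSeconds': 300,  # ...or after 300 seconds, whichever comes first
        },
    },
)
```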

Once a file is written in Amazon S3, it is immutable and Kinesis will not modify its contents. (No appending or modification of objects.)