I'm pretty new to AWS, and I'm trying to find a way to reliably transfer data from a Kinesis stream to an AWS RDS postgres database table. The records will need to undergo small transformations on the way in, like filter (not all records will be inserted, depending on a key), and parsed for an insert into postgres. Currently, the data from the Kinesis stream is being dumped by firehose into S3 buckets as parquet.
I'm a bit lost in the many possible ways there seems to be of doing this, like maybe:
- Kinesis streams -> Firehose -> Lambda -> RDS
- Kinesis streams -> Firehose -> S3 -> Data Pipeline ETL job -> RDS
- Database migration for S3 -> RDS?
- AWS Glue?
- others...?
In a non serverless world, I would run a chron job every, say, one hour which would take the files in the most recent S3 bucket partition (which is year/month/day/hour), so the latest hour, and filter out the records not needed in RDS, and do a bulk insert the rest into the RDS. I don't want to have a EC2 instance that sits idle 95% of the time to do this. Any advice?