
I'm looking for a solution that can ingest 6 to 8 tables into RDS daily. The tables in question have specific foreign-key relationships, so these should be preserved in the database.

Currently I'm having a hard time finding a good way to load the data for those 6-8 tables into RDS programmatically. Which services are best suited for doing that?

Lambda

The data is slightly too big for Lambda's memory limits.

Data Pipeline

It's not clear how this would work with serverless Aurora, and it also requires scheduled EC2 instances (which breaks the serverless pattern).

Load S3 Data into Amazon RDS MySQL Table - AWS Data Pipeline

Glue?

Glue seems to be tailored more towards Redshift.

So I'm a bit lost on what the best solution design would be for this. Help would be appreciated.


2 Answers


You should try AWS Data Pipeline. Briefly, these are the steps:

  • Create Role and Attach S3 Bucket Policy
  • Setup Cluster Parameter Group
  • Edit Parameter Groups to use Role
  • Reboot Aurora instance

This guide, Loading Data into an Amazon Aurora MySQL, is for MySQL.

Loading data with PostgreSQL should be very similar.
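Once the role and parameter-group setup above is in place, loading a file comes down to running Aurora MySQL's `LOAD DATA FROM S3` statement. A minimal sketch using pymysql; the table, bucket, column names, and connection details are placeholder assumptions:

```python
# Sketch: running Aurora MySQL's LOAD DATA FROM S3 from Python via pymysql.
# All identifiers (table, bucket, host, credentials) are placeholders.

def build_load_statement(table, s3_uri, columns):
    """Build a LOAD DATA FROM S3 statement for a CSV file with a header row."""
    cols = ", ".join(columns)
    return (
        f"LOAD DATA FROM S3 '{s3_uri}' "
        f"INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' "
        "LINES TERMINATED BY '\\n' "
        "IGNORE 1 LINES "
        f"({cols})"
    )

def load_table(conn, table, s3_uri, columns):
    # Requires the cluster parameter aurora_load_from_s3_role (or
    # aws_default_s3_role) to point at a role with read access to the bucket.
    with conn.cursor() as cur:
        cur.execute(build_load_statement(table, s3_uri, columns))
    conn.commit()

if __name__ == "__main__":
    import pymysql  # third-party MySQL driver, installed separately
    conn = pymysql.connect(
        host="my-cluster.cluster-xyz.eu-west-1.rds.amazonaws.com",  # placeholder
        user="admin", password="...", database="mydb")
    load_table(conn, "orders", "s3://my-bucket/daily/orders.csv",
               ["order_id", "customer_id", "amount"])
```

Run one `load_table` call per table, in dependency order, so foreign-key constraints are satisfied.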


You can certainly use AWS Glue. It's true that Glue has some bias towards Redshift, but it offers both PySpark and Python shell jobs, which can be used to do almost anything. Think of it as a Lambda without the 15-minute time constraint: write whatever data-moving logic you want in Python.

Since Aurora Serverless is hosted in a VPC, you might have to create some VPC gateway endpoints to reach certain services once your Glue job runs in the same VPC, but that's a one-time setup.

I actually had quite a similar use case and used Glue to load data programmatically from S3 to Aurora Serverless (MySQL):

  • Created a JDBC connection to Aurora Serverless from the Glue console
  • Wrote a Python shell Glue job to read data from S3 and load it into the Aurora DB using pymysql. (This used the connection from step 1, which meant the job ran in the same VPC as the database; so to reach S3, I had to create a gateway endpoint.)
  • Created a workflow with this job by adding a start trigger and failure handling
  • Wrote an S3→SQS notification and a Lambda that fires the Glue workflow whenever there are new messages in the queue.
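The Glue job in step 2 can be sketched roughly as below. The bucket, key, table, and connection details are placeholders; in a real job you would pull them from job arguments or the Glue connection:

```python
# Sketch of a Glue Python shell job: copy a CSV from S3 into Aurora via pymysql.
# Bucket, key, table, and connection details are placeholder assumptions.
import csv
import io

def rows_from_csv(text):
    """Parse CSV text (with a header row) into a header list and row tuples."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, [tuple(row) for row in reader]

def load_into_aurora(conn, table, header, rows):
    placeholders = ", ".join(["%s"] * len(header))
    cols = ", ".join(header)
    sql = f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"
    with conn.cursor() as cur:
        cur.executemany(sql, rows)
    conn.commit()

if __name__ == "__main__":
    import boto3    # available in Glue jobs
    import pymysql  # third-party driver, bundled with the job
    body = boto3.client("s3").get_object(
        Bucket="my-bucket", Key="daily/orders.csv")["Body"].read().decode("utf-8")
    header, rows = rows_from_csv(body)
    conn = pymysql.connect(host="my-cluster.cluster-xyz.eu-west-1.rds.amazonaws.com",
                           user="admin", password="...", database="mydb")
    load_into_aurora(conn, "orders", header, rows)
```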

You can refer to this post for more details.
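The Lambda from the last step can be as small as this sketch; the workflow name is a placeholder, and the optional `glue_client` parameter is only there to make the function testable:

```python
# Sketch of the SQS-triggered Lambda that kicks off the Glue workflow.
# "s3-to-aurora-workflow" is a placeholder workflow name.

def handler(event, context, glue_client=None):
    if glue_client is None:
        import boto3  # provided in the Lambda runtime
        glue_client = boto3.client("glue")
    # One workflow run is enough even if several S3 events arrive in one batch.
    if event.get("Records"):
        resp = glue_client.start_workflow_run(Name="s3-to-aurora-workflow")
        return resp["RunId"]
    return None
```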