
I am trying to take SQL data stored in a CSV file in an S3 bucket, transfer it to Amazon Redshift, and automate that process. Would writing ETL scripts with Lambda/Glue be the best way to approach this problem, and if so, how do I get the script/transfer to run periodically? If not, what would be the best way to pipeline data from S3 to Redshift?

I tried using AWS Data Pipeline, but it is not available in my region. I also looked through the AWS documentation for Lambda and Glue but couldn't find an exact solution to this problem.

Can you schedule a COPY command somehow? That's the simplest way. – Jon Scott

1 Answer


All systems (including AWS Data Pipeline) use the Amazon Redshift COPY command to load data from Amazon S3.

Therefore, you could write an AWS Lambda function that connects to Redshift and issues the COPY command. You'll need to package a compatible database library (e.g. psycopg2) with the function so it can connect to Redshift; a sketch is shown below.
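
Here is a minimal sketch of such a Lambda function, assuming the connection details come from environment variables and that the table name, bucket path, and IAM role ARN shown are placeholders you would replace with your own:

```python
import os
import psycopg2

# Connection details are read from Lambda environment variables (assumed names).
REDSHIFT_DSN = "host={host} port={port} dbname={db} user={user} password={pw}".format(
    host=os.environ["REDSHIFT_HOST"],
    port=os.environ.get("REDSHIFT_PORT", "5439"),
    db=os.environ["REDSHIFT_DB"],
    user=os.environ["REDSHIFT_USER"],
    pw=os.environ["REDSHIFT_PASSWORD"],
)

# Placeholder table, bucket path, and IAM role -- substitute your own.
COPY_SQL = """
    COPY my_table
    FROM 's3://my-bucket/path/data.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

def lambda_handler(event, context):
    """Connect to Redshift and issue the COPY command."""
    conn = psycopg2.connect(REDSHIFT_DSN)
    try:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)
        conn.commit()
    finally:
        conn.close()
    return {"status": "COPY issued"}
```

Note that psycopg2 must be bundled in the deployment package or a Lambda layer, since it is not available in the Lambda runtime by default.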

You can use Amazon CloudWatch Events to call the Lambda function on a regular schedule. Or, you could get fancy and configure Amazon S3 Events so that, when a file is dropped in an S3 bucket, it automatically triggers the Lambda function.
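
If you prefer to set up the schedule programmatically rather than through the console, a rough boto3 sketch looks like this (the function ARN, rule name, and statement ID are placeholders):

```python
import boto3

# Placeholder names and ARNs -- replace with your own.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:s3-to-redshift-copy"
RULE_NAME = "s3-to-redshift-copy-hourly"

events = boto3.client("events")
lambda_client = boto3.client("lambda")

# Create (or update) a scheduled rule that fires once an hour.
rule = events.put_rule(
    Name=RULE_NAME,
    ScheduleExpression="rate(1 hour)",
    State="ENABLED",
)

# Allow CloudWatch Events to invoke the Lambda function.
lambda_client.add_permission(
    FunctionName=LAMBDA_ARN,
    StatementId="allow-cloudwatch-events-invoke",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

# Point the rule at the Lambda function.
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{"Id": "copy-lambda", "Arn": LAMBDA_ARN}],
)
```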

If you don't want to write it yourself, you could search for existing code on the web, including: