Redshift copy latest csv file from S3

Question

I have a S3 bucket stores daily log files, named "Log_YYYY_MM_DD". Also, I have a table in Redshift only stores data of the latest log. For example:

In S3, I have "Log_Date0.csv", "Log_Date1.csv", "Log_Date2.csv"
And the redshift table should only contains data in "Log_Date2.csv"

How can I setup a daily data pipeline so that redshift automatically copy the latest log contains and rewrite the table?

Thanks

John Rotenstein John Rotenstein · Accepted Answer · 2016-10-31T10:35:19

Amazon Redshift cannot "automatically copy" files. It will only load files when the SQL COPY command is issued.

Therefore, it is up to you to create the necessary commands to TRUNCATE the table and then COPY the data from Amazon S3.

You could write a cron job that uses psql v8.0.2 (to match Redshift) to run the SQL commands. Alternatively, you could use a third-party ETL product that can load the data for you.

To trigger the automatic load of information into Amazon Redshift when a new file is added to an Amazon S3 bucket, you could:

Create an AWS Lambda function that sends the COPY command to Redshift via a psql/JDBC connection
Configure the S3 bucket to trigger the Lambda function when ObjectCreated

See: A Zero-Administration Amazon Redshift Database Loader

Redshift copy latest csv file from S3

1 Answers