
Current Setup:

  • SQL Server OLTP database
  • AWS Redshift OLAP database updated from OLTP via SSIS every 20 minutes

Our customers only have access to the OLAP database.

Requirement:

One customer requires some additional tables to be created and populated to a schedule which can be done by aggregating the data already in AWS Redshift.

Challenge:

This is for one customer only, so I cannot leverage the core process that populates AWS Redshift; the process must be independent and will be handed over to the customer, who does not use SSIS and does not wish to start. I was considering AWS Data Pipeline, but it is not yet available in the region in which the customer resides.

Question:

What is my alternative? I am aware of numerous partners who offer ETL-like solutions, but that seems over the top; ultimately all I want to do is execute a series of SQL statements on a schedule, with some form of error handling/alerting. Both the customer and management would prefer not to use a bespoke app for this, hence the intended use of Data Pipeline.

I haven't used the Data Pipeline service yet, but I think in your case you could simply use the Lambda service to manipulate data within AWS Redshift. – Aleksandr Fedorenko
Lambda is also unavailable in China until some time next year. I'm looking for an alternative in the meantime. Thanks for the suggestion, though. – Simon1979

2 Answers


For exporting data from AWS Redshift to another data source using Data Pipeline, you can follow a template similar to https://github.com/awslabs/data-pipeline-samples/tree/master/samples/RedshiftToRDS, which transfers data from Redshift to RDS. Instead of using RDSDatabase as the sink, you could add a JdbcDatabase (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-jdbcdatabase.html). The template https://github.com/awslabs/data-pipeline-samples/blob/master/samples/oracle-backup/definition.json shows in more detail how to use the JdbcDatabase object.
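As a rough sketch, a JdbcDatabase object in a pipeline definition looks something like the fragment below. The object id, cluster endpoint, and credentials are placeholders, not values from the question:

```json
{
  "id": "RedshiftJdbcDatabase",
  "type": "JdbcDatabase",
  "connectionString": "jdbc:postgresql://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/analytics",
  "jdbcDriverClass": "org.postgresql.Driver",
  "username": "etl_user",
  "*password": "example-password"
}
```

Redshift accepts connections through the PostgreSQL JDBC driver, which is why the connection string uses the `jdbc:postgresql` prefix.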

There are many such templates available at https://github.com/awslabs/data-pipeline-samples/tree/master/samples to use as a reference.


I do exactly the same thing as you, but I use the Lambda service to perform my ETL. One drawback of Lambda is that a function can run for a maximum of 5 minutes (initially it was 1 minute).
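A minimal sketch of that approach, assuming the `psycopg2` driver is bundled with the deployment package and credentials arrive via environment variables; the schema and table names are illustrative, not from the answer:

```python
import os

# SQL statements to run in order on each invocation; purely illustrative.
STATEMENTS = [
    "TRUNCATE customer_agg.daily_totals;",
    """INSERT INTO customer_agg.daily_totals
       SELECT order_date, SUM(amount)
       FROM sales
       GROUP BY order_date;""",
]


def handler(event, context):
    """Lambda entry point: run the statements against Redshift in one transaction."""
    # Imported inside the handler so the module loads even where psycopg2 is absent;
    # Redshift speaks the PostgreSQL wire protocol, so the standard driver works.
    import psycopg2

    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],
        port=int(os.environ.get("REDSHIFT_PORT", "5439")),
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            for sql in STATEMENTS:
                cur.execute(sql)
        conn.commit()
    except Exception:
        conn.rollback()
        # Re-raising marks the invocation as failed, which a CloudWatch alarm
        # on the function's error metric can turn into an alert.
        raise
    finally:
        conn.close()
```

A CloudWatch Events scheduled rule (e.g. `rate(20 minutes)`) can then trigger the function on the desired schedule.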

So for ETL jobs longer than 5 minutes, I am planning to set up a PHP server in AWS from which I can run my SQL queries against Redshift, scheduled at any time with the help of cron.
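A lighter-weight variant of the same idea, sketched here as an assumption (the endpoint, file paths, and email address are placeholders): because Redshift speaks the PostgreSQL wire protocol, a plain cron entry on a small EC2 instance can run the SQL file with the stock `psql` client and email on failure, no PHP required:

```
# crontab fragment: run the aggregation SQL every 20 minutes.
# ON_ERROR_STOP=1 makes psql exit non-zero on the first failed statement;
# the password is supplied via a ~/.pgpass file for the cron user.
*/20 * * * * psql "host=example-cluster.redshift.amazonaws.com port=5439 dbname=analytics user=etl_user" -v ON_ERROR_STOP=1 -f /opt/etl/aggregations.sql >> /var/log/etl.log 2>&1 || echo "Redshift aggregation failed" | mail -s "ETL alert" ops@example.com
```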