0 votes

I'm aware of the standard COPY from DynamoDB to Redshift, but that only works for schemas without Maps and Lists. I have several DynamoDB tables with Maps and Lists, and I need to use jsonpaths to do the import into Redshift.

So my question is: can I schedule a backup from DynamoDB to S3 and then, when the backup is complete, run the import to Redshift with the jsonpaths config? I imagine this is a two-phase process. Or can I create a single Data Pipeline that does both the backup and the import?

Alternatively, is there a task runner in AWS I can use, or would I need to hook up an event (SNS) to notify the import that the backup is complete?

Data Pipeline. – sandeep rawat
Yes, but how can you combine the execution of a backup and an import? – David Cornelson

2 Answers

0 votes

AWS now has a few services that can run tasks. You could manage your import workflow using AWS step functions. AWS Lambda functions corresponding to each step in your import workflow could spawn AWS Batch jobs, where the first job would backup your DynamoDB table to S3, and the second job would import to Redshift using the jsonpaths config.
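As an illustration, here is a minimal sketch of what one of those Lambda steps might look like, using boto3 to submit a Batch job; the job queue name, job definition, and parameters are hypothetical placeholders, and the Step Functions state machine would invoke one such Lambda per step.

```python
# Minimal sketch of a Lambda step that kicks off an AWS Batch job.
# The queue name ("ddb-export-queue") and job definition ("ddb-to-s3-export")
# are assumptions for illustration only.
import boto3

batch = boto3.client("batch")

def lambda_handler(event, context):
    # Submit the backup (or import) job; table and bucket names would come
    # from the state machine input and are placeholders here.
    response = batch.submit_job(
        jobName="dynamodb-backup",
        jobQueue="ddb-export-queue",        # hypothetical queue name
        jobDefinition="ddb-to-s3-export",   # hypothetical job definition
        parameters={
            "tableName": event.get("tableName", "my-table"),
            "s3Prefix": event.get("s3Prefix", "s3://my-bucket/backups/"),
        },
    )
    # Return the Batch job id so the next state can wait on its completion.
    return {"jobId": response["jobId"]}
```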

0 votes

You can do a DynamoDB-to-Redshift COPY, but AFAIK the schemas of both have to match exactly (I haven't tried this yet).

However, you can set up two pipelines (or a single pipeline) to do a backup from DynamoDB to S3 and then an import from S3 to Redshift. DynamoDB writes its backups as JSON objects, so you will need a jsonpaths config to insert them into Redshift.

Example: with col1 (number) = 0 and col2 (string) = x, your backup record would look like { "col1":{"n":"0"},"col2":{"s":"x"} }, and the jsonpath to get 0 would be $.col1.n.
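To make that concrete, here is a small sketch of a jsonpaths file for the record above and the Redshift COPY command it would be used with; the bucket, table, and IAM role names are hypothetical placeholders.

```python
# Sketch of the jsonpaths mapping for the backup record shown above, plus the
# COPY statement it would be referenced from. All names are placeholders.
import json

jsonpaths = {
    "jsonpaths": [
        "$.col1.n",   # maps to the Redshift column col1 (number)
        "$.col2.s",   # maps to the Redshift column col2 (string)
    ]
}

# Write the jsonpaths file that will be uploaded to S3 for the COPY to use.
with open("table.jsonpaths", "w") as f:
    json.dump(jsonpaths, f)

# The COPY you would run against Redshift (e.g. via psql, or from an
# activity in Data Pipeline):
copy_sql = """
COPY my_table
FROM 's3://my-bucket/backups/my-table/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
JSON 's3://my-bucket/jsonpaths/table.jsonpaths';
"""
print(copy_sql)
```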

You can use Data Pipeline's predefined templates if you set up two pipelines, but if you want a single pipeline you have to build your own definition, or start with a template and build on it.
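As a rough sketch of the single-pipeline approach, the two steps can be chained with dependsOn in one pipeline definition; the example below uses boto3 with simplified placeholder activities (in practice you would start from the DynamoDB-to-S3 export template, use the real export and COPY activities, and add the required runsOn compute resource, which is omitted here for brevity).

```python
# Rough sketch: one pipeline with two activities, where the Redshift import
# activity dependsOn the S3 backup activity. Activity types, commands, and
# role names are placeholders, not a working definition.
import boto3

dp = boto3.client("datapipeline")

pipeline = dp.create_pipeline(name="ddb-to-redshift", uniqueId="ddb-to-redshift-1")

dp.put_pipeline_definition(
    pipelineId=pipeline["pipelineId"],
    pipelineObjects=[
        {
            "id": "Default",
            "name": "Default",
            "fields": [
                {"key": "scheduleType", "stringValue": "ondemand"},
                {"key": "role", "stringValue": "DataPipelineDefaultRole"},
                {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
            ],
        },
        {
            "id": "BackupToS3",
            "name": "BackupToS3",
            "fields": [
                # Placeholder for the DynamoDB-to-S3 export step from the template.
                {"key": "type", "stringValue": "ShellCommandActivity"},
                {"key": "command", "stringValue": "<export DynamoDB table to S3>"},
            ],
        },
        {
            "id": "ImportToRedshift",
            "name": "ImportToRedshift",
            "fields": [
                # Placeholder for the COPY step; dependsOn makes it wait for the backup.
                {"key": "type", "stringValue": "ShellCommandActivity"},
                {"key": "command", "stringValue": "<run COPY with the jsonpaths file>"},
                {"key": "dependsOn", "refValue": "BackupToS3"},
            ],
        },
    ],
)

dp.activate_pipeline(pipelineId=pipeline["pipelineId"])
```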

You can also hook up an SnsAlarm on failure or success of the pipeline.
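For example, an SnsAlarm object can be added to the pipeline definition above and referenced from an activity via onSuccess / onFail; the topic ARN below is a placeholder.

```python
# Sketch of an SnsAlarm pipeline object; add it to pipelineObjects above and
# reference it from an activity. The topic ARN is a placeholder.
sns_alarm = {
    "id": "ImportFailedAlarm",
    "name": "ImportFailedAlarm",
    "fields": [
        {"key": "type", "stringValue": "SnsAlarm"},
        {"key": "topicArn", "stringValue": "arn:aws:sns:us-east-1:123456789012:pipeline-alerts"},
        {"key": "subject", "stringValue": "Redshift import failed"},
        {"key": "message", "stringValue": "The S3-to-Redshift COPY step failed."},
    ],
}

# On the ImportToRedshift activity, add:
#   {"key": "onFail", "refValue": "ImportFailedAlarm"}
```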