In your case, you could stream the DynamoDB data to Redshift like this:
DynamoDB --> DynamoDB Streams --> Lambda Function --> Kinesis Firehose --> Redshift.
First, you need a Lambda function to handle the DynamoDB Stream. For each DynamoDB Stream event, use the Firehose PutRecord
API to send the data to Firehose. From the AWS SDK example:
var AWS = require('aws-sdk');
var firehose = new AWS.Firehose();

firehose.putRecord({
    DeliveryStreamName: 'STRING_VALUE', /* required */
    Record: { /* required */
        Data: new Buffer('...') || 'STRING_VALUE' /* Strings will be Base-64 encoded on your behalf */ /* required */
    }
}, function(err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    else console.log(data);               // successful response
});
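For reference, here is a minimal sketch of what the Lambda handler shape looks like for a DynamoDB Streams trigger; the handler name and the logging are just placeholders, the firehose.putRecord call from above would go inside the loop:

exports.handler = function(event, context, callback) {
    // Each invocation receives a batch of stream records.
    event.Records.forEach(function(record) {
        // eventName is INSERT, MODIFY or REMOVE;
        // record.dynamodb.NewImage holds the item attributes (DynamoDB JSON)
        // for INSERT and MODIFY events.
        console.log(record.eventName, JSON.stringify(record.dynamodb.NewImage));
    });
    callback(null, 'processed ' + event.Records.length + ' records');
};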
Next, we have to know how the data gets inserted into Redshift. From the Firehose documentation:
For data delivery to Amazon Redshift, Kinesis Firehose first delivers
incoming data to your S3 bucket in the format described earlier.
Kinesis Firehose then issues an Amazon Redshift COPY command to load
the data from your S3 bucket to your Amazon Redshift cluster.
So, we need to know what data format will let the COPY
command map the data into the Redshift schema. We have to follow the data format requirements of the Redshift COPY command.
By default, the COPY command expects the source data to be
character-delimited UTF-8 text. The default delimiter is a pipe
character ( | ).
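For context, the target table and column mapping live in the delivery stream's Redshift destination configuration. A rough sketch of creating such a stream is below; all names, ARNs and credentials are placeholders, and the important part is that DataTableColumns must match the order of the fields you write:

var AWS = require('aws-sdk');
var firehose = new AWS.Firehose();

firehose.createDeliveryStream({
    DeliveryStreamName: 'YOUR_FIREHOSE_NAME',
    RedshiftDestinationConfiguration: {
        RoleARN: 'arn:aws:iam::123456789012:role/firehose-role',
        ClusterJDBCURL: 'jdbc:redshift://example.us-east-1.redshift.amazonaws.com:5439/mydb',
        Username: 'REDSHIFT_USER',
        Password: 'REDSHIFT_PASSWORD',
        CopyCommand: {
            DataTableName: 'my_table',
            // Column order here must match the order of the
            // pipe-separated fields you send to Firehose.
            DataTableColumns: 'column_1, column_2',
            // Optional here, since pipe is already the default delimiter.
            CopyOptions: "delimiter '|'"
        },
        // Firehose stages the data in S3 before issuing the COPY command.
        S3Configuration: {
            RoleARN: 'arn:aws:iam::123456789012:role/firehose-role',
            BucketARN: 'arn:aws:s3:::my-firehose-bucket'
        }
    }
}, function(err, data) {
    if (err) console.log(err, err.stack);
    else console.log(data);
});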
So, you could program the Lambda function to take the DynamoDB Stream event as input, transform each record into a pipe (|) separated line, and write it to Firehose:
var AWS = require('aws-sdk');
var firehose = new AWS.Firehose();

firehose.putRecord({
    DeliveryStreamName: 'YOUR_FIREHOSE_NAME',
    Record: { /* required */
        Data: "RED_SHIFT_COLUMN_1_DATA|RED_SHIFT_COLUMN_2_DATA\n"
    }
}, function(err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    else console.log(data);               // successful response
});
Remember to add the trailing \n,
as Firehose will not append a newline for you.
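Putting the pieces together, the transformation step could look roughly like the sketch below. It assumes a hypothetical Redshift table with two columns fed from id and name attributes on the DynamoDB items; adjust the mapping to your own schema:

var AWS = require('aws-sdk');
var firehose = new AWS.Firehose();

exports.handler = function(event, context, callback) {
    var pending = event.Records.length;
    if (pending === 0) return callback(null, 'no records');

    // Call back only after every putRecord has completed.
    var done = function(err) {
        if (err) console.log(err, err.stack);
        if (--pending === 0) callback(null, 'done');
    };

    event.Records.forEach(function(record) {
        // Only INSERT and MODIFY events carry a NewImage.
        var image = record.dynamodb.NewImage;
        if (!image) { done(); return; }

        // DynamoDB JSON wraps each value in a type key (S, N, ...).
        // The field order must match the Redshift table's column order.
        var line = [image.id.S, image.name.S].join('|') + '\n';

        firehose.putRecord({
            DeliveryStreamName: 'YOUR_FIREHOSE_NAME',
            Record: { Data: line }
        }, done);
    });
};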