0 votes

I have a DynamoDB table that I need to read hourly to run a load into a data warehouse. I am thinking of two options:

  1. Create a job that scans the table, filtering by a last_update timestamp, and loads the results into the DW (see the sketch after this list).
  2. Use Kinesis + Lambda so that on every new update/insert I dump the records to S3 for further processing, then create an hourly job that loads the S3 files into the DW.
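
Roughly what I have in mind for option 1 (a sketch assuming boto3 and hypothetical names, my_table with last_update stored as an ISO-8601 string):

```python
import boto3
from boto3.dynamodb.conditions import Attr
from datetime import datetime, timedelta, timezone

# Hypothetical table name; last_update is assumed to be an ISO-8601 string.
table = boto3.resource("dynamodb").Table("my_table")

def changed_in_last_hour():
    """Return items whose last_update falls within the past hour.

    Note: a filtered Scan still reads (and bills for) the whole table;
    the filter only trims what gets returned.
    """
    cutoff = (datetime.now(timezone.utc) - timedelta(hours=1)).isoformat()
    items, start_key = [], None
    while True:
        kwargs = {"FilterExpression": Attr("last_update").gte(cutoff)}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        resp = table.scan(**kwargs)
        items.extend(resp["Items"])
        start_key = resp.get("LastEvaluatedKey")
        if not start_key:
            return items
```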

What is the best approach?


2 Answers

1 vote

For option 1, your approach is reasonable as described; I would need more details about your table and workload to suggest improvements.

For option 2, you should use Kinesis Data Firehose and configure it to deliver the data directly to S3.
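
On the producer side this can be a single put_record call per change; here is a minimal sketch, assuming a hypothetical delivery stream named dynamodb-changes-to-s3 that is already configured with an S3 destination:

```python
import json
import boto3

firehose = boto3.client("firehose")

def forward_to_s3(record: dict) -> None:
    """Push one change record into a Firehose delivery stream.

    Firehose buffers incoming records and flushes them to the stream's
    configured S3 destination based on size/time thresholds.
    """
    firehose.put_record(
        DeliveryStreamName="dynamodb-changes-to-s3",  # hypothetical name
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )
```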

1 vote

Using DynamoDB Streams is an efficient way to connect your front-end serving database (DynamoDB) to your analytical database. It gives you decoupling, so each part of your system can evolve without breaking the other.

It also lets you keep the environment serverless, which reduces your operational load and effort. A single Lambda function reading the update stream can decide which data gets logged to which datastore. A common pattern is to write all the inserts/updates/deletes to S3 as a log, and the relevant records and columns to the various tables in the data warehouse (for example, Redshift). Your Lambda function can put the transformed records into a couple of Kinesis Firehose delivery streams that will move the data reliably to their destinations.
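
A minimal sketch of such a Lambda function, assuming it is triggered by a DynamoDB Streams event; both delivery stream names below are hypothetical:

```python
import json
import boto3

firehose = boto3.client("firehose")

# Hypothetical delivery streams: one raw change log, one curated DW feed.
RAW_LOG_STREAM = "ddb-change-log-to-s3"
DW_STREAM = "ddb-curated-to-dw"

def handler(event, context):
    """Triggered by DynamoDB Streams; fans changes out to Firehose."""
    for rec in event["Records"]:
        change = {
            "event": rec["eventName"],            # INSERT / MODIFY / REMOVE
            "keys": rec["dynamodb"]["Keys"],
            "new_image": rec["dynamodb"].get("NewImage"),  # needs a NEW_IMAGE stream view
        }
        data = (json.dumps(change) + "\n").encode("utf-8")
        # Log every change to the raw S3 change log.
        firehose.put_record(
            DeliveryStreamName=RAW_LOG_STREAM, Record={"Data": data}
        )
        # Forward only inserts/updates to the warehouse-bound stream.
        if rec["eventName"] in ("INSERT", "MODIFY"):
            firehose.put_record(
                DeliveryStreamName=DW_STREAM, Record={"Data": data}
            )
```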