What you are looking for are DynamoDB Streams (official docs). This can seamlessly flow into the Kinesis firehose as you have correctly pointed out.
This is the most optimal way and provides the best balance between the cost, operational overhead and the functionality itself. Allow me explain how:
- DynamoDB streams: Streams are triggered when any activity happens on the database. This means that unlike a process that will scan the data on a periodic basis and consume the read capacity even if there is no update, you will be notified of the new data.
- Kinesis Firehose: You can configure the Firehose to batch data either by the size of the data or time. This means that if you have a good inflow, you can set the stream to batch the records received in each 2 minutes interval and then issue just one
COPY
command to redshift. Same goes for the size of the data in the stream buffer. Read more about it here.
The ideal way to load data into Redshift is via COPY
command and Kinesis Firehose does just that. You can also configure it to automatically create backup of the data into S3.
Remember that a reactive or push based system is almost always more performant and less costly than a reactive or push based system. You save on the compute capacity needed to run a cron process and also continuously scan for updates.