I've been using AWS DMS to perform ongoing replication from MySQL Aurora to Redshift. However, the ongoing replication is causing a constant 25-30% CPU load on the target. This is because DMS produces many small files on S3 and loads/processes them non-stop, and Redshift is not really designed for handling a large number of small tasks.
In order to optimize, I've changed the setup so that the task starts at the beginning of each hour, waits until the target is in sync, and then stops. So instead of running continuously, it runs for 5-8 minutes at the beginning of each hour. Even so, it is still quite slow and inefficient, because it still has to process hundreds of small S3 files, just within a shorter timespan.
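For context, the hourly cycle is roughly equivalent to the sketch below (boto3; the task ARN, identifiers, and latency threshold are placeholders, and polling the CDCLatencyTarget CloudWatch metric is just one way of deciding the target has caught up):

```python
import time
from datetime import datetime, timedelta, timezone

import boto3

# Placeholder identifiers -- substitute your own task/instance names.
TASK_ARN = "arn:aws:dms:eu-west-1:123456789012:task:EXAMPLE"
TASK_ID = "aurora-to-redshift-task"       # ReplicationTaskIdentifier dimension
INSTANCE_ID = "replication-instance-1"    # ReplicationInstanceIdentifier dimension
SYNC_THRESHOLD_SECONDS = 10               # "in sync" = target latency below this

dms = boto3.client("dms")
cloudwatch = boto3.client("cloudwatch")


def target_latency_seconds():
    """Return the most recent CDCLatencyTarget datapoint, or None if absent."""
    now = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName="CDCLatencyTarget",
        Dimensions=[
            {"Name": "ReplicationTaskIdentifier", "Value": TASK_ID},
            {"Name": "ReplicationInstanceIdentifier", "Value": INSTANCE_ID},
        ],
        StartTime=now - timedelta(minutes=10),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Average"] if points else None


def run_hourly_sync():
    # Resume the existing CDC task from its last checkpoint.
    dms.start_replication_task(
        ReplicationTaskArn=TASK_ARN,
        StartReplicationTaskType="resume-processing",
    )

    # Wait until target latency drops below the threshold, then stop the task.
    while True:
        time.sleep(60)
        latency = target_latency_seconds()
        if latency is not None and latency < SYNC_THRESHOLD_SECONDS:
            break

    dms.stop_replication_task(ReplicationTaskArn=TASK_ARN)


if __name__ == "__main__":
    run_hourly_sync()  # scheduled once per hour, e.g. via cron or EventBridge
```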
Can this be optimized further? Is there a way to tell DMS to buffer the changes for a longer period of time, and produce fewer, larger S3 files instead of many small ones? We really don't mind having higher target latency.
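To illustrate the kind of knob we're hoping for: the documented `BatchApplyEnabled` / `ChangeProcessingTuning` task settings look related, but we don't know whether they actually change the small-S3-file behaviour for a Redshift target. Something along these lines (ARN and values are placeholders, and the task would have to be stopped before modifying it):

```python
import json

import boto3

dms = boto3.client("dms")

# Placeholder ARN; the task must be stopped before its settings can be changed.
TASK_ARN = "arn:aws:dms:eu-west-1:123456789012:task:EXAMPLE"

# BatchApplyEnabled and ChangeProcessingTuning are documented DMS task settings;
# the values here are only meant to express "buffer changes for longer and
# apply them in bigger batches" -- not verified to fix the small-file issue.
settings = {
    "TargetMetadata": {"BatchApplyEnabled": True},
    "ChangeProcessingTuning": {
        "BatchApplyTimeoutMin": 60,    # wait at least 60 s between batches
        "BatchApplyTimeoutMax": 1800,  # let a batch accumulate for up to 30 min
    },
}

dms.modify_replication_task(
    ReplicationTaskArn=TASK_ARN,
    ReplicationTaskSettings=json.dumps(settings),
)
```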
The amount of data transferred between Aurora and Redshift is rather small: around 20K changes per hour, and we're using a 4-node dc1.large Redshift cluster. It should be able to handle those 20K changes in a matter of seconds, not minutes.