I'm currently trying to find a (good) solution for synchronising data from an external MySQL database, which is completely separate from anything AWS, into an AWS DynamoDB table.
The sync process should run every day at around 12:00 PM (noon). It should grab the create date of the latest item inserted into DynamoDB and use it as a high-water mark, so that only MySQL rows created after that date/time are transferred when the sync runs. A typical run would transfer around 110,000 records.
One thing to note: we're using .NET where I work.
From what I've understood, there are a few AWS services that can help me do this:
EMR (Link)
AWS EMR seems like the way to go, but it seems that Hive scripts are not able to communicate with external MySQL databases? Or am I wrong here? I find it hard to find any usable Hive script examples.
Data Pipeline (Link)
From what I've understood, Data Pipeline works best when the schema is the same on both ends, which is not the case here: we're reading from a relational MySQL database into DynamoDB, and the structure is not completely 1:1.
The third option would be to create a Windows Service that runs a piece of C# code to read data from MySQL and store it in DynamoDB. The only thing I'm worried about here is performance :-) Looping through 100,000+ records, processing them and then storing each one individually in DynamoDB doesn't seem appealing to me.
Does anyone have any experience with this they'd like to share? :-) Concrete examples would be very welcome. Also, if I've missed a service or another way of implementing this, please let me know.