I'm looking at migrating a massive database to Amazon's DynamoDB (think 150 million plus records).
I'm currently storing these records in Elasticsearch.
I'm reading up on AWS Data Pipeline, which can import into DynamoDB from S3 using a TSV, CSV, or JSON file.
It seems the best way to go is a JSON file, and I've found two examples of how it should be structured:
From AWS:
{"Name"ETX {"S":"Amazon DynamoDB"}STX"Category"ETX {"S":"Amazon Web Services"}} {"Name"ETX {"S":"Amazon push"}STX"Category"ETX {"S":"Amazon Web Services"}} {"Name"ETX {"S":"Amazon S3"}STX"Category"ETX {"S":"Amazon Web Services"}}
And the second example:

    {"Name": {"S":"Amazon DynamoDB"},"Category": {"S":"Amazon Web Services"}}
    {"Name": {"S":"Amazon push"},"Category": {"S":"Amazon Web Services"}}
    {"Name": {"S":"Amazon S3"},"Category": {"S":"Amazon Web Services"}}
So, my questions are the following:
- Do I literally have to put the STX (start-of-text) and ETX (end-of-text) control characters in the file, as in the first example?
- How reliable is this method? Should I be concerned about failed uploads? There doesn't seem to be a way to do error handling, so do I just assume that AWS got it right?
- Is there an ideal file size? For example, should I break the database into chunks of, say, 100K records and store each chunk in its own file? (See the sketch after this list.)
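If chunking is the way to go, this is the kind of split I had in mind before uploading each piece to S3 (again just a sketch; the 100K figure and the file names are placeholders):

    CHUNK_SIZE = 100_000  # items (lines) per output file

    # Split the newline-delimited JSON produced above into
    # CHUNK_SIZE-item pieces, one file per piece.
    with open("items.json") as src:
        out, part = None, 0
        for i, line in enumerate(src):
            if i % CHUNK_SIZE == 0:
                if out:
                    out.close()
                out = open(f"items-part-{part:05d}.json", "w")
                part += 1
            out.write(line)
        if out:
            out.close()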
I want to get this right the first time and not incur extra charges, since apparently you get charged whether your setup is right or wrong.
Links to any specific parts of the manual that I've missed would also be greatly appreciated.