1 vote

The AWS docs for importing data from S3 into a DynamoDB table using Data Pipeline (https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part1.html) reference an S3 file (s3://elasticmapreduce/samples/Store/ProductCatalog) which is in this format:

[screenshot of the sample file contents]

https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-pipelinejson-verifydata2.html

Question is... how do I get a CSV of, say, 4 million rows into this format in the first place? Is there a utility for that?

Thanks for any suggestions... I've had a good google and haven't turned up anything.

Perhaps the intent is always to export data from Dynamo to S3 first (back it up), and then you can always import that backup... and thus you've got the file in the right format. But that doesn't cover an initial import into Dynamo, which is the workflow I'm trying to achieve. – Andrew Duffy
I did this once via a custom pipeline job. Not posting as an answer as I don't have the link or a copy of exactly what I used. It was something like this though: github.com/awslabs/data-pipeline-samples/blob/master/samples/… – stevepkr84

2 Answers

1 vote

stevepkr84 already linked to this in his comment, but I wanted to call it out: https://github.com/awslabs/data-pipeline-samples/tree/master/samples/DynamoDBImportCSV

Hive on EMR supports DynamoDB as an external table type. This sample uses a HiveActivity to create external Hive tables pointing at the target DynamoDB table and at the source CSV, then executes a Hive query to copy the data from one to the other, roughly along the lines of the sketch below.
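Here's a minimal sketch of that idea, not the exact script from the sample: the column names, the S3 path, and the DynamoDB table and attribute names are placeholders you'd swap for your own.

    -- External table over the source CSV sitting in S3 (path and schema are hypothetical)
    CREATE EXTERNAL TABLE csv_source (
      id    string,
      name  string,
      price string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 's3://my-bucket/csv-input/';

    -- External table backed by the target DynamoDB table via EMR's storage handler
    CREATE EXTERNAL TABLE ddb_target (
      id    string,
      name  string,
      price string
    )
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
    TBLPROPERTIES (
      "dynamodb.table.name"     = "ProductCatalog",
      "dynamodb.column.mapping" = "id:Id,name:Name,price:Price"
    );

    -- Copy every CSV row into DynamoDB
    INSERT OVERWRITE TABLE ddb_target
    SELECT id, name, price FROM csv_source;

The nice side effect is that Hive does the format conversion for you, so you never have to hand-build the export-format file shown in the question.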

-4 votes

The AWS Data Pipeline service supports CSV import to DynamoDB. You can create a pipeline from the AWS console for Data Pipeline and choose "Import DynamoDB backup data from S3" to import a CSV stored in S3 into DynamoDB.

See also

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBPipeline.html#DataPipelineExportImport.Importing