
We have a Lambda that updates a DynamoDB table after some operation.

Now we want to export the whole DynamoDB table to an S3 bucket in CSV format.

Is there any efficient way to do this?

I have also found the following way of exporting directly from DynamoDB to S3:

https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/

But that stores the data in JSON format, and I cannot find a way to do this efficiently for 10 GB of data.

I don't think this is supported natively, because CSV doesn't lend itself to the kind of hierarchical data structures supported by DynamoDB. If you want to roll your own, you can take advantage of parallel scan, but it may be easier to transform the export of a point in time in S3. – Maurice

1 Answer


As far as I can tell you have three "simple" options.

Option #1: Program that does a Scan

It is fairly simple to write a program that does a (parallel) scan of your table and then outputs the result as CSV. A no-bells-and-whistles version of this is about 100-150 lines of code in Python or Go.

Advantages:

  1. Easy to develop
  2. Can be run easily multiple times from local machines or CI/CD pipelines or whatever.

Disadvantages:

  1. It will cost you a bit of money. Scanning the whole table will use up some read units. Depending on the amount you are reading, this might get costly fast.
  2. Depending on the amount of data this can take a while.

Note: If you want to run this in a Lambda, then remember that Lambdas can run for a maximum of 15 minutes. So once you have more data than can be processed within those 15 minutes, you probably need to switch to Step Functions.
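A minimal sketch of such a scan-and-export program in Python with boto3. The table name, segment count, and attribute names (`pk`, `sk`, `payload`) are placeholders for illustration, not your actual schema:

```python
# Sketch: parallel Scan of a DynamoDB table written out as CSV.
import csv
import io
from concurrent.futures import ThreadPoolExecutor

TOTAL_SEGMENTS = 4                 # number of parallel scan workers
FIELDS = ["pk", "sk", "payload"]   # hypothetical attribute names to export

def scan_segment(table, segment):
    """Scan one segment of the table, following pagination via LastEvaluatedKey."""
    kwargs = {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    items = []
    while True:
        resp = table.scan(**kwargs)
        items.extend(resp["Items"])
        if "LastEvaluatedKey" not in resp:
            return items
        kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]

def items_to_csv(items, fields=FIELDS):
    """Render a list of item dicts as CSV text, ignoring extra attributes."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()

def export_table(table_name):
    import boto3  # imported here so the helpers above stay testable offline
    table = boto3.resource("dynamodb").Table(table_name)
    with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
        segments = pool.map(lambda s: scan_segment(table, s),
                            range(TOTAL_SEGMENTS))
    items = [item for seg in segments for item in seg]
    with open("export.csv", "w", newline="") as f:
        f.write(items_to_csv(items))
```

Note that this flattens each item into the fixed column list and silently drops any other attributes; nested map or list attributes would need their own flattening rules.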

Option #2: Process a S3 backup

DynamoDB allows you to create backups of your table to S3 (as the article you linked describes). Those backups will either be in JSON or a JSON-like AWS format. You can then write a program that converts those JSON files to CSV.

Advantages:

  1. (A lot) cheaper than a scan

Disadvantages:

  1. Requires more "plumbing", because you need to first create the backup, then download it from S3 to wherever you want to process it, etc.
  2. It will probably take longer than option #1.
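The JSON-to-CSV conversion step can be sketched like this. It assumes the export's newline-delimited DynamoDB-JSON format (each line is `{"Item": {...}}` with type descriptors) and handles only the common scalar types; anything else falls back to raw JSON, which you would adapt to your data:

```python
# Sketch: flatten lines of a DynamoDB S3 export into CSV.
import csv
import io
import json

def plain_value(typed):
    """Convert one DynamoDB-typed value, e.g. {"S": "foo"}, to a scalar."""
    (tag, value), = typed.items()
    if tag == "S":
        return value
    if tag == "N":
        return value          # keep numbers as strings for CSV output
    if tag == "BOOL":
        return value
    if tag == "NULL":
        return ""
    return json.dumps(typed)  # fall back to raw JSON for M, L, SS, ...

def export_lines_to_csv(lines, fields):
    """lines: newline-delimited JSON strings of the form {"Item": {...}}."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for line in lines:
        item = json.loads(line)["Item"]
        writer.writerow({k: plain_value(v) for k, v in item.items()})
    return buf.getvalue()
```

This part is cheap to run anywhere (it never touches DynamoDB), so it can live in a Lambda triggered by the export landing in S3, or just run locally.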