How to export an AWS DynamoDB table to an S3 Bucket?

Question

I have a DynamoDB table with about 1.5 million records (roughly 2 GB). How can I export it to an S3 bucket?

The AWS Data Pipeline method worked with a small table, but I am running into problems exporting the 1.5 million record table to S3.

On my first attempt, the pipeline job ran for 1 hour and then failed with

java.lang.OutOfMemoryError: GC overhead limit exceeded

I then increased the NameNode heap size by supplying a hadoop-env configuration object to the instances in the EMR cluster, following this link
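For readers unfamiliar with that step, here is a minimal sketch of what such a configuration can look like, written as a Python helper that builds an EMR configuration list. It assumes the `hadoop-env` classification with a nested `export` classification and the `HADOOP_NAMENODE_HEAPSIZE` variable. The function name and the 2048 MB value are illustrative assumptions, not the values used in the question.

```python
import json


def namenode_heap_configuration(heap_mb: int) -> list:
    """Build an EMR configuration list that sets the NameNode heap size in MB.

    Assumption: heap_mb is a hypothetical value chosen by the caller;
    the question does not state which size was actually used.
    """
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    return [
        {
            "Classification": "hadoop-env",
            "Properties": {},
            "Configurations": [
                {
                    # Variables under "export" are exported in hadoop-env.sh.
                    "Classification": "export",
                    "Properties": {"HADOOP_NAMENODE_HEAPSIZE": str(heap_mb)},
                }
            ],
        }
    ]


if __name__ == "__main__":
    # 2048 is an illustrative value only.
    print(json.dumps(namenode_heap_configuration(2048), indent=2))
```

In a Data Pipeline definition the same settings would be expressed as configuration objects attached to the cluster resource rather than as raw JSON, so treat this only as a picture of the shape of the data.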

After increasing the heap size, my next job run also failed after 1 hour, this time with a different error, as seen in the attached screenshot. I am not sure how to fix this completely.

Also, the AWS CloudWatch graphs for the instances in the EMR cluster showed the core node continuously at 100% CPU usage.

The EMR cluster instance types (master and core node) were m3.2xlarge.

This might be a long shot, but does it work on newer instance types such as m5? The m3s are legacy. — Chris Williams
You can define a Hive table using the DynamoDB EMR connector and run a Spark job that imports the data from DynamoDB and exports it to S3. — Abdelrahman Maharek

Afnas Afnas · Accepted Answer · 2020-09-07T08:39:44

The issue was that the map tasks were not running efficiently: the core node was hitting 100% CPU usage. I upgraded the cluster instance types to one of the available compute-optimized C series types, and the export worked with no issues.
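As a rough illustration of where the instance types are set, the sketch below builds an `EmrCluster` pipeline object in the list-of-fields shape that boto3's Data Pipeline `put_pipeline_definition` call accepts. The helper name, the object id, and the instance type string are assumptions for illustration. The exact C series type that fixed the export is not recorded here, so the value shown is a placeholder.

```python
def emr_cluster_object(object_id: str, master_type: str, core_type: str,
                       core_count: int = 1) -> dict:
    """Build a Data Pipeline EmrCluster object with explicit instance types.

    Assumption: this is a hypothetical helper; only the field keys
    (type, masterInstanceType, coreInstanceType, coreInstanceCount)
    come from the Data Pipeline EmrCluster object.
    """
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    fields = {
        "type": "EmrCluster",
        "masterInstanceType": master_type,
        "coreInstanceType": core_type,
        "coreInstanceCount": str(core_count),
    }
    return {
        "id": object_id,
        "name": object_id,
        # Each field is a key/stringValue pair in the pipeline object format.
        "fields": [{"key": k, "stringValue": v} for k, v in fields.items()],
    }


if __name__ == "__main__":
    # "c4.2xlarge" is a placeholder for a compute-optimized type.
    cluster = emr_cluster_object("EmrClusterForBackup", "c4.2xlarge", "c4.2xlarge")
    for field in cluster["fields"]:
        print(field["key"], "=", field["stringValue"])
```

The returned dictionary would be one entry in the `pipelineObjects` list of a full pipeline definition. Check that the chosen instance type is supported by the EMR release the pipeline uses before relying on it.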
