1 vote

I have a table in DynamoDB with millions of records. I have created a global secondary index (GSI) on my filtering criteria and filter products based on it. Now I want to use AWS Data Pipeline to query products from the table and export them to S3.

Questions:

a) Can we specify the GSI name in the pipeline? Querying the large table through Data Pipeline is getting cancelled because of a timeout issue. [The pipeline configuration has a 6-hour max wait time; it reaches that and gets cancelled.]

b) Is there a better way to quickly create export dumps from the table using the GSI?

Please share your views.

Regards, Kishore

1 Answer

0 votes

You cannot specify the GSI in the pipeline. The list of options you can specify for a DynamoDB node is given here. The Data Pipeline service actually creates an EMR cluster for the export job, which uses parallel table scans. You could try using a larger instance size for your nodes to speed up the process.
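For intuition, the parallel scan that the EMR job performs splits the table into segments that workers read concurrently, using DynamoDB's `Segment`/`TotalSegments` scan parameters. Here is a rough pure-Python sketch of that idea, with a stubbed scan function standing in for the real `boto3` call (the segment count and fake table are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

TOTAL_SEGMENTS = 4  # the EMR job picks this based on cluster and table size

def scan_segment(segment, total_segments):
    # Stand-in for boto3's table.scan(Segment=segment, TotalSegments=total_segments);
    # a real worker would paginate with LastEvaluatedKey and write each page to S3.
    fake_table = [{"id": i} for i in range(100)]
    return [item for item in fake_table if item["id"] % total_segments == segment]

def parallel_scan(total_segments=TOTAL_SEGMENTS):
    # Each worker reads a disjoint slice of the table, so segments run concurrently.
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        results = pool.map(scan_segment, range(total_segments),
                           [total_segments] * total_segments)
    return [item for segment_items in results for item in segment_items]

items = parallel_scan()
```

Note that a parallel scan always reads the whole base table; it cannot be restricted to a GSI, which is why the export cannot be narrowed that way.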

Since your table has millions of records, make sure you have provisioned enough read throughput. Even if your provisioned throughput is high, the export speed depends on what percentage of the provisioned throughput is allocated to the export job. This is described in the AWS Data Pipeline documentation here.
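As a back-of-the-envelope check, you can estimate the export duration from the table size, the provisioned read capacity, and the throughput ratio given to the pipeline. This sketch assumes eventually consistent reads, where one read capacity unit covers two reads of up to 4 KB per second; the table size and capacity numbers below are hypothetical:

```python
def estimated_export_hours(table_size_gb, provisioned_rcu, throughput_ratio):
    # Eventually consistent reads: 1 RCU = 2 reads of up to 4 KB = 8 KB/s.
    kb_per_second = provisioned_rcu * throughput_ratio * 8
    table_kb = table_size_gb * 1024 * 1024
    return table_kb / kb_per_second / 3600

# Hypothetical numbers: a 50 GB table, 1000 RCU, 25% allocated to the export.
hours = estimated_export_hours(50, 1000, 0.25)
```

With these made-up numbers the export would need roughly 7.3 hours, which illustrates how a large table at a low throughput ratio can run past a 6-hour pipeline timeout.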