
I've written an async data migration tool to migrate data to AWS DynamoDB. We've provisioned a large amount of write capacity on our destination table in Dynamo.

Below are our write capacity graph and throttled request graph. Why might such a large number of requests be getting throttled if we're not even coming close to our write capacity? All the data eventually flows through, but very slowly, because I retry constantly.

[Graphs: consumed write capacity; throttled write requests]

Do you have a Global Secondary Index on the table you're importing to? - matias elgart
I've got about 5 indexes on the table, each with write capacity equal to the table itself (5,000 each). - Abarnett
Gotcha. I was looking at this documentation: "For example, if you Query a global secondary index and exceed its provisioned read capacity, your request will be throttled. If you perform heavy write activity on the table, but a global secondary index on that table has insufficient write capacity, then the write activity on the table will be throttled." From here: docs.aws.amazon.com/amazondynamodb/latest/developerguide/… - matias elgart
Yeah, I thought about that one. Unfortunately, the graphs for each index look very similar: many throttled writes, and consumed write capacity well under what's provisioned. Good thought, though. - Abarnett
How's your read capacity? - Michael - sqlbot

1 Answer


The provisioned write throughput is distributed among all partitions of your table. Depending on the size and throughput of your table, your data is spread across one or more partitions.

( readCapacityUnits / 3,000 ) + ( writeCapacityUnits / 1,000 ) = initialPartitions (rounded up)

A partition is created for every 10 GB of data as well.

See Understand Partition Behavior.

If your write requests are not distributed among multiple partition keys, you will see throttled requests before you hit the provisioned throughput.

In your specific case, your table consists of at least 5 partitions (5,000 write capacity units / 1,000), which means you can use a maximum of 1,000 units of write capacity per second per partition.
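As a concrete sketch of that arithmetic (the read capacity value here is an assumption, since the question only states the write capacity):

```python
import math

# Worked example of the partition formula above. Write capacity is
# 5,000 per the comments; read capacity is assumed, not stated.
read_capacity_units = 5000   # assumption
write_capacity_units = 5000  # from the comments

initial_partitions = math.ceil(
    read_capacity_units / 3000 + write_capacity_units / 1000
)
print(initial_partitions)                         # 7
print(write_capacity_units / initial_partitions)  # ~714 WCU/second per partition
```

So if the reads are provisioned that high, the per-partition write ceiling is even lower than 1,000.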

The second thing to consider is the size of your items. Each write request consumes 1 write capacity unit per 1 KB of item size, rounded up.

See Write Capacity Units.
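As a quick illustration (the item sizes below are made up, not taken from the question):

```python
import math

def wcus_consumed(item_size_bytes: int) -> int:
    # DynamoDB rounds the item size up to the next 1 KB for writes.
    return math.ceil(item_size_bytes / 1024)

print(wcus_consumed(300))   # 1 write capacity unit
print(wcus_consumed(2500))  # 3 write capacity units
```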

In summary: you can only make use of 100% of your provisioned throughput if your write or read requests hit all partitions in parallel. To do so, you need to distribute your workload among many different partition keys.
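One common way to do that is key sharding: append a random or calculated suffix to the partition key so writes spread across partitions. A minimal sketch with boto3 (the table name, key attribute, and shard count are hypothetical, not from the question):

```python
import random
import boto3

NUM_KEY_SHARDS = 10  # hypothetical; tune to roughly the number of partitions

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("MigrationTarget")  # hypothetical table name

def put_item_sharded(base_key: str, attributes: dict) -> None:
    # Spread writes over NUM_KEY_SHARDS distinct partition key values
    # so a single hot key doesn't funnel all writes to one partition.
    shard_suffix = random.randrange(NUM_KEY_SHARDS)
    item = {"pk": f"{base_key}#{shard_suffix}", **attributes}
    table.put_item(Item=item)

put_item_sharded("customer-42", {"status": "migrated"})
```

The trade-off is on the read side: fetching everything for a given base key then means querying all suffixes.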