I'm writing a script that should fill a new table with data as quickly as possible (the table will be ~650 GB). The partition (hash) key differs between all records, so I can't imagine a better key. I've set the provisioned WCU for this table to 4,000.

While the script runs, 16 independent threads put different data into the table at a high rate. During execution I receive ProvisionedThroughputExceededException, and the CloudWatch graphs show that consumed WCU is capped at 1,000.
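For reference, here is a stripped-down sketch of what the loader does (the table name, key names, and item payload are placeholders, not the real schema):

```python
# Simplified sketch of the loader: 16 threads, each writing batches.
from concurrent.futures import ThreadPoolExecutor
import boto3

THREADS = 16
TABLE_NAME = "my-big-table"  # placeholder name

def load_chunk(items):
    # boto3 sessions aren't thread-safe, so each worker builds its own.
    table = boto3.session.Session().resource("dynamodb").Table(TABLE_NAME)
    # batch_writer() groups puts into BatchWriteItem calls and resends
    # unprocessed items automatically.
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)

# Dummy data so the sketch is self-contained; the real script streams
# ~650 GB of records with (almost) unique hash keys.
chunks = [[{"hash_key": i, "range_key": j, "payload": "x"} for j in range(2)]
          for i in range(64)]

with ThreadPoolExecutor(max_workers=THREADS) as pool:
    list(pool.map(load_chunk, chunks))
```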

This could happen if all the data is going to one partition. As I understand it, DynamoDB creates a new partition only when the data size exceeds the 10 GB limit. Is that so? If it is, then during this initial fill I have only one partition, and the 1,000 WCU cap is understandable.

I've checked https://aws.amazon.com/ru/premiumsupport/knowledge-center/dynamodb-table-throttled/
But it seems those suggestions apply to tables that are already filled, when you try to add a lot of new data to them.

So I have 3 questions:
1. How can I speed up the process of inserting data into the new, empty table?
2. When does DynamoDB decide to create a new partition?
3. Can I set a minimum number of partitions (e.g. 4) so that the full provisioned 4k WCU can be used?

UPD: CloudWatch graph attached (screenshot not reproduced here).


UPD2: The HASH key is a long number. Actually, it's not strictly unique, but at most 2 rows share the same HASH key (with different RANGE keys).

Thank you. This is related info, but it seems the behavior differs from that article. I create a new table and set the provisioned WCU to 4k. As described in the link above, DynamoDB should create 4 partitions (since I exceed the limit of 1k per partition). In other words, it seems to work somehow differently. One of the suggestions there is to increase provisioned throughput, but it is already much higher than what is consumed. – Alexander Gubarets
This seems to be in the weeds of DynamoDB, and I'm not aware of any publicly available data that explains exactly how this works. My only guess would be that, since your experiment was relatively short (~20 mins), DynamoDB did not have time to respond. You could try incrementally increasing the WCU over a period of time, which might do better at tripping any thresholds on the DynamoDB side. That's guesswork though. – F_SO_K
For me, using multithreaded writes and batched put_item calls solved a similar phenomenon. – Ron U

1 Answer


You can't manually specify the number of partitions used by DDB. It's automatically handled behind the scenes.

However, the way it's handled is laid out in the link provided by F_SO_K:

  • 1 partition for every 10 GB of data
  • 1 partition for every 3,000 RCU and/or 1,000 WCU provisioned (worked estimate below)
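Following that article's arithmetic (a legacy rule of thumb, not something AWS guarantees), your numbers come out like this:

```python
# Back-of-the-envelope partition estimate from the linked article.
# AWS doesn't publish current partitioning internals, so treat this
# as an approximation, not a contract.
import math

rcu, wcu = 0, 4000   # provisioned throughput on the new table
size_gb = 0          # table is empty at creation time

by_throughput = math.ceil(rcu / 3000 + wcu / 1000)  # ceil(4.0) -> 4
by_size = math.ceil(size_gb / 10)                   # -> 0
partitions = max(by_throughput, by_size, 1)

print(partitions)                     # 4
print(wcu / partitions, "WCU each")   # 1000.0 WCU each
```

Note that 4,000 / 4 = 1,000 WCU per partition, which matches the ceiling you're observing and is what makes the single-partition theory plausible.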

If you've provisioned 4,000 WCU, then you should have at least 4 partitions and you should be seeing up to 4,000 WCU consumed. Especially given that you said your hash key is (almost) unique for every record, your data should be spread out uniformly, and you shouldn't be running into a "hot" partition.

You mentioned CloudWatch showing consumed WCU at 1,000; does CloudWatch also show provisioned capacity at 4,000 WCU?
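If it helps, here's one way to compare the two outside the console (boto3; the table name is a placeholder):

```python
# Compare provisioned WCU (DescribeTable) with the consumed WCU
# CloudWatch recorded over the last hour.
from datetime import datetime, timedelta, timezone
import boto3

TABLE_NAME = "my-big-table"  # placeholder name

provisioned = boto3.client("dynamodb").describe_table(TableName=TABLE_NAME)[
    "Table"]["ProvisionedThroughput"]["WriteCapacityUnits"]

now = datetime.now(timezone.utc)
stats = boto3.client("cloudwatch").get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedWriteCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": TABLE_NAME}],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=60,
    Statistics=["Sum"],
)
# The metric is a Sum per period; divide by the period length (60 s)
# to get average WCU per second.
peak = max((p["Sum"] / 60 for p in stats["Datapoints"]), default=0.0)
print(f"provisioned: {provisioned} WCU, peak consumed: {peak:.0f} WCU/s")
```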

If it does, I'm not sure what's going on; you may have to open a support case with AWS.