Amazon Titan: data is unevenly distributed on DynamoDB partitions

Question

We have 314 million records to load into Titan. Working with Amazon Titan on DynamoDB tables as the storage backend, we found that around 10% of our data sits on a single partition out of 125.

This uneven distribution causes problems for both write and read operations. What could be causing it? We are using the single-item data model; could that be the reason?

Can you add your table structure and what are the values of hash keys? — Harshal Bulsara
The table structure of single item model is explained in this doc. Please check single item data model section. — Mohamed Taher Alrefaie

Alexander Patrikalakis · Accepted Answer · 2017-03-29T08:39:30

Uneven distribution of data is caused by items clustering around the same partition keys in DynamoDB. Partition keys correspond to out-vertex ids in Titan, so a vertex with lots of properties, or one with lots of outgoing edges (a super node), concentrates all of its data under a single partition key. If that is your situation, try loading your graph with vertex partitioning enabled on that vertex label. When you create the vertex label in TitanManagement, all you need to do is call .partition() on the label before committing the TitanManagement operation.

If your DynamoDB table has 125 partitions, you will want max-partitions (the cluster.max-partitions setting) to be around 256 so that the data is spread evenly across your physical partitions.
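To illustrate the reasoning, here is a toy simulation of the skew and of the effect of splitting one super node across virtual partitions. It is only a sketch under stated assumptions: the SHA-256 hash stands in for DynamoDB's internal key-to-partition mapping, the round-robin split stands in for Titan's vertex partitioning, and the graph sizes and all names (physical_partition, load, max_share) are made up for the example. Only the 125 partitions and the value 256 come from the discussion above.

```python
import hashlib
from collections import Counter

NUM_PHYSICAL_PARTITIONS = 125  # number of DynamoDB partitions from the question
MAX_VIRTUAL_PARTITIONS = 256   # max-partitions value suggested in the answer


def physical_partition(partition_key: str) -> int:
    """Toy stand-in (assumption) for DynamoDB's hash-to-partition mapping."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PHYSICAL_PARTITIONS


def load(items_per_vertex: dict, partitioned: set) -> Counter:
    """Count how many items land on each physical partition.

    Vertices listed in `partitioned` have their items spread round-robin
    over MAX_VIRTUAL_PARTITIONS keys (a simplification of vertex
    partitioning); every other vertex keeps all items under one key.
    """
    counts = Counter()
    for vertex_id, n_items in items_per_vertex.items():
        if vertex_id in partitioned:
            for i in range(n_items):
                key = f"{vertex_id}#{i % MAX_VIRTUAL_PARTITIONS}"
                counts[physical_partition(key)] += 1
        else:
            counts[physical_partition(vertex_id)] += n_items
    return counts


def max_share(counts: Counter) -> float:
    """Fraction of all items held by the most loaded partition."""
    return max(counts.values()) / sum(counts.values())


# Hypothetical graph: 9,000 ordinary vertices with 10 items each,
# plus one super node holding 10,000 items (10% of the total).
graph = {f"v{i}": 10 for i in range(9000)}
graph["supernode"] = 10_000

skewed = load(graph, partitioned=set())
spread = load(graph, partitioned={"supernode"})

print(f"hottest partition without vertex partitioning: {max_share(skewed):.1%}")
print(f"hottest partition with vertex partitioning:    {max_share(spread):.1%}")
```

Without partitioning, the partition that receives the super node's key holds at least its full 10% of the data, which matches the symptom in the question. With the super node split over 256 keys, its items are scattered across the physical partitions and the hottest partition's share drops sharply.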
