I need some help designing a DynamoDB Hash+Range key scheme for fast single-item write access as well as fast parallel read access to groups of items.
Background:
Currently, each fanning link is stored as an item in the following format:
{
    user_id   : NUMBER,
    fanned_id : NUMBER,
    timestamp : NUMBER
}
where user_id is the hash key and fanned_id is the range key. This scheme gives fast access to a single fanship item (via user_id + fanned_id), but reading a user's complete fanship from DynamoDB takes a long time when the user has fanned thousands of other users.
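For illustration, a single fanship item can be fetched like this with boto's dynamodb2 Table API (the region and the concrete IDs are just placeholders):

from boto.dynamodb2 import connect_to_region
from boto.dynamodb2.table import Table

conn = connect_to_region("us-east-1")  # region is a placeholder
table = Table("fanship_data", connection=conn)

# Hash key + range key together identify exactly one item.
item = table.get_item(user_id=10, fanned_id=42)
print(item["timestamp"])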
Here is how I query DynamoDB using the boto Python library:
from boto.dynamodb2.table import Table

# conn is an existing boto.dynamodb2 DynamoDBConnection
table = Table("fanship_data", connection=conn)
fanship = []
uid = 10
# query_2 lazily pages through every item whose hash key equals uid
for fanned in table.query_2(user_id__eq=uid):
    fanship.append((fanned["fanned_id"], fanned["timestamp"]))
Clearly the throughput bottleneck is in the boto query: the whole fanship of a user comes back at 25 items per second, even though I have provisioned high read throughput capacity for the table.
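A minimal way to reproduce that measurement, reusing table and uid from the snippet above, would be something like:

import time

start = time.time()
fanship = [(f["fanned_id"], f["timestamp"]) for f in table.query_2(user_id__eq=uid)]
elapsed = time.time() - start
print("%d items in %.1f s (%.1f items/s)" % (len(fanship), elapsed, len(fanship) / elapsed))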
My question to you:
Assume that read throughput capacity is ample and that all the data is already in DynamoDB. I do not mind resorting to multiprocessing, since that will be necessary to transfer the data in parallel. What Hash+Range key scheme will allow me to transfer a user's complete fanship quickly? (A rough sketch of the kind of parallel read I have in mind follows below.)
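To make the intent concrete, here is an untested sketch, assuming a hypothetical scheme where each user's fanship is split across a fixed number of buckets, the bucket index is folded into a string hash key (user_bucket), and fanned_id stays the range key; the table name, key names, and bucket count are all made up:

from multiprocessing import Pool
from boto.dynamodb2.table import Table

NUM_BUCKETS = 16  # hypothetical: how many buckets each user's fanship is split across

def fetch_bucket(args):
    # Query one bucket of a user's fanship in its own worker process.
    uid, bucket = args
    # Each worker opens its own connection (default boto config); names are hypothetical.
    table = Table("fanship_data_bucketed")
    return [(f["fanned_id"], f["timestamp"])
            for f in table.query_2(user_bucket__eq="%d#%d" % (uid, bucket))]

if __name__ == "__main__":
    uid = 10
    pool = Pool(NUM_BUCKETS)
    chunks = pool.map(fetch_bucket, [(uid, b) for b in range(NUM_BUCKETS)])
    pool.close()
    fanship = [pair for chunk in chunks for pair in chunk]

Whether this kind of bucketed hash key is the right scheme, and how the buckets should be chosen, is exactly what I am asking.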