
I need some help with designing a DynamoDB Hash+Range key scheme for fast single item write access as well as fast parallel read access to groups of items.

Background:

Currently, each fanning link is stored as an item in the following format:

{
     user_id : NUMBER
     fanned_id : NUMBER
     timestamp: NUMBER
},

where user_id is the hash key and fanned_id is the range key. This scheme allows fast access to a single fanship item (via user_id + fanned_id), but when the complete fanship is read from DynamoDB, the transfer takes a long time if the user has fanned thousands of other users.
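For reference, a single-item lookup with this scheme supplies both key halves. A minimal sketch (the `fanship_key` helper is hypothetical, introduced only to make the key shape explicit; the `get_item` call assumes the table and connection from the question):

```python
def fanship_key(user_id, fanned_id):
    """Build the full primary key (hash + range) for one fanship item."""
    return {"user_id": user_id, "fanned_id": fanned_id}

# With both halves of the key, get_item is a direct single-item read
# (requires a configured boto connection, so it is shown commented out):
#
#   from boto.dynamodb2.table import Table
#   table = Table("fanship_data", connection=conn)
#   item = table.get_item(**fanship_key(10, 42))
```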

Here is how I query DynamoDB using the boto python library:

from boto.dynamodb2.table import Table

table = Table("fanship_data", connection=conn)
fanship = []
uid = 10
# Querying on the hash key alone returns every item under that user_id
for fanned in table.query_2(user_id__eq=uid):
    fanship.append((fanned["fanned_id"], fanned["timestamp"]))

Clearly the throughput bottleneck is in the boto query: the whole fanship of a user is transferred at about 25 items per second, even though I have specified a high provisioned throughput capacity for DynamoDB.

My question to you:

Assume that there is large read throughput capacity, and that all the data is present in DynamoDB. I do not mind resorting to multiprocessing, since that will be necessary for transferring the data in parallel. What scheme for the Hash+Range key will allow me to transfer the complete fanship of a user quickly?


1 Answer


I think that your hash/range key schema is the right one for what you're trying to accomplish. I have implemented similar schemas on several of my tables.

According to the docs, "Query performance depends on the amount of data retrieved", and there does not appear to be a way to parallelize a Query on a single hash key. The only way to read in parallel is a parallel Scan, but I'm not sure that is a better approach for you, since a Scan reads the entire table rather than one user's fanship.
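If you do want to try it, boto's `Table.scan` accepts `segment` and `total_segments` parameters for parallel scans. A sketch of fanning segments out across processes (the table name is taken from the question; the `merge_segments` helper and the worker layout are my own assumptions, not an established pattern from the docs):

```python
from multiprocessing import Pool

TOTAL_SEGMENTS = 4  # one worker process per scan segment


def scan_segment(segment):
    """Scan one segment of the table in a worker process.

    boto is imported here so each worker builds its own connection;
    this function touches the network and is not exercised below.
    """
    from boto.dynamodb2.table import Table

    table = Table("fanship_data")
    results = []
    for item in table.scan(segment=segment, total_segments=TOTAL_SEGMENTS):
        results.append((item["user_id"], item["fanned_id"], item["timestamp"]))
    return results


def merge_segments(parts):
    """Flatten the per-segment result lists into one list of items."""
    return [item for part in parts for item in part]


def parallel_scan(total_segments=TOTAL_SEGMENTS):
    """Run one scan_segment per segment in parallel and merge the results."""
    pool = Pool(total_segments)
    try:
        parts = pool.map(scan_segment, range(total_segments))
    finally:
        pool.close()
        pool.join()
    return merge_segments(parts)
```

Note that a Scan consumes read capacity proportional to the whole table, so this only pays off when you genuinely need most of the data, not one user's fanship.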