The documentation for DynamoDB is reasonably clear on how to spread data evenly across partitions by managing your hash/range key naming scheme.
Because of this, I usually use unique alphanumeric hash keys. In this instance, however, the size of the key itself is of great importance, since the hash key chosen in DynamoDB will be replicated over and over again in various streams in Redis.
Therefore we need a key that suits both DynamoDB, from a data-access/partitioning point of view, and Redis, from a pure key-size point of view.
With this in mind, we have decided to keep an incrementing counter in Redis and use a single NUMBER hash key for DynamoDB items, incrementing the Redis counter each time we insert a new item into DynamoDB.
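A minimal Python sketch of the allocate-then-insert flow described above, assuming a hypothetical `InMemoryCounter` standing in for the Redis counter and a plain dict standing in for the DynamoDB table. With the redis-py client the counter step would be a call like `r.incr("item_id_counter")`, where the key name is hypothetical; Redis `INCR` returns the incremented value and starts from 1 when the key does not exist yet.

```python
import itertools


class InMemoryCounter:
    """Hypothetical stand-in for a Redis counter key.

    Like Redis INCR, incr() returns the new value, starting at 1.
    """

    def __init__(self) -> None:
        self._values = itertools.count(1)

    def incr(self) -> int:
        return next(self._values)


def insert_item(counter: InMemoryCounter, table: dict, attributes: dict) -> int:
    """Allocate the next ID from the counter, then store the item under it.

    `table` is a plain dict standing in for the DynamoDB table.
    """
    item_id = counter.incr()
    table[item_id] = {"id": item_id, **attributes}
    return item_id


counter = InMemoryCounter()
table: dict = {}
first = insert_item(counter, table, {"body": "hello"})
second = insert_item(counter, table, {"body": "world"})
print(first, second)  # 1 2
```

Because `INCR` is atomic in Redis, concurrent writers each receive a distinct ID; the in-memory stand-in here is not thread-safe and is only meant to show the order of operations.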
These integer keys compress very well in Redis, and in our testing they yield storage space improvements in excess of 300-400% over unique string-based IDs, since these IDs could potentially be pushed into hundreds of streams, all stored in Redis lists/zsets.
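To put rough numbers on the key-size argument, a small Python sketch compares the raw UTF-8 size of a counter value with a typical UUID-style string ID, multiplied by a hypothetical fan-out of 100 streams. It only counts key bytes; Redis can additionally store integer-like values in more compact integer encodings inside its list and sorted-set encodings, which this sketch does not model, so it is not a reproduction of the measurements above.

```python
import uuid


def key_bytes(key: str) -> int:
    """Size of a key stored as a UTF-8 string (ignores Redis per-entry overhead)."""
    return len(key.encode("utf-8"))


integer_id = str(1_234_567)    # a counter value written as decimal text
string_id = str(uuid.uuid4())  # a typical unique string ID, always 36 characters

streams = 100  # hypothetical fan-out: the same ID pushed into 100 streams

print(key_bytes(integer_id) * streams)  # 700
print(key_bytes(string_id) * streams)   # 3600
```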
As I understand it, though, this is not good for DynamoDB, since a single incrementing hash key such as:
101
102
103
104
etc...
would be slow on writes when inserting multiple items, and, given our access pattern, we would expect groups of these keys to be retrieved together.
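DynamoDB's documentation describes the partition key value being passed through an internal hash function whose output determines the partition. To reason about whether adjacent IDs such as 101, 102, 103 end up on the same partition, a toy model like the Python sketch below can help. MD5 and the fixed count of 4 partitions are assumptions chosen for illustration: DynamoDB's real hash function is not public, and real tables also split partitions by size and throughput, which is not modelled here.

```python
import hashlib
from collections import Counter
from typing import Iterable


def toy_partition(key: str, partitions: int) -> int:
    """Map a partition-key value to a partition by hashing it.

    MD5 is only a stand-in for DynamoDB's undocumented internal hash.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def spread(keys: Iterable[str], partitions: int) -> Counter:
    """Count how many keys land on each toy partition."""
    return Counter(toy_partition(k, partitions) for k in keys)


sequential = [str(i) for i in range(101, 1101)]  # 1,000 consecutive IDs
counts = spread(sequential, partitions=4)
for partition in sorted(counts):
    print(partition, counts[partition])
```

Swapping `sequential` for keys with a random suffix lets the two schemes be compared under the same toy assumptions.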
To work around this, we are thinking of appending a random number to the end of the hash key:
$hashKey = $itemId . '.' . mt_rand(0, 200); // string such as "101.26", stored in DynamoDB as a NUMBER
This results in keys like these:
101.26
102.199
103.87
104.5
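A minimal Python mirror of that PHP expression, assuming the keys are written through boto3: `Decimal` is used because boto3's DynamoDB serializer rejects Python floats for Number attributes. The helper names `make_hash_key` and `item_id_of` are hypothetical. Since the suffix is always below 1, the original counter value can be recovered from the integer part of the key.

```python
import random
from decimal import Decimal


def make_hash_key(item_id: int, rng: random.Random) -> Decimal:
    """Build "<itemId>.<random 0-200>" as a number (randint is inclusive, like mt_rand)."""
    return Decimal(f"{item_id}.{rng.randint(0, 200)}")


def item_id_of(hash_key: Decimal) -> int:
    """Recover the counter value: the integer part of the key."""
    return int(hash_key)


rng = random.Random(42)
keys = [make_hash_key(i, rng) for i in range(101, 105)]
print(keys)
print([item_id_of(k) for k in keys])  # [101, 102, 103, 104]
```

One side effect worth knowing: as numbers, suffixes 5 and 50 (or 2, 20 and 200) give equal values, e.g. `Decimal("104.5") == Decimal("104.50")`, and DynamoDB trims trailing zeroes from Number values. This does not cause collisions between items, because the integer part is unique per item.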
Using these keys, we would still get the storage improvements in Redis, and we would also preserve the insertion order (meaning that we don't need to store a timestamp).
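The insertion-order claim can be checked with a short Python sketch using the same `<itemId>.<random 0-200>` scheme from the question (the helper name is hypothetical): shuffle a batch of generated keys and confirm that a numeric sort restores the order in which the IDs were allocated.

```python
import random
from decimal import Decimal


def make_hash_key(item_id: int, rng: random.Random) -> Decimal:
    # "<itemId>.<random 0-200>", kept as an exact decimal number.
    return Decimal(f"{item_id}.{rng.randint(0, 200)}")


rng = random.Random(7)
inserted = [make_hash_key(i, rng) for i in range(101, 201)]  # allocation order
shuffled = inserted[:]
rng.shuffle(shuffled)

# The fractional suffix is always below 1, so the integer part decides the order.
assert sorted(shuffled) == inserted
print("insertion order recovered from keys alone")
```

This holds only for numeric comparison, such as using the key as a sorted-set score. If the keys are ever compared as strings, the order breaks once IDs change digit count: `"1000.5"` sorts before `"101.2"`.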
However, I am not completely clear on how DynamoDB would manage and partition these.
So my question is: would single hash keys like those shown above be optimal, and would they encourage DynamoDB to partition our table effectively and ultimately allow us to meet our throughput allocations?
Thanks in advance.