Amazon DynamoDB table design and querying

Question

We are considering DynamoDB for a dataset that we expect to be large. I come from a strong SQL background, so the NoSQL way of thinking is new to me.

I have a problem and a design, but ran into what appears to be a dead end.
The documentation says to make sure your hash keys are widely distributed to aid performance. Okay, that makes sense.

I am going to be recording various datapoints/actions for users. It makes sense to me that the hash key should be the user-id, and my range key can be the action(s) performed.

Now, if I want all the actions user #1 performs, I can easily query that.
But, if I want all the USERS who performed action X, I cannot do that without a table scan. From the Query documentation:

A Query operation directly accesses items from a table using the table primary key, or from an index using the index key. You must provide a specific hash key value.

So it would seem I am limited to getting data from a specific user, unless I am willing to do a table scan, which is slower and consumes many capacity units.

My question is, I think, ultimately a design question. Maybe I am missing something when it comes to NoSQL? Should my hash key be something else? Or is it simply that my requirements do not fit NoSQL (and more specifically, DynamoDB)?

It is almost as if the hash key is a kind of grouping with DynamoDB. I considered changing the hash key to the actions we are intending to put into place, but then I am not widely distributing my keys...

You're lucky, only 6 days ago support for secondary indexes (indices?) was announced. See here. — Niv Steingarten

antlersoft · Accepted Answer · 2013-04-24T22:38:00

The DynamoDB way to meet your requirement to allow both types of queries is to store the data in two tables: one with hash key user-id and range key action-id, and one with hash key action-id and range key user-id.
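As a rough sketch of the two-table idea, here is a toy in-memory model in Python. It is not the DynamoDB API: the `Table` class, the table names `actions_by_user` and `users_by_action`, and the helper `record_action` are all made up for illustration. The point is only that every write goes to both tables, so each kind of lookup becomes a query on a single hash key.

```python
from collections import defaultdict


class Table:
    """Toy stand-in for a hash-and-range keyed table (NOT the DynamoDB API)."""

    def __init__(self):
        # hash key -> {range key: item}
        self._items = defaultdict(dict)

    def put(self, hash_key, range_key, item):
        self._items[hash_key][range_key] = item

    def query(self, hash_key):
        """Return all items under one hash key, ordered by range key."""
        partition = self._items.get(hash_key, {})
        return [partition[k] for k in sorted(partition)]


# Hypothetical table names: the same data, keyed two ways.
actions_by_user = Table()   # hash key: user-id,   range key: action-id
users_by_action = Table()   # hash key: action-id, range key: user-id


def record_action(user_id, action_id):
    """Write the same record to both tables so either lookup is a query."""
    item = {"user_id": user_id, "action_id": action_id}
    actions_by_user.put(user_id, action_id, item)
    users_by_action.put(action_id, user_id, item)


record_action("user-1", "login")
record_action("user-1", "purchase")
record_action("user-2", "login")

# All actions performed by user-1 (query on the user-keyed table).
print([i["action_id"] for i in actions_by_user.query("user-1")])
# All users who performed "login" (query on the action-keyed table).
print([i["user_id"] for i in users_by_action.query("login")])
```

Neither lookup touches items outside the requested hash key, which is what avoids the table scan.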

You should also consider whether you need all the data in both tables, or whether one can be a summary table. For example, say you have a limited number of possible actions. Instead of putting the full record of every action in the user-keyed table, you might want a table with only one row for each user: a hash key of user-id, and a second, multi-valued column that lists every action-id the user has performed at least once.
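The summary-table variant might look roughly like this, again as a plain in-memory Python model rather than real DynamoDB calls. The names `user_summary` and `record_in_summary` are hypothetical, and a Python `set` stands in for the multi-valued column (in DynamoDB itself, a set-typed attribute would presumably play that role).

```python
# Hypothetical summary table: one row per user, keyed by user-id.
user_summary = {}


def record_in_summary(user_id, action_id):
    """Add action_id to the user's set; repeating an action changes nothing."""
    row = user_summary.setdefault(user_id, {"user_id": user_id, "actions": set()})
    row["actions"].add(action_id)


record_in_summary("user-1", "login")
record_in_summary("user-1", "login")      # second login does not add a new entry
record_in_summary("user-1", "purchase")

# One lookup by hash key returns every distinct action the user has performed.
print(sorted(user_summary["user-1"]["actions"]))
```

The trade-off is that the summary row only records which actions occurred at least once, not how often or when; the full records would still live in the action-keyed table.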

