12
votes

Let's say I have a user table with id and timestamp attributes. I would like to be able to query on both parameters. If I understand the documentation correctly, there are two ways of doing this with DynamoDB:

  1. Define a hash+range primary key using id as the hash and timestamp as the range.
  2. Define a hash-only primary key using id and define a global secondary index using timestamp.

What are the benefits and drawbacks of each approach?

3

3 Answers

14
votes

Define a hash+range primary key using id as the hash and timestamp as the range.

By making id the Hash Key and timestamp the Range Key, you are effectively creating a 'composite primary key'.

In order words, your DynamoDB schema would allow the following data (notice that 'john' is repeated three times)

id (Hash) | timestamp (Range)
----------|-------------------------
john      | 2014-04-28T07:53:29.000Z
john      | 2014-04-28T08:53:29.000Z
john      | 2014-04-28T09:53:29.000Z
mary      | 2014-04-28T07:53:29.000Z
jane      | 2014-04-28T07:53:29.000Z

And you can perform these operations:

  1. GetItem to get a single item based on the id (Hash Key) + timestamp (Range Key) combination
  2. Query to get a list of all items equal to the id (Hash Key)

If this is not what you intended for, then hash + range on id and timestamp respectively is not what you are looking for.

Define a hash-only primary key using id and define a global secondary index using timestamp.

Using a hash-only primary key on id, id must be unique.

id (Hash) | timestamp (GSI Hash Key)
----------|-------------------------
john      | 2014-04-28T07:53:29.000Z
mary      | 2014-04-28T07:53:29.000Z
jane      | 2014-04-28T07:53:29.000Z

Then by applying GSI hash-only on timestamp, you would be able to query for a list of ids for a particular timestamp.

The benefits to this approach is that, it is definitely the correct solution for your use case. #1 is misuse of range key (unless you are intending to ensure at the application level id is not duplicated which is probably a bad idea).

The drawback to using GSI are:

  1. There can only be a maximum of 5 GSI per table, so choose wisely what you want indexed DynamoDB Update Dec 2019 - You can now create as many as 20 GSI per table, and can further raise this soft limit through a request https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-dynamodb-increases-the-number-of-global-secondary-indexes-and-projected-index-attributes-you-can-create-per-table/
  2. GSI will cost you additional money as you will need to assign Provisioned Throughput to it.
  3. GSI is eventually consistent, meaning that DynamoDB does not guarantee that the moment data associated to the table's hash key is written into DB, the data's GSI hash key immediately becomes available for querying. DynamoDB doc states that this is usually immediate, but can be the case that it could take up to seconds for the GSI hash key to become available.
  4. You cannot perform GetItem on a GSI to obtain an item based on its Hash Key / Hash Key + Range Key. You are limited to use Query which returns a List
5
votes

This answer may be of some use, but you're right about the two ways that you could accomplish it.

Assuming that you use id as your hash key, then in order to retrieve an item using only a timestamp, you'll need a global secondary index. You can still make timestamp your the range key, which will be useful in that DynamoDB will use it to sort the results your queries by id.

The main drawback of using a global secondary index is that you'll need additional provisioned throughput on the table.

0
votes

I have a similar interest and was considering creating a secondary index on a portion of the timestamp (e.g., day or hour) as the HASH and the Id as the RANGE to allow for a query against a particular time slice, but this would force all records within a time slice to be in the same partition for the index.

For being able to query recent data vs. historical data, Amazon recommends a multi-table design approach - see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-time-series.html.