260
votes

I am not able to understand what Range / primary key is here in the docs on Working with Tables and Data in DynamoDB

How does it work?

What do they mean by "unordered hash index on the hash attribute and a sorted range index on the range attribute"?

4

4 Answers

648
votes

"Hash and Range Primary Key" means that a single row in DynamoDB has a unique primary key made up of both the hash and the range key. For example with a hash key of X and range key of Y, your primary key is effectively XY. You can also have multiple range keys for the same hash key but the combination must be unique, like XZ and XA. Let's use their examples for each type of table:

Hash Primary Key – The primary key is made of one attribute, a hash attribute. For example, a ProductCatalog table can have ProductID as its primary key. DynamoDB builds an unordered hash index on this primary key attribute.

This means that every row is keyed off of this value. Every row in DynamoDB will have a required, unique value for this attribute. Unordered hash index means what is says - the data is not ordered and you are not given any guarantees into how the data is stored. You won't be able to make queries on an unordered index such as Get me all rows that have a ProductID greater than X. You write and fetch items based on the hash key. For example, Get me the row from that table that has ProductID X. You are making a query against an unordered index so your gets against it are basically key-value lookups, are very fast, and use very little throughput.


Hash and Range Primary Key – The primary key is made of two attributes. The first attribute is the hash attribute and the second attribute is the range attribute. For example, the forum Thread table can have ForumName and Subject as its primary key, where ForumName is the hash attribute and Subject is the range attribute. DynamoDB builds an unordered hash index on the hash attribute and a sorted range index on the range attribute.

This means that every row's primary key is the combination of the hash and range key. You can make direct gets on single rows if you have both the hash and range key, or you can make a query against the sorted range index. For example, get Get me all rows from the table with Hash key X that have range keys greater than Y, or other queries to that affect. They have better performance and less capacity usage compared to Scans and Queries against fields that are not indexed. From their documentation:

Query results are always sorted by the range key. If the data type of the range key is Number, the results are returned in numeric order; otherwise, the results are returned in order of ASCII character code values. By default, the sort order is ascending. To reverse the order, set the ScanIndexForward parameter to false

I probably missed some things as I typed this out and I only scratched the surface. There are a lot more aspects to take into consideration when working with DynamoDB tables (throughput, consistency, capacity, other indices, key distribution, etc.). You should take a look at the sample tables and data page for examples.

25
votes

As the whole thing is mixing up let's look at it function and code to simulate what it means consicely

The only way to get a row is via primary key

getRow(pk: PrimaryKey): Row

Primary key data structure can be this:

// If you decide your primary key is just the partition key.
class PrimaryKey(partitionKey: String)

// and in thids case
getRow(somePartitionKey): Row

However you can decide your primary key is partition key + sort key in this case:

// if you decide your primary key is partition key + sort key
class PrimaryKey(partitionKey: String, sortKey: String)

getRow(partitionKey, sortKey): Row
getMultipleRows(partitionKey): Row[]

So the bottom line:

  1. Decided that your primary key is partition key only? get single row by partition key.

  2. Decided that your primary key is partition key + sort key? 2.1 Get single row by (partition key, sort key) or get range of rows by (partition key)

In either way you get a single row by primary key the only question is if you defined that primary key to be partition key only or partition key + sort key

Building blocks are:

  1. Table
  2. Item
  3. KV Attribute.

Think of Item as a row and of KV Attribute as cells in that row.

  1. You can get an item (a row) by primary key.
  2. You can get multiple items (multiple rows) by specifying (HashKey, RangeKeyQuery)

You can do (2) only if you decided that your PK is composed of (HashKey, SortKey).

More visually as its complex, the way I see it:

+----------------------------------------------------------------------------------+
|Table                                                                             |
|+------------------------------------------------------------------------------+  |
||Item                                                                          |  |
||+-----------+ +-----------+ +-----------+ +-----------+                       |  |
|||primaryKey | |kv attr    | |kv attr ...| |kv attr ...|                       |  |
||+-----------+ +-----------+ +-----------+ +-----------+                       |  |
|+------------------------------------------------------------------------------+  |
|+------------------------------------------------------------------------------+  |
||Item                                                                          |  |
||+-----------+ +-----------+ +-----------+ +-----------+ +-----------+         |  |
|||primaryKey | |kv attr    | |kv attr ...| |kv attr ...| |kv attr ...|         |  |
||+-----------+ +-----------+ +-----------+ +-----------+ +-----------+         |  |
|+------------------------------------------------------------------------------+  |
|                                                                                  |
+----------------------------------------------------------------------------------+

+----------------------------------------------------------------------------------+
|1. Always get item by PrimaryKey                                                  |
|2. PK is (Hash,RangeKey), great get MULTIPLE Items by Hash, filter/sort by range     |
|3. PK is HashKey: just get a SINGLE ITEM by hashKey                               |
|                                                      +--------------------------+|
|                                 +---------------+    |getByPK => getBy(1        ||
|                 +-----------+ +>|(HashKey,Range)|--->|hashKey, > < or startWith ||
|              +->|Composite  |-+ +---------------+    |of rangeKeys)             ||
|              |  +-----------+                        +--------------------------+|
|+-----------+ |                                                                   |
||PrimaryKey |-+                                                                   |
|+-----------+ |                                       +--------------------------+|
|              |  +-----------+   +---------------+    |getByPK => get by specific||
|              +->|HashType   |-->|get one item   |--->|hashKey                   ||
|                 +-----------+   +---------------+    |                          ||
|                                                      +--------------------------+|
+----------------------------------------------------------------------------------+

So what is happening above. Notice the following observations. As we said our data belongs to (Table, Item, KVAttribute). Then Every Item has a primary key. Now the way you compose that primary key is meaningful into how you can access the data.

If you decide that your PrimaryKey is simply a hash key then great you can get a single item out of it. If you decide however that your primary key is hashKey + SortKey then you could also do a range query on your primary key because you will get your items by (HashKey + SomeRangeFunction(on range key)). So you can get multiple items with your primary key query.

Note: I did not refer to secondary indexes.

21
votes

A well-explained answer is already given by @mkobit, but I will add a big picture of the range key and hash key.

In a simple words range + hash key = composite primary key CoreComponents of Dynamodb enter image description here

A primary key is consists of a hash key and an optional range key. Hash key is used to select the DynamoDB partition. Partitions are parts of the table data. Range keys are used to sort the items in the partition, if they exist.

So both have a different purpose and together help to do complex query. In the above example hashkey1 can have multiple n-range. Another example of range and hashkey is game, userA(hashkey) can play Ngame(range)

enter image description here

The Music table described in Tables, Items, and Attributes is an example of a table with a composite primary key (Artist and SongTitle). You can access any item in the Music table directly, if you provide the Artist and SongTitle values for that item.

A composite primary key gives you additional flexibility when querying data. For example, if you provide only the value for Artist, DynamoDB retrieves all of the songs by that artist. To retrieve only a subset of songs by a particular artist, you can provide a value for Artist along with a range of values for SongTitle.

enter image description here

https://www.slideshare.net/InfoQ/amazon-dynamodb-design-patterns-best-practices https://www.slideshare.net/AmazonWebServices/awsome-day-2016-module-4-databases-amazon-dynamodb-and-amazon-rds https://ceyhunozgun.blogspot.com/2017/04/implementing-object-persistence-with-dynamodb.html

0
votes

@vnr you can retrieve all the sort keys associated with a partition key by just using the query using partion key. No need of scan. The point here is partition key is compulsory in a query . Sort key are used only to get range of data