6
votes

I can't figure out how to do this in DynamoDB.

I have a table with data something like this:

ID    Updated     other fields...

1200  2017-12-11              ...
1201  2018-02-05              ...
1205  2018-01-05              ...
1206  2018-01-11              ...
1210  2018-02-15              ...
1212  2018-02-10              ...

The partition key is 'ID' and I have a sort key of 'Updated'.

I want to retrieve the records where Updated is greater than "2018-02-01", say.

I can't query on just 'Updated' alone; it complains with "Query condition missed key schema element: ID". I understand what that means, but I'm not sure how to do this properly.
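Roughly, what I'm attempting (shown here as a boto3 sketch; the table name is just a placeholder) is:

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("MyTable")

    # Fails: DynamoDB requires an equality condition on the partition key (ID)
    # before a range condition on the sort key can be applied.
    response = table.query(
        KeyConditionExpression=Key("Updated").gt("2018-02-01")
    )
    # -> ClientError: Query condition missed key schema element: ID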

I've tried adding various indexes and then querying on the index, including one with only the 'Updated' field as the partition key, but then I can't query for a range of values, only for an exact match on the partition key.

So, how do I query across multiple partitions for a condition?

I could use a scan, but that is potentially expensive. Can I do this by indexing it a certain way? Or is there a way to do something similar to a query where I don't need to specify the partition key?

I'm beginning to think that using a unique ID for a partition key is a bad idea. It would almost be better to partition the items that belong together, but then if two records had the same 'Updated' value, the two keys taken together would not be unique and one record would overwrite the other? I think I'm really confused about how to use the partition key properly. – Octopus
Using a unique ID for a partition key is generally a good idea; it helps scalability. Partition keys are not sorted, so you'll need a partition key other than Updated. – Costin
To ensure uniqueness of keys, consider appending a random value to Updated. That way your data ends up in the same position, but the keys are unique. :) – Costin

3 Answers

3
votes

Use a scan

Almost everyone using DynamoDB seems to get worried about scans. Scans are FINE in many circumstances. Things you should ask yourself include: how much data will I have, how will it grow over time, how fast do I need the scan to complete, and how many RCUs will this cost? Don't just dismiss scans: do the maths.
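For example, with boto3 in Python (table and attribute names here are just placeholders), a filtered scan that pages through the whole table might look like this:

    import boto3
    from boto3.dynamodb.conditions import Attr

    table = boto3.resource("dynamodb").Table("MyTable")
    filter_expr = Attr("Updated").gt("2018-02-01")

    # A scan reads the whole table; the filter is applied after the read,
    # so you still pay RCUs for every item examined.
    response = table.scan(FilterExpression=filter_expr)
    items = response["Items"]
    while "LastEvaluatedKey" in response:
        response = table.scan(
            FilterExpression=filter_expr,
            ExclusiveStartKey=response["LastEvaluatedKey"],
        )
        items.extend(response["Items"])

Note that the filter doesn't reduce what you pay for; it only reduces what comes back over the wire.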

Archive data

If you only need to access recent data, consider deleting or archiving old data. Removing it from the table shrinks the amount of data a scan has to read, which makes scans faster and cheaper.
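One built-in way to expire old items automatically is DynamoDB's TTL feature. It needs a numeric epoch-seconds attribute on each item; the table and attribute names below are just examples:

    import time
    import boto3

    client = boto3.client("dynamodb")

    # Tell DynamoDB which attribute holds the expiry timestamp (epoch seconds).
    client.update_time_to_live(
        TableName="MyTable",
        TimeToLiveSpecification={"Enabled": True, "AttributeName": "ExpiresAt"},
    )

    # Each item would then carry e.g. ExpiresAt = now + 90 days.
    expires_at = int(time.time()) + 90 * 24 * 3600

DynamoDB deletes expired items in the background, typically within a day or two of the expiry time.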

Partition by date

There are various strategies you can use to improve your table performance if you really want to use a query. For example, you could have a partition key of YYYY-MM and a sort key of datetime (down to the nanosecond). That way you can retrieve a whole month of data in one query, whilst still being able to restrict it to specific date ranges. This kind of query is much more complicated to handle in your application than a scan. Architecting your tables really depends on your data access patterns.
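As a sketch of what that looks like with boto3 (the table name and the YearMonth attribute are assumptions, not part of your current schema):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("MyTable")

    # One partition per month, with the full timestamp as the sort key.
    response = table.query(
        KeyConditionExpression=(
            Key("YearMonth").eq("2018-02")
            & Key("Updated").gt("2018-02-01")
        )
    )
    items = response["Items"]

A date range that spans several months means issuing one query per month and merging the results in your application.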

2
votes

Nice problem, not so nice solution! :)

• You cannot do a query without an equality condition on the partition key.
• You need the Updated column to be a sort key, either in the table "schema" or in an index. If it is not a sort key there, you won't be able to efficiently query for Updated > VALUE.

So you need a constant partition key and Updated to be the sorting key. Here is your Global Secondary Index:
• PK: ConstantColumn
• SK: Updated

Of course, you'll lose some scalability because the whole index will live in a single partition, but using a KEYS_ONLY projection should give you enough room.
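Querying that GSI might look something like this in boto3 (the table name, index name, and constant value are assumptions):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("MyTable")

    # Every item writes the same constant (e.g. "ALL") into ConstantColumn,
    # so one query against the GSI covers the whole table, sorted by Updated.
    response = table.query(
        IndexName="ConstantColumn-Updated-index",
        KeyConditionExpression=(
            Key("ConstantColumn").eq("ALL") & Key("Updated").gt("2018-02-01")
        ),
    )
    items = response["Items"]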

Should you really need more scalability, consider having PK values like C0, C1, ..., Cn, issuing one query per partition key, and then merging the results (divide et impera).
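A minimal sketch of that write-sharding approach, assuming each item was assigned one of a fixed number of shard keys at write time (names are illustrative):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("MyTable")
    NUM_SHARDS = 8  # items were written with ShardKey = C0 .. C7

    items = []
    for n in range(NUM_SHARDS):
        resp = table.query(
            IndexName="ShardKey-Updated-index",
            KeyConditionExpression=(
                Key("ShardKey").eq(f"C{n}") & Key("Updated").gt("2018-02-01")
            ),
        )
        items.extend(resp["Items"])

    # Results arrive sorted per shard; re-sort across shards if needed.
    items.sort(key=lambda item: item["Updated"])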

1
vote

I would consider alternative partition keys. For example, will your business logic work if you create a GSI with year as partition key and date as sort key? How about year-month?

Your query code will be more complex to write, as you might have to issue multiple queries covering more than one partition to fill your result page.

But as you pointed out, this is cheaper than performing a full table scan.
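For instance, with a hypothetical GSI whose partition key is Year and whose sort key is Updated, covering a range that crosses years means one query per year (boto3 sketch; all names are illustrative):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("MyTable")

    items = []
    for year in ("2018", "2019"):  # the years your date range touches
        resp = table.query(
            IndexName="Year-Updated-index",
            KeyConditionExpression=(
                Key("Year").eq(year) & Key("Updated").gt("2018-02-01")
            ),
        )
        items.extend(resp["Items"])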