
Consider a DynamoDB table consisting of a primary key and two attributes describing a start date and an end date. How can I query whether a time range overlaps any of the time ranges in the table without scanning the entire table?

Example: the DynamoDB table has two records:

PK  Start        End
A   2019-01-01   2019-10-01
B   2019-06-01   2019-08-01

Query: which records overlap the time range 2018-02-01 to 2019-03-01?
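For reference, the overlap test itself is simple; the difficulty is evaluating it without a scan. A minimal in-memory sketch, assuming inclusive ranges and ISO-8601 date strings (which sort chronologically as plain strings). The function names are illustrative only:

```python
# Records from the question: (pk, start, end), dates as ISO-8601 strings.
RECORDS = [
    ("A", "2019-01-01", "2019-10-01"),
    ("B", "2019-06-01", "2019-08-01"),
]


def overlaps(start: str, end: str, q_start: str, q_end: str) -> bool:
    """True if the inclusive ranges [start, end] and [q_start, q_end] intersect.

    Two ranges overlap exactly when each one starts no later than the other ends.
    Plain string comparison works because YYYY-MM-DD sorts chronologically.
    """
    return start <= q_end and q_start <= end


def overlapping(records, q_start: str, q_end: str) -> list:
    # Full-scan baseline: every record is examined.
    return [pk for pk, start, end in records if overlaps(start, end, q_start, q_end)]


print(overlapping(RECORDS, "2018-02-01", "2019-03-01"))  # ['A']
```

Only A is returned: B starts on 2019-06-01, after the query range ends. The catch is that `start <= q_end` and `end >= q_start` are two independent range conditions on two different attributes, which a single DynamoDB key condition cannot express.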

Everything I've read so far points to this not being possible. An index for this needs to use binary space partitioning (interval partitioning), which is not possible with a B-tree. 😕 – Chet

2 Answers

3
votes

Disclaimer: This answer is flawed. It does not account for ranges that only partially overlap the query range (starting or ending outside it), or for ranges that are bigger than the query range.

As you are no doubt aware, DynamoDB is unable to utilize more than one index in a query.
In most databases, you could place an index on each of the "start" and "end" columns, and the database engine would be able to determine the intersection of matching records fairly quickly.

In lieu of this functionality, we need a way to encode the range information into a single indexable field.

One way to do this is to use "Z-order indexing".
Z-order indexing is a way of encoding multi-dimensional information.

Z-order indexing, and how it can be applied to DynamoDB, is described in detail in this Amazon blog post (part one, part two).

Essentially, it works by interleaving the data from the fields you want to query. You can do this at the binary level, or potentially at the string level as well.
A basic way to apply this to a date range stored as strings is to interleave the range "20190101" to "20191001" character by character into a single field, "2200119901100011":

start     end       interleaved
20190101  20191001  2200119901100011
20190601  20190801  2200119900680011
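A minimal Python sketch of the string-level interleaving shown in the table, assuming fixed-width YYYYMMDD strings. The function name `interleave` is illustrative, not from the blog post:

```python
def interleave(start: str, end: str) -> str:
    """Interleave two equal-length strings one character at a time.

    For "20190101" and "20191001" this yields "2200119901100011": the first
    character of start, then the first of end, then the second of each, etc.
    """
    if len(start) != len(end):
        raise ValueError("start and end must have the same fixed width")
    return "".join(s + e for s, e in zip(start, end))


for start, end in [("20190101", "20191001"), ("20190601", "20190801")]:
    print(start, end, interleave(start, end))
```

Because the interleaved string begins with the digits that start and end have in common, records whose start and end both fall inside a narrow window share a long prefix and sort next to each other. That is what makes a single range condition on this field useful.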

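Then, to query dates between "20190502" and "20190905", use the common prefix of the two dates ("20190", interleaved as "2200119900") as the basis of the key range: append the next digit of the query start for the lower bound, and add one to the prefix for the upper bound (the logic for this is simpler with a binary representation).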

interleaved BETWEEN "22001199005" AND "2200119901" AND start >= "20190502" AND end < "20190905"
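The bounds in that condition can be derived mechanically. The sketch below is one reading of the procedure, not code from the original answer: `key_bounds` and `query` are hypothetical helpers, and the "table" is an in-memory list standing in for a DynamoDB Query (an inclusive BETWEEN on `interleaved`, then the filter on `start`/`end`). It assumes YYYYMMDD strings whose query dates share at least one leading digit.

```python
import os

RECORDS = [
    ("A", "20190101", "20191001"),
    ("B", "20190601", "20190801"),
]


def interleave(start: str, end: str) -> str:
    # Character-by-character interleaving of two fixed-width digit strings.
    return "".join(s + e for s, e in zip(start, end))


def key_bounds(q_start: str, q_end: str) -> tuple:
    """Hypothetical helper: BETWEEN bounds for the interleaved sort key."""
    prefix = os.path.commonprefix([q_start, q_end])  # e.g. "20190"
    if not prefix:
        raise ValueError("query dates share no leading digit; no narrowing possible")
    base = interleave(prefix, prefix)  # e.g. "2200119900"
    # Lower bound: shared prefix plus the next digit of the query start.
    lower = base + q_start[len(prefix):len(prefix) + 1]  # e.g. "22001199005"
    # Upper bound: "add one" to the shared prefix.
    upper = str(int(base) + 1).zfill(len(base))  # e.g. "2200119901"
    return lower, upper


def query(records, q_start: str, q_end: str) -> list:
    lower, upper = key_bounds(q_start, q_end)
    return [
        pk
        for pk, start, end in records
        # Key condition: DynamoDB's BETWEEN is inclusive on both ends.
        if lower <= interleave(start, end) <= upper
        # Filter on the real attributes, as written in the answer.
        and start >= q_start
        and end < q_end
    ]


print(key_bounds("20190502", "20190905"))  # ('22001199005', '2200119901')
print(query(RECORDS, "20190502", "20190905"))  # ['B']
```

Only B comes back, although A (2019-01-01 to 2019-10-01) also overlaps the query range; running it on the question's own range ("20180201" to "20190301") returns an empty list even though A overlaps it. Both results illustrate the limitation described in the disclaimer.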

Note that the interleaved index alone is still approximate, so you still need explicit conditions on the start and end values.
However, this approach avoids scanning the whole table.
Of course, if you query a huge date range, it may end up reading the whole table anyway; the smaller the range, the more efficient the index will be.

-1
votes

Honestly, I'm not sure DynamoDB is the right solution for this use case.