1
votes

I'm still trying to wrap my head around primary key selection in DynamoDB. My current structure is the following, where userId is HASH and sort is RANGE.

userId sort event
1 2021-01-18#u2d3-f3d5-s22d-3f52 ...
1 2021-01-08#f1d3-s30x-s22d-w2d3 ...
2 2021-02-21#s2d2-u2d3-230s-3f52 ...
2 2021-02-13#w2d3-e5d5-w2d3-3f52 ...
1 2021-01-19#f2d4-f3d5-s22d-3f52 ...
1 2020-12-13#f3d5-e5d5-s22d-w2d3 ...
2 2020-11-11#e5d5-u2d3-s22d-0j32 ...

What I want to achieve is to query all events for a particular user between date A and date B. I have tested a few of solutions that all work, like

  • Figure out a closest common begins_with for the range I want. If date A is 2019-02-01 and date B is 2021-01-03, then it would be userId = 1 and begins_with (sort, 20), which would return everything from the twenty-first century.
  • Loop through all months between date A and date B and do a bunch of small queries like userId = 1 and begins_with (sort, 2021-01), then concat the results afterwards.

They all work but have their drawbacks. I'm also a bit unsure of when I'm just complicating things to the point where a scan might actually be worth it instead. Being able to use between would of course be the best option, but I need to put the unique #guid at the end of the range key in order to make each primary key unique.

Am I approaching this the wrong way?

1
Does between date A and date B (+1 day) not work? It should select those based on byte-order so I'd try that first.Maurice
I can't believe it was that easy, and so obvious when you hear it. For some reason I assumed the whole key had to be a valid date. Thank you @Maurice!Magnus Engdal
:-) - I'm in the process of building a Demo, I'll add an answer in a couple of minutesMaurice

1 Answers

1
votes

I created a little demo app to show how this works.

You can just use the between condition, because it uses byte-order to implement the between condition. The idea is that you use the regular starting date A and convert it to a string as the beginning of the range. Then you add a day to your end, convert it to string and use that as the end.

The script creates this table (it will look different when you run it):

PK   | SK
------------------------------------------------------
demo | 2021-02-26#a4d0f5f3-588a-49d9-8eaa-a3e2f9436ade
demo | 2021-02-27#92b9a41b-9fa5-4ee7-8663-7b801192d8dd
demo | 2021-02-28#e5d162ac-3bbf-417a-9ec7-4024410e1b01
demo | 2021-03-01#7752629e-dc8f-47e0-8cb6-5ed219c434b5
demo | 2021-03-02#dd89ca33-965c-4fe1-8bcc-3d5eee5d6874
demo | 2021-03-03#b696a7fc-ba17-47d5-9d19-454c19e9bccc
demo | 2021-03-04#ee30b1ce-3910-4a59-9e62-09f051b0dc72
demo | 2021-03-05#f0e2405f-6ce9-4fcb-a798-394f7a2f9490
demo | 2021-03-06#bcf76e07-7582-4fe3-8ffd-14f450e60120
demo | 2021-03-07#58d01231-a58d-4c23-b1ed-e525ba102b80

And when I run this function to select the items between two given dates, it returns the result below:

def select_in_date_range(pk: str, start: datetime, end: datetime):

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    start = start.isoformat()[:10]
    end = (end + timedelta(days=1)).isoformat()[:10]

    print(f"Requesting all items starting at {start} and ending before {end}")

    result = table.query(
        KeyConditionExpression=\
            conditions.Key("PK").eq(pk) & conditions.Key("SK").between(start, end)
    )

    print("Got these items")
    for item in result["Items"]:
        print(f"PK={item['PK']}, SK={item['SK']}")
Requesting all items starting at 2021-02-27 and ending before 2021-03-04
Got these items
PK=demo, SK=2021-02-27#92b9a41b-9fa5-4ee7-8663-7b801192d8dd
PK=demo, SK=2021-02-28#e5d162ac-3bbf-417a-9ec7-4024410e1b01
PK=demo, SK=2021-03-01#7752629e-dc8f-47e0-8cb6-5ed219c434b5
PK=demo, SK=2021-03-02#dd89ca33-965c-4fe1-8bcc-3d5eee5d6874
PK=demo, SK=2021-03-03#b696a7fc-ba17-47d5-9d19-454c19e9bccc

Full script to try it yourself.

import uuid
from datetime import datetime, timedelta

import boto3
import boto3.dynamodb.conditions as conditions

TABLE_NAME = "sorting-test"

def create_table():
    ddb = boto3.client("dynamodb")
    ddb.create_table(
        AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}, {"AttributeName": "SK", "AttributeType": "S"}],
        TableName=TABLE_NAME,
        KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}, {"AttributeName": "SK", "KeyType": "RANGE"}],
        BillingMode="PAY_PER_REQUEST"
    )

def create_sample_data():
    pk = "demo"
    amount_of_events = 10

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    start_date = datetime.now()
    increment = timedelta(days=1)

    print("PK   | SK")
    print("------------------------------------------------------")
    for i in range(amount_of_events):
        date = start_date.isoformat()[:10]
        unique_id = str(uuid.uuid4())
        sk = f"{date}#{unique_id}"
        print(f"{pk} | {sk}")

        start_date += increment

        table.put_item(Item={"PK": pk, "SK": sk})

def select_in_date_range(pk: str, start: datetime, end: datetime):

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    start = start.isoformat()[:10]
    end = (end + timedelta(days=1)).isoformat()[:10]

    print(f"Requesting all items starting at {start} and ending before {end}")

    result = table.query(
        KeyConditionExpression=\
            conditions.Key("PK").eq(pk) & conditions.Key("SK").between(start, end)
    )

    print("Got these items")
    for item in result["Items"]:
        print(f"PK={item['PK']}, SK={item['SK']}")

def main():
    pass
    # create_table()
    # create_sample_data()
    start = datetime.now() + timedelta(days=1)
    end = datetime.now() + timedelta(days=5)
    select_in_date_range("demo",start, end)

if __name__ == "__main__":
    main()