1
votes

I have a DynamoDB table which looks like this:

[id] [datetime] [name] [reasonForVisit] [__typename]

[id] is the simple primary key for the table

[__typename] is an attribute that has the same value for all items in the table

The table has become very big. I want to be able to filter the data by range in terms of datetime. I have 2 quickly implementable options on my mind, but am unsure if it would make much of a difference in terms of costs.

  1. SCAN the whole table and then filter by datetime (as dynamodb doesn't allow filtering before scan)
  2. Create a GSI (partitionKey: __typename, sortKey: datetime) and QUERY by the fixed __typename partition and filter by datetime sort key e.g. between 10/8/10 and 10/11/10.

So, my question is, since my partition key will be the same for each item, hence one big partition, I'm not sure if when QUERYing with a filter, does it still result in reading the whole table (similar to SCAN) or does it know to efficiently start reading from an item based on the filter?

1

1 Answers

0
votes

Filter happens always after all read operations. There is no difference between Scan and Query in that case. Also, if you create GSI with same PK for all elements you can hit "hot partition" problem and additionally you will slow down your writes.

In case you looking for 80% of records in table a scan+filter solution can be good fit. Otherwise you need to make use of querying capabilities Dynamo offers (so querying by PK and SK). What you could do is to introduce second record like:

id(PK), datetime, name, reasonForVisit, __typename
date(PK), time(SK), id

or:

id(PK), datetime, name, reasonForVisit, __typename
date(PK), time(SK), id_list

or if you data is immutable (historical) and won't ever change then:

id(PK), datetime, name, reasonForVisit, __typename
date(PK), time(SK), id, name, reasonForVisit, __typename

Querying for given range would be done using BatchGetItem. In first and second option you would need to make two writes when adding new record using BatchWriteItem (or TransactWriteItem if strict consistency matters). For third option you can instead use GSI on id.