I am relatively new to DynamoDB, and we are designing a free-form search GUI for one of our applications. Our primary data store is a traditional relational database; we plan to use DynamoDB as a persistent "cache" layer on top of it, for search only.

In our case, we have 3 keys that identify a customer: billingAccountNumber, customerId, and instanceId.

We store the customer under combinations of these 3 IDs as follows:

  1. billingAccountNumber + customerId
  2. billingAccountNumber + instanceId
  3. customerId
  4. instanceId

Each item in DynamoDB represents an event that happened to a customer at a certain time.

What is the best way to model this in DynamoDB? The queries will be something like:

  1. events for a certain billingAccountNumber over a period of time
  2. events for a certain customerId over a period of time
  3. events for a certain instanceId over a period of time

etc.

Currently, I am using billingAccountNumber as the partition key, since this should distribute the load evenly, and timestamp as the sort key, so that we can get results for a given time range.

I am debating whether I could instead use customerId or instanceId as the sort key and put the timestamp in a filterExpression on the query.

Which way is more efficient in terms of performance and cost?

1 Answer

"I am debating on if I can use customerId or instanceId as the sort key, and timestamp as a filter, so that I can do a query with filterExpression on the timestamp."

A sort key is about ordering. Does your customerId have any meaningful order? I guess not; IDs like that are usually better suited to being a partition key. The same goes for instanceId.

The timestamp, by contrast, is very well suited to being a sort key. I strongly recommend using it that way: a range condition on the sort key is very efficient.

Putting the timestamp in a filterExpression is not a good idea, because a filter is applied only after the query has read every item that matches the key condition. You still pay for reading all of those items, even if most of them are then discarded. On a huge table this is exactly what not to do.
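To make the difference concrete, here is a toy model in plain Python (no DynamoDB involved). It assumes a hypothetical partition of 10,000 events with integer timestamps, and uses item counts as a rough stand-in for consumed read capacity, which in reality depends on item size:

```python
from bisect import bisect_left, bisect_right


def items_read_with_sort_key(timestamps, ts0, ts1):
    """Range condition on the sort key: only items inside [ts0, ts1] are read."""
    ordered = sorted(timestamps)  # items in a partition are stored in sort-key order
    return bisect_right(ordered, ts1) - bisect_left(ordered, ts0)


def items_read_with_filter(timestamps, ts0, ts1):
    """Filter on timestamp: every item in the partition is read, then filtered.

    Returns (items read, items returned).
    """
    returned = [t for t in timestamps if ts0 <= t <= ts1]
    return len(timestamps), len(returned)


# Hypothetical partition: 10,000 events for one billingAccountNumber.
partition = list(range(10_000))

print(items_read_with_sort_key(partition, 100, 199))  # 100
print(items_read_with_filter(partition, 100, 199))    # (10000, 100)
```

Both approaches return the same 100 events, but in this model the filter version reads 100 times as many items to get them.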

See below for suggestions.


Your table's primary key must be unique for each item. If billingAccountNumber plus timestamp fully identifies the item, great. If it doesn't, add something to the sort key to ensure uniqueness.

To answer your queries you need Global Secondary Indexes (GSIs):
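One possible way to get uniqueness while keeping time ordering is a composite sort key. This is only a sketch: the `event_sort_key` helper and the event ID suffix are my own assumptions, not something from your schema, and it assumes fixed-width ISO-8601 UTC timestamps so that string order matches chronological order:

```python
import uuid


def event_sort_key(iso_timestamp, event_id=None):
    """Build a sort key of the form '<timestamp>#<event id>'.

    event_id is a hypothetical unique suffix; a random UUID is used if none is given.
    """
    event_id = event_id or uuid.uuid4().hex
    return f"{iso_timestamp}#{event_id}"


# Two events at the same instant still get distinct keys...
k1 = event_sort_key("2024-01-01T00:00:00Z", "a")
k2 = event_sort_key("2024-01-01T00:00:00Z", "b")
# ...and a later event still sorts after them.
k3 = event_sort_key("2024-01-02T00:00:00Z", "a")

print(sorted([k3, k2, k1]) == [k1, k2, k3])  # True
```

Note that a key with a suffix sorts after the bare timestamp, so the upper bound of a range query has to account for the suffix if events at exactly that instant should be included.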

  1. events for a certain billingAccountNumber over a period of time • PK: billingAccountNumber, SK: timestamp
  2. events for a certain customerId over a period of time • PK: customerId, SK: timestamp
  3. events for a certain instanceId over a period of time • PK: instanceId, SK: timestamp
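As a rough sketch, the index definitions could be built like this in Python, in the shape the `GlobalSecondaryIndexes` parameter of `create_table` expects. The index names and the `gsi_definition` helper are hypothetical. If the base table is already keyed on billingAccountNumber + timestamp, the first access pattern is served by the table itself and only the other two need a GSI:

```python
def gsi_definition(partition_key, sort_key="timestamp"):
    """Return one GSI definition; the index name is an illustrative choice."""
    return {
        "IndexName": f"{partition_key}-{sort_key}-index",
        "KeySchema": [
            {"AttributeName": partition_key, "KeyType": "HASH"},  # partition key
            {"AttributeName": sort_key, "KeyType": "RANGE"},      # sort key
        ],
        # Copy all attributes into the index so queries need no extra lookup.
        "Projection": {"ProjectionType": "ALL"},
    }


gsis = [gsi_definition(key) for key in ("customerId", "instanceId")]

for gsi in gsis:
    print(gsi["IndexName"])
```

With provisioned (rather than on-demand) billing, each definition would also need a `ProvisionedThroughput` entry.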

Use a key condition like: "#customerId = :customerId AND #timestamp BETWEEN :ts0 AND :ts1" (the #timestamp placeholder is needed because timestamp is a reserved word in DynamoDB). Play with the queries.
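For illustration, the parameters for such a query could be assembled like this. The `build_query_params` helper and the index name are assumptions of mine; the resulting dict is in the form accepted by the boto3 `Table.query` method, so it could be passed as `table.query(**params)`:

```python
def build_query_params(index_name, key_name, key_value, ts0, ts1):
    """Build Query parameters for 'key = value AND timestamp BETWEEN ts0 AND ts1'."""
    return {
        "IndexName": index_name,
        # Both conditions are key conditions, so no filter is involved.
        "KeyConditionExpression": "#pk = :pk AND #ts BETWEEN :ts0 AND :ts1",
        # Name placeholders avoid clashes with reserved words such as "timestamp".
        "ExpressionAttributeNames": {"#pk": key_name, "#ts": "timestamp"},
        "ExpressionAttributeValues": {":pk": key_value, ":ts0": ts0, ":ts1": ts1},
    }


params = build_query_params(
    index_name="customerId-timestamp-index",  # hypothetical index name
    key_name="customerId",
    key_value="C-123",                        # made-up customer ID
    ts0="2024-01-01T00:00:00Z",
    ts1="2024-01-31T23:59:59Z",
)
print(params["KeyConditionExpression"])
```

The same helper covers the instanceId index by changing `index_name` and `key_name`.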