6
votes

I am trying to wrap my head around how I should design a table in DynamoDB. I need a Subscription table that stores items for each subscription a company has. So I was planning on the following attributes.

  • SubscriptionID
  • CompanyID
  • StartDate
  • EndDate
  • ProductID

I have read quite a lot about how indexing works in DynamoDB in regards to secondary indexes, but I really struggle to make sense of it all. It seems as if the more I read, the more confused I get. I understand the core concepts of global secondary indexes and local secondary indexes as well as hash key + range key. The problem is putting it all together and designing a table that supports my query requirements.

As of now, I have the following query requirements:

  1. Being able to query for active subscriptions for a given company. That is, querying by CompanyID and comparing StartDate and EndDate to a given date
  2. The same as the above, except querying to see if a given company has an active subscription for a specific ProductID

In the future (or now), I would like to be able to query for all active subscriptions for a given ProductID. My initial plan was to use SubscriptionID as the hash key and the CompanyID as a range key for efficient queries for a given company's subscriptions. However, I am not sure if it is even possible to query for the range key alone? I know that the combination of the hash key and range key is unique (composite key), but do I always need to include the hash key when performing lookups for the range key?

I was also thinking to make local secondary indexes for StartDate and EndDate to efficiently query for a given company's active subscriptions, but I am not sure if I should make these global secondary indexes instead for when I want to query for all of the active subscriptions (for all companies)? My guess would be yes, because I need to query across all partitions and not just the one for a single company.

Like I mentioned, I have a fundamental understanding of the concepts, but my struggle is when it comes to querying tables/indexes and which query patterns each combination of primary keys and global/local secondary indexes make possible/impossible. I find it hard to grasp the combination of a key-value store with the added capabilities that secondary indexes provide. I would really love if anyone could give me an example of how they would design the table and explain why, such as how the various choices support my particular query requirements.

I am aware that much of this stuff is documented, but I am struggling to use this information in relation to this scenario. I hope anyone can help. So: how would you design the table to support the queries that I listed and why?

1
Hi, I have a few clarifying questions: Is SubscriptionID globally unique? How many subscriptions will there be per company at most (rom)? Given your query needs, are you sure that DynamoDB is the right choice vs. a SQL RDBMS (is there a compelling reason for you to use Dynamo)? - jarmod
@jarmod Yes, SubscriptionID is globally unique. There won't be so many subscriptions per company; maximum a few hundreds I'd say, but that will be very rare. I could implement this in a relational database, sure, and now that you mention it, it might not be a bad choice. However, my confusion still applies for many similar use cases where the use of a NoSQL database is more appropriate. Therefore, I would greatly appreciate any advice and clearing out my confusion regardless. Thanks! - ba0708
Reason I ask about number of subscriptions per company is that, afaik, in order for you to query subscriptions for a given company that are active on a given date, you would have to use a GSI query (:companyid = CompanyID and :date >= StartDate) but you would also need to apply an additional server-side filter (:date <= EndDate). You can't do this 3-attribute query natively using indexes alone in DynamoDB, afaik, and using a server-side filter would be a challenge with millions of records, but 100s wouldn't be an issue. - jarmod
@jarmod Ah, that makes perfect sense! In that case, it definitely won't be an issue, but good observation. I was not aware of that limitation, so thanks for pointing that out! I didn't see it mentioned anywhere in the documentation, but you are probably right anyways. :-) - ba0708

1 Answers

8
votes

Here's a proposal, given some assumptions:

  • assume that subscription ID is globally unique
  • assume that you need to be able to retrieve a subscription given its subscription ID
  • assume that dates can be represented as numbers (e.g. Julian Day Number)

Table design:

  • hash key: SubscriptionID
  • other attributes CompanyID: S, StartDate: N, EndDate: N, ProductID: S

Global secondary indexes:

  • ProductID-StartDate-Index: hash on ProductID, range on StartDate
  • CompanyID-ProductID-index: hash on CompanyID, range on ProductID
  • CompanyID-StartDate-index: hash on CompanyID, range on StartDate

Queries:

  • query by company for active subscriptions on a given date: use CompanyID-StartDate-index, query CompanyID = :companyid, StartDate <= :date, add filter EndDate >= :date

  • query by company for active subscriptions for a given product on a given date: use CompanyID-ProductID-index, query CompanyID = :companyid, ProductID = :productid, add filter StartDate <= :date, add filter EndDate >= :date

  • query for active subscriptions for a given product on a given date: use ProductID-StartDate-index, query ProductID = :productid, StartDate <= :date, add filter EndDate >= :date

You can try all of this out fairly quickly in the AWS DynamoDB console:

enter image description here