
We would like to store a set of documents in Cosmos DB with a primary key of EventId. These records are evenly distributed across a number of customers. Clients need to access the latest records for a subset of customers as new documents are added. The documents are immutable, and need to be stored indefinitely.

How should we design our partition key and queries to avoid clients all hitting the same partitions and/or high RU usage?

If we use just CustomerId as the partition key, we would eventually run over the 10GB limit for a logical partition, and if we use EventId, then querying becomes inefficient (would result in a cross-partition query, and high RU usage, which we'd like to avoid).

Another idea would be to group documents into blocks, i.e. PartitionKey = int(EventId / PartitionSize). This would result in all clients hitting the latest partition(s), which would presumably cause poor performance and throttling.
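To make the block idea concrete, here is a minimal sketch of that mapping (the `block-` prefix and `PARTITION_SIZE` value are hypothetical tuning choices, not anything from Cosmos DB itself):

```python
# Hypothetical block-based partition keys: EventIds are assumed to be
# monotonically increasing integers; PARTITION_SIZE is a tuning knob.
PARTITION_SIZE = 100_000  # events per logical partition (assumption)

def block_partition_key(event_id: int) -> str:
    """Map an EventId to a block partition key, e.g. 1234567 -> 'block-12'."""
    return f"block-{event_id // PARTITION_SIZE}"
```

Because EventIds increase monotonically, every new write (and every poll for recent events) lands in the highest-numbered block, which is exactly the hot-partition problem described above.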

If we use a combined PartitionKey of CustomerId and int(EventId / PartitionSize), then it's not clear to me how we would avoid a cross-partition query to retrieve the correct set of documents.

Edit:

Clarification of a couple of points:

  • Clients will access the events by specifying a list of CustomerIds, the last EventId they received, and a maximum number of records to retrieve.
  • For this reason, the use of EventId alone won't perform well, as it will result in a cross partition query (i.e. WHERE EventId > LastEventId).
  • The system will probably be writing on the order of 1GB a day, in 15 minute increments.
  • It's hard to know what the read volume will be, but I'd guess probably moderate, with maybe a few thousand clients polling the API at regular intervals.
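The access pattern in the bullets above can be sketched as a query builder. This is only an illustration of the shape of the polling query (field names `CustomerId`/`EventId` are from the question; with EventId alone as the partition key, this query must fan out across partitions):

```python
def build_poll_query(customer_ids: list[str], last_event_id: int, max_records: int) -> str:
    """Build the polling query described above: a list of customers,
    the last EventId seen, and a page-size cap."""
    ids = ", ".join(f"'{c}'" for c in customer_ids)
    return (
        f"SELECT TOP {max_records} * FROM c "
        f"WHERE c.CustomerId IN ({ids}) AND c.EventId > {last_event_id} "
        f"ORDER BY c.EventId"
    )
```

In practice you would pass these values as query parameters rather than interpolating them, but the string form shows why the `WHERE c.EventId > LastEventId` predicate cannot be served from a single partition when EventId is the partition key.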
Usually you cannot optimize the partitioning for both reads and writes. Either you make it good for reads or you make it good for writes. A common way to achieve both is to replicate the collection through the change feed to another collection with a different partitioning scheme (or even to a different kind of database). – juunas
It would help to provide some detail on the most common or performance-critical queries you expect to run and the expected volume. For example, the best answer could be different if you expect 80/20 read/write vs 20/80. – Noah Stahl
@NoahStahl I've added some more details around expected volumes. – Erlend Powell

3 Answers

0
votes

First things first: the logical partition size limit has now been increased to 20 GB; see here.

You can also use EventID as the partition key: the size limit applies per logical partition, but there is no limit on the number of logical partitions. So using EventID is fine; you get a point read, which is very fast, when you query by the EventID. You mention that this approach forces cross-partition queries; can you explain how?

A few things to keep in mind, though. Cosmos DB is not really meant for storing this kind of log-based data, as it stores everything on SSDs, so calculate your document size, how many you store per second, then per day, then per month. You can use TTL to delete documents from Cosmos DB when you're done with them, store them long term in Azure Blob Storage, and use Azure Search for fast retrieval, querying the data in Blob by CustomerID and EventID.
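The sizing exercise suggested above is simple arithmetic given the question's stated ~1 GB/day write volume. A rough sketch (the even-spread assumption is hypothetical):

```python
# Back-of-envelope sizing under the question's stated write volume (~1 GB/day).
GB_PER_DAY = 1.0  # from the question

def storage_after_days(days: int, gb_per_day: float = GB_PER_DAY) -> float:
    """Cumulative SSD-backed storage in GB after `days` of writes, with no TTL."""
    return days * gb_per_day

def days_to_fill_partition(limit_gb: float = 20.0,
                           gb_per_day: float = GB_PER_DAY,
                           partitions: int = 1) -> float:
    """Days until one logical partition hits `limit_gb`, assuming writes
    spread evenly over `partitions` partitions (a simplifying assumption)."""
    return limit_gb / (gb_per_day / partitions)
```

With a single partition absorbing all writes, the 20 GB limit is reached in about three weeks; spreading the same volume over ten customer partitions stretches that to well over six months.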

0
votes

How should we design our partition key and queries to avoid clients all hitting the same partitions and/or high RU usage?

I faced a similar issue some time back, and a PartitionKey of customerId + date key, e.g. cust1_20200920, worked well for me.

I created the date key as 20200920 (YYYYMMDD), but you can choose to drop the day part or even the month (cust1_202009 / cust1_2020), based on your query requirements.
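A minimal sketch of that key scheme (the `granularity` parameter is my own illustration of the day/month/year variants described above):

```python
from datetime import date

def partition_key(customer_id: str, d: date, granularity: str = "day") -> str:
    """Build the customerId + date key described above, e.g. 'cust1_20200920'.
    Coarser granularities give larger, longer-lived logical partitions."""
    fmt = {"day": "%Y%m%d", "month": "%Y%m", "year": "%Y"}[granularity]
    return f"{customer_id}_{d.strftime(fmt)}"
```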

Also, IMO, having multiple known PartitionKeys at query time is actually a good thing. For example, if you use YYYYMM in the PartitionKey and want data for 4 months, you can run 4 queries in parallel and combine the results. This is faster when you have many clients, since these partition keys are distributed across multiple physical partitions.
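Enumerating the keys for that parallel fan-out is straightforward; a sketch (key format assumed to match the cust1_YYYYMM convention above):

```python
def month_keys(customer_id: str, start_year: int, start_month: int, months: int) -> list[str]:
    """Enumerate month-granularity partition keys to query in parallel,
    e.g. 3 months from 2020-11 -> cust1_202011, cust1_202012, cust1_202101."""
    keys = []
    y, m = start_year, start_month
    for _ in range(months):
        keys.append(f"{customer_id}_{y}{m:02d}")
        m += 1
        if m > 12:  # roll over the year boundary
            m, y = 1, y + 1
    return keys
```

Each key identifies a single logical partition, so every query in the fan-out is a cheap single-partition query rather than one expensive cross-partition scan.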

On a separate note, Cosmos DB has recently introduced an analytical store for transactional data, which could be useful for your use case. More about it here: https://docs.microsoft.com/en-us/azure/cosmos-db/analytical-store-introduction

0
votes

One approach is using multiple Cosmos containers as "hot/cold" tiers with different partitioning. We could use two containers:

  • Recent: all writes and all queries for recent items go here. Partitioned by CustomerId.
  • Archive: all items are copied here for long term storage and access. Partitioned by CustomerId + timespan (e.g. partition per calendar month)

The Recent container would provide single partition queries by customer. Data growth per partition would be limited either by setting reasonable TTL during creation, or using a separate maintenance job (perhaps Azure Function on timer) to delete items when they are no longer candidates for recent-item queries.
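Cosmos DB TTL values are expressed in seconds, so sizing the Recent container's retention window is a small calculation (the retention period itself is a tuning choice, not something from the question):

```python
def ttl_seconds(retention_days: int) -> int:
    """TTL value (in seconds, as Cosmos DB expects) that keeps items
    queryable in Recent for `retention_days` before automatic deletion."""
    return retention_days * 24 * 60 * 60

def recent_partition_gb(retention_days: int, gb_per_day: float = 1.0,
                        customers: int = 100) -> float:
    """Rough steady-state size of one customer's Recent partition, assuming
    the question's ~1 GB/day spread evenly over `customers` (an assumption)."""
    return retention_days * gb_per_day / customers
```

For example, a 30-day TTL with writes spread over 100 customers keeps each Recent partition around 0.3 GB at steady state, comfortably under the logical partition limit.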

A Change Feed processor, implemented by an Azure Function or otherwise, would trigger on each creation in Recent and make a copy into Archive. This copy would have partition key combining the customer ID and date range as appropriate to limit the partition size.
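The re-keying step that processor performs can be sketched as a pure transform (the field names follow the question; the `Timestamp` field, `partitionKey` property name, and the SDK/Function wiring are all assumptions for illustration):

```python
def to_archive_doc(recent_doc: dict) -> dict:
    """Copy a Recent item into its Archive form, re-keyed by
    customer + calendar month, e.g. 'cust1_202009'."""
    ts = recent_doc["Timestamp"]      # assumed ISO format, e.g. "2020-09-20T12:00:00Z"
    month = ts[:7].replace("-", "")   # "2020-09" -> "202009"
    archived = dict(recent_doc)
    archived["partitionKey"] = f"{recent_doc['CustomerId']}_{month}"
    archived.pop("ttl", None)         # Archive keeps items indefinitely
    return archived
```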

This scheme should provide efficient recent-item queries from Recent and safe long-term storage in Archive, with reasonable Archive query efficiency given a desired date range. The main downside is two writes for each item (one per container), but that's the tradeoff for efficient polling. Whether this tradeoff is worthwhile is probably best determined by simulating the load and observing performance.