
We would like to store a set of documents in Cosmos DB with a primary key of EventId. These records are evenly distributed across a number of customers. Clients need to access the latest records for a subset of customers as new documents are added. The documents are immutable, and need to be stored indefinitely.

How should we design our partition key and queries to avoid clients all hitting the same partitions and/or high RU usage?

If we use just CustomerId as the partition key, we would eventually run over the 10GB limit for a logical partition, and if we use EventId, then querying becomes inefficient (would result in a cross-partition query, and high RU usage, which we'd like to avoid).

Another idea would be to group documents into blocks, i.e. PartitionKey = int(EventId / PartitionSize). This would result in all clients hitting the latest partition(s), which would presumably cause poor performance and throttling.
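To make the block idea concrete, here is a minimal sketch of that mapping (the `block-` prefix and `PARTITION_SIZE` value are hypothetical tuning choices, not anything from Cosmos DB itself):

```python
# Hypothetical block-based partition keys: EventIds are assumed to be
# monotonically increasing integers; PARTITION_SIZE is a tuning knob.
PARTITION_SIZE = 100_000  # events per logical partition (assumption)

def block_partition_key(event_id: int) -> str:
    """Map an EventId to a block partition key, e.g. 1234567 -> 'block-12'."""
    return f"block-{event_id // PARTITION_SIZE}"
```

Because EventIds increase monotonically, every new write (and every poll for recent events) lands in the highest-numbered block, which is exactly the hot-partition problem described above.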

If we use a combined PartitionKey of CustomerId and int(EventId / PartitionSize), then it's not clear to me how we would avoid a cross-partition query to retrieve the correct set of documents.

Edit:

Clarification of a couple of points:

  • Clients will access the events by specifying a list of CustomerIds, the last EventId they received, and a maximum number of records to retrieve.
  • For this reason, the use of EventId alone won't perform well, as it will result in a cross partition query (i.e. WHERE EventId > LastEventId).
  • The system will probably be writing on the order of 1GB a day, in 15 minute increments.
  • It's hard to know what the read volume will be, but I'd guess probably moderate, with maybe a few thousand clients polling the API at regular intervals.
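The access pattern in the bullets above can be sketched as a query builder. This is only an illustration of the shape of the polling query (field names `CustomerId`/`EventId` are from the question; with EventId alone as the partition key, this query must fan out across partitions):

```python
def build_poll_query(customer_ids: list[str], last_event_id: int, max_records: int) -> str:
    """Build the polling query described above: a list of customers,
    the last EventId seen, and a page-size cap."""
    ids = ", ".join(f"'{c}'" for c in customer_ids)
    return (
        f"SELECT TOP {max_records} * FROM c "
        f"WHERE c.CustomerId IN ({ids}) AND c.EventId > {last_event_id} "
        f"ORDER BY c.EventId"
    )
```

In practice you would pass these values as query parameters rather than interpolating them, but the string form shows why the `WHERE c.EventId > LastEventId` predicate cannot be served from a single partition when EventId is the partition key.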
Usually you cannot optimize the partitioning for both reads and writes. Either you make it good for reads or you make it good for writes. A common way to achieve both is to replicate the collection through the change feed to another collection with a different partitioning scheme (or even to a different kind of database). – juunas
It would help to provide some detail on the most common or performance-critical queries you expect to run and the expected volume. For example, the best answer could be different if you expect 80/20 read/write vs 20/80. – Noah Stahl
@NoahStahl I've added some more details around expected volumes. – Erlend Powell

3 Answers

0
votes

First things first: the logical partition size limit has now been increased to 20 GB; see here.

You can also use EventID as the partition key: the size limit applies per logical partition, but there is no limit on the number of logical partitions. So using EventID is fine; you get a point read, which is very fast, when you query by the EventID. You mention that this approach forces cross-partition queries; can you explain how?

A few things to keep in mind, though. Cosmos DB is not really meant for storing this kind of log-based data, as it stores everything on SSDs, so calculate your document size, how many you store per second, then per day, then per month. You can use TTL to delete documents from Cosmos DB when you're done with them, store them long term in Azure Blob Storage, and use Azure Search for fast retrieval, querying the data in Blob by CustomerID and EventID.
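The sizing exercise suggested above is simple arithmetic given the question's stated ~1 GB/day write volume. A rough sketch (the even-spread assumption is hypothetical):

```python
# Back-of-envelope sizing under the question's stated write volume (~1 GB/day).
GB_PER_DAY = 1.0  # from the question

def storage_after_days(days: int, gb_per_day: float = GB_PER_DAY) -> float:
    """Cumulative SSD-backed storage in GB after `days` of writes, with no TTL."""
    return days * gb_per_day

def days_to_fill_partition(limit_gb: float = 20.0,
                           gb_per_day: float = GB_PER_DAY,
                           partitions: int = 1) -> float:
    """Days until one logical partition hits `limit_gb`, assuming writes
    spread evenly over `partitions` partitions (a simplifying assumption)."""
    return limit_gb / (gb_per_day / partitions)
```

With a single partition absorbing all writes, the 20 GB limit is reached in about three weeks; spreading the same volume over ten customer partitions stretches that to well over six months.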

0
votes

How should we design our partition key and queries to avoid clients all hitting the same partitions and/or high RU usage?

I faced a similar issue some time back, and a PartitionKey of customerId + date key, e.g. cust1_20200920, worked well for me.

I created the date key as 20200920 (YYYYMMDD), but you can choose to drop the day part or even the month (cust1_202009 / cust1_2020), based on your query requirements.
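A minimal sketch of that key scheme (the `granularity` parameter is my own illustration of the day/month/year variants described above):

```python
from datetime import date

def partition_key(customer_id: str, d: date, granularity: str = "day") -> str:
    """Build the customerId + date key described above, e.g. 'cust1_20200920'.
    Coarser granularities give larger, longer-lived logical partitions."""
    fmt = {"day": "%Y%m%d", "month": "%Y%m", "year": "%Y"}[granularity]
    return f"{customer_id}_{d.strftime(fmt)}"
```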

Also, IMO, having multiple known PartitionKeys at query time is actually a good thing. For example, if you use YYYYMM in the PartitionKey and want data for 4 months, you can run 4 queries in parallel and combine the results. This is faster when you have many clients, since these partition keys are distributed across multiple physical partitions.
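Enumerating the keys for that parallel fan-out is straightforward; a sketch (key format assumed to match the cust1_YYYYMM convention above):

```python
def month_keys(customer_id: str, start_year: int, start_month: int, months: int) -> list[str]:
    """Enumerate month-granularity partition keys to query in parallel,
    e.g. 3 months from 2020-11 -> cust1_202011, cust1_202012, cust1_202101."""
    keys = []
    y, m = start_year, start_month
    for _ in range(months):
        keys.append(f"{customer_id}_{y}{m:02d}")
        m += 1
        if m > 12:  # roll over the year boundary
            m, y = 1, y + 1
    return keys
```

Each key identifies a single logical partition, so every query in the fan-out is a cheap single-partition query rather than one expensive cross-partition scan.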

On a separate note, Cosmos DB has recently introduced an analytical store for transactional data, which could be useful for your use case. More about it here: https://docs.microsoft.com/en-us/azure/cosmos-db/analytical-store-introduction

0
votes

One approach is using multiple Cosmos containers as "hot/cold" tiers with different partitioning. We could use two containers:

  • Recent: all writes and all queries for recent items go here. Partitioned by CustomerId.
  • Archive: all items are copied here for long term storage and access. Partitioned by CustomerId + timespan (e.g. partition per calendar month)

The Recent container would provide single partition queries by customer. Data growth per partition would be limited either by setting reasonable TTL during creation, or using a separate maintenance job (perhaps Azure Function on timer) to delete items when they are no longer candidates for recent-item queries.
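Cosmos DB TTL values are expressed in seconds, so sizing the Recent container's retention window is a small calculation (the retention period itself is a tuning choice, not something from the question):

```python
def ttl_seconds(retention_days: int) -> int:
    """TTL value (in seconds, as Cosmos DB expects) that keeps items
    queryable in Recent for `retention_days` before automatic deletion."""
    return retention_days * 24 * 60 * 60

def recent_partition_gb(retention_days: int, gb_per_day: float = 1.0,
                        customers: int = 100) -> float:
    """Rough steady-state size of one customer's Recent partition, assuming
    the question's ~1 GB/day spread evenly over `customers` (an assumption)."""
    return retention_days * gb_per_day / customers
```

For example, a 30-day TTL with writes spread over 100 customers keeps each Recent partition around 0.3 GB at steady state, comfortably under the logical partition limit.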

A Change Feed processor, implemented by an Azure Function or otherwise, would trigger on each creation in Recent and make a copy into Archive. This copy would have partition key combining the customer ID and date range as appropriate to limit the partition size.
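The re-keying step that processor performs can be sketched as a pure transform (the field names follow the question; the `Timestamp` field, `partitionKey` property name, and the SDK/Function wiring are all assumptions for illustration):

```python
def to_archive_doc(recent_doc: dict) -> dict:
    """Copy a Recent item into its Archive form, re-keyed by
    customer + calendar month, e.g. 'cust1_202009'."""
    ts = recent_doc["Timestamp"]      # assumed ISO format, e.g. "2020-09-20T12:00:00Z"
    month = ts[:7].replace("-", "")   # "2020-09" -> "202009"
    archived = dict(recent_doc)
    archived["partitionKey"] = f"{recent_doc['CustomerId']}_{month}"
    archived.pop("ttl", None)         # Archive keeps items indefinitely
    return archived
```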

This scheme should provide efficient recent-item queries from Recent and safe long-term storage in Archive, with reasonable Archive query efficiency given a desired date range. The main downside is two writes for each item (one per container), but that's the tradeoff for efficient polling. Whether this tradeoff is worthwhile is probably best determined by simulating the load and observing performance.