17
votes

In a scenario where 1000 entries (unique keys) enter Cosmos DB per minute, is it safe to use /id as the partition key?

In particular, there is the concept of logical partitions: https://docs.microsoft.com/en-us/azure/cosmos-db/partition-data. The graphic there scares me a little, because it shows logical partitions as actual entities (e.g. "city": "London"). With an 8-hour TTL and 1000 entries per minute, I don't necessarily want 480,000 logical partitions that Cosmos DB needs to manage.

What I imagine happens is that the value of the partition key is simply hashed and taken modulo the number of physical partitions; the "Logical Partition Management" section of https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview#choose-partitionkey indicates that this is true. Furthermore, the "Choosing a Partition Key" section suggests (but does not actually state) that /id would be a fantastic partition key: there is no need to worry about the 10GB limit or the throughput limit, there are no hot spots, the range of values is wide (huge), and, since the application doesn't need to filter on anything except the id, cross-partition queries won't be an issue for this use case.
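To make the mental model above concrete, here is a minimal sketch of it in Python. It is not Cosmos DB's actual placement algorithm: the MD5 hash, the modulo step, the `item-N` key format, and the count of 5 physical partitions are all assumptions chosen for illustration. It only shows that 480,000 distinct key values spread roughly evenly over a handful of buckets.

```python
import hashlib
from collections import Counter

ENTRIES_PER_MINUTE = 1000
TTL_MINUTES = 8 * 60
PHYSICAL_PARTITIONS = 5  # assumed value, purely for illustration


def physical_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key value to a bucket by hashing it (illustrative only)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Steady-state number of live items, and therefore of distinct /id values.
live_items = ENTRIES_PER_MINUTE * TTL_MINUTES

# Count how many of the hypothetical keys land in each bucket.
counts = Counter(
    physical_partition(f"item-{i}", PHYSICAL_PARTITIONS) for i in range(live_items)
)

print(f"live items (logical partitions): {live_items}")
for index in sorted(counts):
    print(f"physical partition {index}: {counts[index]} keys")
```

In this model a logical partition is just a distinct key value inside a bucket, not a separately managed entity, which is the intuition the question is asking about.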

In summary, do I need to worry about the memory/CPU/etc. overhead of hundreds of thousands of partition key values (logical partitions)? The docs indicate that more partition key values are better, but don't say whether it's possible to have too many.

2
While there's no "worry" per se regarding storage when using /id as the partition key (given that the max logical partition size is 10GB), keep in mind that if you search for documents based on a property other than id, and id is the partition key, you will be forced to do a cross-partition query (in other words, the query will be applied to every single logical partition). This is because a query is scoped to a single partition. If you only retrieve documents by id, this isn't an issue. Just think about that when it comes to query performance. - David Makogon
If you do query on anything other than id: before committing to id as the partition key, it might be worth benchmarking to see what your RU costs will be when querying against a property other than id (you'll need to enable cross-partition query in the query options). You might find that a different partition key suits your query use cases better. - David Makogon
Currently yes, it will just be against the ID. Regarding your first comment, are you saying the cross-partition query is run against every logical partition (a huge number), as opposed to every physical partition (just a few)? That is somewhat concerning. - user2770791

2 Answers

20
votes

I am from the Cosmos DB engineering team.

You don't have to worry about the number of logical partition keys that are created on a Cosmos DB collection/container. As long as the partition key is an appropriate choice for your writes (subject to a cap of 10GB per logical partition key) and your queries, you should be good.

4
votes

Implications are:

  1. best cardinality
  2. easy, fast, and cheap document reads
  3. no multi-document transactions, as the transaction scope is the partition key
  4. queries by anything other than id will be cross-partition

PS. I can hardly imagine a case where nothing but by-id reads/queries are needed, except maybe document caching (combined with TTL).
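Implications 2 and 4 can be sketched with a toy in-memory model. This is not the Azure SDK and not how Cosmos DB is implemented: `ToyContainer`, `partition_for`, the MD5-plus-modulo placement, and the 4 physical partitions are all hypothetical, and the model assumes that a query without the partition key fans out once per physical partition rather than once per logical key.

```python
import hashlib

PHYSICAL_PARTITIONS = 4  # assumed value, purely for illustration


def partition_for(partition_key: str, partition_count: int) -> int:
    """Pick a bucket for a partition key value (illustrative hash + modulo)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


class ToyContainer:
    """Hypothetical in-memory stand-in for a container partitioned on /id."""

    def __init__(self, partition_count: int) -> None:
        self.partitions = [{} for _ in range(partition_count)]

    def upsert(self, doc: dict) -> None:
        index = partition_for(doc["id"], len(self.partitions))
        self.partitions[index][doc["id"]] = doc

    def point_read(self, doc_id: str):
        """Read by id: the id is the partition key, so one bucket is touched."""
        index = partition_for(doc_id, len(self.partitions))
        return self.partitions[index].get(doc_id), 1

    def query(self, predicate):
        """Filter on another property: every bucket has to be scanned."""
        matches = [
            doc
            for partition in self.partitions
            for doc in partition.values()
            if predicate(doc)
        ]
        return matches, len(self.partitions)


container = ToyContainer(PHYSICAL_PARTITIONS)
for i in range(1000):
    city = "London" if i % 2 == 0 else "Paris"
    container.upsert({"id": f"item-{i}", "city": city})

doc, touched_by_read = container.point_read("item-42")
london, touched_by_query = container.query(lambda d: d["city"] == "London")

print(f"point read touched {touched_by_read} partition(s)")
print(f"query on city touched {touched_by_query} partition(s)")
```

If reads are only ever by id, the second code path never runs, which is why the cross-partition cost does not matter for that use case.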