15
votes

Our Azure Cosmos DB collection has gotten large enough to require a partition key. In doing some reading about this, I get the impression that the best partition key is one that provides for even distribution and higher cardinality. This article from Microsoft discusses it.

Using a primary key as a partition key provides for even distribution, but a cardinality of only 1. If this is my only option, is this a bad thing? The aforementioned article gives a few examples and seems to indicate that the primary key should be used as a partition key in those instances. In the case of Azure Cosmos DB, the partitions are logical, not physical. So it wouldn't lead to having each document on its own disk, but it seems like it could lead to a bloated index.

Is using a primary key as a partition key a common practice? Are there any downsides to it?


3 Answers

6
votes

The choice of partition key deserves careful thought. Since using the primary key as the partition key is your only option, I'll just list some possible downsides for your reference.

In terms of performance, if a query filters on a field other than the partition key, it has to cross partitions, which will reduce query performance. Arguably, if the amount of data is small, it won't have much effect.

In terms of cost, Cosmos DB is charged primarily by storage space and RU consumption. As you said, choosing the primary key as the partition key will lead to more index storage. If most queries are cross-partition, it also leads to more RU consumption.

In terms of stored procedures, triggers, and UDFs, you can't run cross-partition transactions via stored procedures and triggers. They are scoped to a single partition, so you need to specify the partition key when you use them, and with the primary key as the partition key each partition holds only one document.

Just note that once a partition key is set, it cannot be deleted or modified later. So consider it carefully before you choose, and make sure you back up your data.

For more details, refer to the official documentation.

4
votes

No, there is no downside to it. Strive to have a partition key with high cardinality. Don't worry about indexes, physical partitions, etc.

You can have millions of partition key values and 10 physical partitions. Physical partitions are created behind the scenes by Cosmos DB. You should never worry about physical partitions.
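To make the "many keys, few physical partitions" point concrete, here is a toy simulation. It is only a sketch: the MD5-plus-modulo placement and the `doc-N` key format are assumptions for illustration, not Cosmos DB's actual hashing scheme.

```python
import hashlib
from collections import Counter

PHYSICAL_PARTITIONS = 10  # illustrative number, not something you configure


def physical_partition(partition_key_value: str, n: int = PHYSICAL_PARTITIONS) -> int:
    """Toy placement: hash the partition key value and bucket it into n slots."""
    digest = hashlib.md5(partition_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n


def distribution(keys, n: int = PHYSICAL_PARTITIONS) -> Counter:
    """Count how many distinct key values land in each simulated partition."""
    return Counter(physical_partition(k, n) for k in keys)


# 100,000 unique keys (one per document, as with a primary key)
counts = distribution(f"doc-{i}" for i in range(100_000))
for partition in sorted(counts):
    print(partition, counts[partition])
```

With unique keys, each simulated partition ends up with roughly the same share, which is why a high-cardinality key tends to spread storage and load evenly.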

2
votes

You could say that the primary key is the safest, and probably the most appropriate, choice for a partition key.

It guarantees uniqueness of the value, which, other than unique keys, is the only way to achieve it. The distribution will be even, and because the primary key will be your partition key, you will be able to use it to retrieve a document by reading it directly instead of querying, which makes the operation faster and cheaper.
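As a rough sketch of the difference between reading and querying, assuming the Python `azure-cosmos` SDK, a container whose partition key path is `/id`, and a hypothetical `email` field on the documents:

```python
def read_by_id(container, doc_id):
    """Point read: the id doubles as the partition key value (assumes path /id).

    `container` is expected to be an azure.cosmos ContainerProxy.
    """
    return container.read_item(item=doc_id, partition_key=doc_id)


def find_by_email(container, email):
    """Query on a non-key field (hypothetical `email`), so it must fan out
    across partitions and will typically cost more RUs than a point read."""
    return list(
        container.query_items(
            query="SELECT * FROM c WHERE c.email = @email",
            parameters=[{"name": "@email", "value": email}],
            enable_cross_partition_query=True,
        )
    )
```

The functions take the container as an argument so the sketch does not assume any particular account, database, or container name.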