How to find effectiveness of partition key in documentDb?

Question

To get optimal performance in DocumentDB, it is essential that we choose the right partition key. Let's say we pick a partition key before we have any data, with a bit of forward thinking. Once data has accumulated in DocumentDB, that key may or may not turn out to be optimal, despite our best intentions.

Is there any logic built into DocumentDB that lets us see clearly whether the current partition key is optimal (i.e. the data is truly distributed across all partitions)? What strategies are available to developers to see this information in a crystal clear and unambiguous manner?

I would say your query patterns matter just as much. If you cannot predict the partition key you will have to query each collection, effectively working against the whole partitioning scheme. I will be surprised if that leaves you with more than one or two logical key schemes, depending on your data. — hsulriksen
Can you please explain what you mean by logical key schemes? — Raghu
Part of it is covered by the tutorial mentioned by Bruce. Another way to look at it is, how will you query the data? If your query is triggered from let's say an API, can you from the API request determine the partition key in order to avoid querying all partitions? — hsulriksen

Bruce Chen · Accepted Answer · 2017-03-02T09:49:53

As mentioned in this document about Partition Keys:

The choice of the partition key is an important decision that you’ll have to make at design time. You must pick a JSON property name that has a wide range of values and is likely to have evenly distributed access patterns.

It is a best practice to have a partition key with a large number of distinct values (100s-1000s at a minimum).

Here are a few examples about how to choose the appropriate partition key for your application:

If you’re implementing a user profile backend, then the user ID is a good choice for partition key.
If you’re using DocumentDB for logging time-series data, then the hostname or process ID is a good choice for partition key.
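
To compare candidate keys against the guidance above, one option is to export a sample of documents and count the distinct values per candidate property. The following is only a rough sketch, not something built into DocumentDB: the `key_stats` helper, the `sample_docs` list, and the property names `userId` and `country` are all hypothetical.

```python
from collections import Counter


def key_stats(docs, key):
    """Return (distinct_count, largest_share) for one candidate partition key.

    largest_share is the fraction of documents that carry the most common
    value; a value close to 1.0 means one key value dominates the data.
    """
    counts = Counter(doc[key] for doc in docs if key in doc)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    largest = counts.most_common(1)[0][1]
    return len(counts), largest / total


# Hypothetical sample; in practice this would be an export of real documents.
sample_docs = [
    {"userId": "u1", "country": "US"},
    {"userId": "u2", "country": "US"},
    {"userId": "u3", "country": "US"},
    {"userId": "u4", "country": "DE"},
]

for candidate in ("userId", "country"):
    distinct, share = key_stats(sample_docs, candidate)
    print(candidate, distinct, round(share, 2))
```

A candidate with few distinct values, or with one value holding most of the documents, is a weaker choice under the "wide range of values" rule quoted above.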

For more details, you could refer to this tutorial about designing for partitioning.

Is there any logic built into documentDb for us to see clearly whether the current partition key is optimal (i.e. the data is truly distributed across all partitions)?

Based on your requirement, I assume you could run performance tests against your DocumentDB workloads and evaluate whether your current DocumentDB setup is ready for high-performance scenarios. For more details, you could follow this official tutorial for performance and scale testing with Azure DocumentDB.
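
Alongside a performance test, you could also estimate offline how evenly a set of partition key values would spread across a number of buckets. The sketch below is an approximation under stated assumptions: it uses MD5 as a stand-in hash (it is not the hash DocumentDB uses internally), and the `simulate_spread` and `skew` helpers and the partition count are hypothetical.

```python
import hashlib
from collections import Counter


def simulate_spread(key_values, partitions=10):
    """Hash each key value into one of `partitions` buckets and count them.

    MD5 is only a stand-in here; the real service's hashing is an unknown.
    """
    buckets = Counter()
    for value in key_values:
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        buckets[int(digest, 16) % partitions] += 1
    return [buckets.get(i, 0) for i in range(partitions)]


def skew(bucket_counts):
    """Ratio of the fullest bucket to the mean; 1.0 means perfectly even."""
    total = sum(bucket_counts)
    if total == 0:
        return 0.0
    mean = total / len(bucket_counts)
    return max(bucket_counts) / mean


# Hypothetical key values: many distinct user IDs versus a single hot value.
many_users = [f"user-{i}" for i in range(1000)]
one_hot_key = ["tenant-a"] * 1000

print(skew(simulate_spread(many_users)))
print(skew(simulate_spread(one_hot_key)))
```

A skew near 1.0 suggests the values hash evenly. When every document shares a single key value, the skew equals the number of buckets, because all documents land in one place. This only models storage distribution, so it does not replace testing your actual query and write patterns.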
