1
votes

I am a bit new to Azure Cosmos DB and am trying to understand the concepts.

I would like help deciding the best possible partition key for a DocumentDB collection. Please refer to the image below, which shows the possible partitions using different partition keys.


As mentioned in the blog post here,

An ideal partition key is one that appears frequently as a filter in your queries and has sufficient cardinality to ensure your solution is scalable.

Based on the line above, I think UserId could be used as the partition key in my case.
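To make the two criteria in that quote concrete, here is a minimal sketch that scores candidate keys on a sample of documents and a sample of query filters. The property names and sample data are hypothetical (not taken from the image), and `cardinality` and `filter_share` are illustrative helpers, not part of any Cosmos DB SDK.

```python
from typing import Any, Iterable, Mapping, Set


def cardinality(docs: Iterable[Mapping[str, Any]], key: str) -> int:
    """Count the distinct values a candidate key takes in the sample."""
    return len({doc[key] for doc in docs if key in doc})


def filter_share(query_filters: Iterable[Set[str]], key: str) -> float:
    """Fraction of sample queries whose filter includes the candidate key."""
    filters = list(query_filters)
    if not filters:
        return 0.0
    return sum(1 for fields in filters if key in fields) / len(filters)


if __name__ == "__main__":
    # Hypothetical sample documents; property names are illustrative only.
    docs = [
        {"UserId": "u1", "Country": "IN", "Type": "order"},
        {"UserId": "u2", "Country": "IN", "Type": "order"},
        {"UserId": "u3", "Country": "US", "Type": "invoice"},
        {"UserId": "u4", "Country": "US", "Type": "order"},
    ]
    # Each set lists the properties one (hypothetical) query filters on.
    queries = [{"UserId"}, {"UserId", "Type"}, {"Country"}, {"UserId"}]
    for key in ("UserId", "Country", "Type"):
        print(key, cardinality(docs, key), filter_share(queries, key))
```

With this toy sample, UserId has the most distinct values (4) and appears in 75% of the queries, which matches the reasoning above; on real data you would use a much larger, representative sample.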

Can someone please suggest which key is the best candidate for the partition key?

I am not sure why someone has downvoted my question. I have already done, and am still doing, some research to find an answer. I posted the question here so that someone who works with DocumentDB and has a clear understanding of it can advise me. - Ganesh
How you access your data also matters. As a very first step: if you don't read your data very often, use a key with as many distinct values as possible. If you read parts of your data regularly, try to keep the results of each query within as few partitions as possible. Balance these two extremes and you will have a good idea of where to start. If your queries are filtered by userId, it would be a good candidate. - Alex AIT

2 Answers

0
votes

The article "10 things to know about DocumentDB Partitioned Collections" and the official Microsoft documentation contain a lot of very good advice on choosing a partition key, so I won't repeat it here.

The choice of partition key depends on the data stored in the database and on the filter criteria your queries use most often.

It is often advised to partition on something like userid, which works well if you have such a property. Suppose your business logic runs many queries for a given userid, and each looks up no more than a few hundred entries. In that case the data can be read quickly from a single partition, without the overhead of collating results across partitions.
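The single-partition versus cross-partition difference can be illustrated with a small simulation. This is a sketch under stated assumptions: the MD5-modulo mapping is a stand-in for Cosmos DB's internal hash partitioning (not its actual scheme), and `PARTITION_COUNT`, `physical_partition` and `partitions_touched` are hypothetical names.

```python
import hashlib
from typing import Any, Iterable, Mapping, Optional, Set

# Hypothetical number of physical partitions behind the collection.
PARTITION_COUNT = 4


def physical_partition(pk_value: Any, partitions: int = PARTITION_COUNT) -> int:
    """Map a partition key value to a physical partition index.

    Stand-in for Cosmos DB's internal hashing: a stable MD5 digest taken
    modulo the partition count, used here only for illustration.
    """
    digest = hashlib.md5(str(pk_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def partitions_touched(
    docs: Iterable[Mapping[str, Any]],
    pk_field: str,
    filter_value: Optional[Any] = None,
) -> Set[int]:
    """Return the physical partitions a query would have to read.

    With an equality filter on the partition key, only the partition that
    owns that value is read. Without one, the query fans out to every
    partition holding data, and the partial results must be collated.
    """
    if filter_value is not None:
        return {physical_partition(filter_value)}
    return {physical_partition(doc[pk_field]) for doc in docs}


if __name__ == "__main__":
    # Hypothetical data: 20 users with 5 entries each.
    docs = [{"userid": f"u{u}", "entry": i} for u in range(20) for i in range(5)]
    print("filtered on userid:", len(partitions_touched(docs, "userid", "u7")))
    print("no partition key filter:", len(partitions_touched(docs, "userid")))
```

The filtered query always reads exactly one partition, while the unfiltered one typically reads all of them.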

However, if a single user has millions of records, partitioning on userid is perhaps the worst option: the cost of extracting large volumes of data from a single partition will soon exceed the overhead of collating across partitions. In that case you want to distribute each user's data as evenly as possible over all partitions, so you may need to pick another property as the partition key.
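One quick way to spot this problem on sample data is to measure how much of the data falls under the single most common key value. All documents with one partition key value form a logical partition, which is never split across physical partitions, so that share is a lower bound on what the busiest physical partition must store and serve. A minimal sketch with hypothetical data; `OrderId` is an illustrative alternative key, not something from the question.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def hottest_logical_partition_share(
    docs: Iterable[Mapping[str, Any]], pk_field: str
) -> float:
    """Fraction of all documents that share the most common key value.

    A logical partition always lives on a single physical partition, so this
    is a lower bound on the share held by the busiest physical partition.
    """
    docs = list(docs)
    if not docs:
        return 0.0
    counts = Counter(doc[pk_field] for doc in docs)
    return max(counts.values()) / len(docs)


if __name__ == "__main__":
    # Hypothetical data: one heavy user with 10,000 records and 99 light
    # users with 10 records each. "OrderId" is unique per document.
    docs = [{"UserId": "heavy", "OrderId": f"h{i}"} for i in range(10_000)]
    docs += [
        {"UserId": f"u{u}", "OrderId": f"u{u}-{i}"}
        for u in range(99)
        for i in range(10)
    ]
    for key in ("UserId", "OrderId"):
        print(key, f"{hottest_logical_partition_share(docs, key):.2%}")
```

With this data, UserId puts about 91% of all documents in one logical partition, while the per-document OrderId puts about 0.01% in any one.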

So, if the data volume is very large, I suggest running some simple tests based on your business logic and choosing the partition key that performs best. After all, the partition key cannot be changed once it is set up.

Hope it helps you.

-2
votes

It depends, but here are a few things to consider:

The blog post you mentioned says:

Additionally, the storage size for documents belonging to the same partition key is limited to 10GB. An ideal partition key is one that appears frequently as a filter in your queries and has sufficient cardinality to ensure your solution is scalable.
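A rough way to check a candidate key against that limit is to sum document sizes per key value. This sketch assumes serialized JSON length approximates stored size (the real figure also includes system properties, so treat it as an estimate) and reads "10 GB" as 10 × 1024³ bytes; both are assumptions, and the limit is a parameter because service quotas can change. The helper names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping

# Per-logical-partition limit quoted in the blog post; an assumption, since
# quotas can change. Check the current documentation before relying on it.
LIMIT_BYTES = 10 * 1024**3


def logical_partition_sizes(
    docs: Iterable[Mapping[str, Any]], pk_field: str
) -> Dict[Any, int]:
    """Approximate bytes stored per partition key value (JSON length)."""
    sizes: Dict[Any, int] = defaultdict(int)
    for doc in docs:
        sizes[doc[pk_field]] += len(json.dumps(doc).encode("utf-8"))
    return dict(sizes)


def oversized_partitions(
    docs: Iterable[Mapping[str, Any]],
    pk_field: str,
    limit_bytes: int = LIMIT_BYTES,
) -> List[Any]:
    """Key values whose estimated logical partition size exceeds the limit."""
    sizes = logical_partition_sizes(docs, pk_field)
    return sorted(key for key, size in sizes.items() if size > limit_bytes)


if __name__ == "__main__":
    # Hypothetical sample; a tiny limit stands in for 10 GB so the demo flags "a".
    docs = [{"UserId": "a", "payload": "x" * 100}] * 3 + [{"UserId": "b", "payload": "y"}]
    print(logical_partition_sizes(docs, "UserId"))
    print(oversized_partitions(docs, "UserId", limit_bytes=200))
```

On real data you would run this on a representative sample and scale the per-key totals by your expected growth before comparing them with the limit.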

I also strongly recommend checking this article and video: https://docs.microsoft.com/en-us/azure/cosmos-db/partition-data

The choice of the partition key is an important decision that you have to make at design time. You must pick a property name that has a wide range of values and has even access patterns.

So make sure to choose a partition key that has many distinct values and meets those requirements.
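Even access patterns matter because, in a simplified model, the throughput provisioned for a container is divided evenly among its physical partitions, so the partition that receives the largest share of requests becomes the bottleneck. The sketch below estimates the usable throughput under that model; it assumes every request costs the same number of RUs, and `effective_throughput` is a hypothetical helper.

```python
from typing import Sequence


def effective_throughput(total_rus: float, requests_per_partition: Sequence[int]) -> float:
    """Estimate the usable RU/s before the hottest partition throttles.

    Simplified model: provisioned RU/s are split evenly across physical
    partitions, and every request costs the same RUs. The hottest partition
    saturates first, which caps the rate for the whole workload.
    """
    total_requests = sum(requests_per_partition)
    if total_requests == 0:
        return float(total_rus)
    partitions = len(requests_per_partition)
    per_partition_cap = total_rus / partitions
    hottest_share = max(requests_per_partition) / total_requests
    return min(float(total_rus), per_partition_cap / hottest_share)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 RU/s over 4 physical partitions.
    print(effective_throughput(10_000, [25, 25, 25, 25]))  # even access
    print(effective_throughput(10_000, [70, 10, 10, 10]))  # one hot partition
```

With even access the full 10,000 RU/s is usable; with 70% of requests landing on one partition, the estimate drops to about 3,571 RU/s.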