I am using Cosmos DB for an application I am developing for a client. The client is a multinational with around 175,000 employees worldwide. The application has to consolidate notifications from various source systems and display them on a user's portal when the user clicks the drop-down under their name, similar to many online services such as LinkedIn.

I am trying to determine the 'Partition Key' for Cosmos DB. I believe it should be the "User Id", but I would like the benefit of input from people with more experience in Cosmos DB database design.

Here is my reasoning for selecting User Id as the partition key. I know there could be around 175,000 'logical' partitions, but they will be mapped to far fewer 'physical' partitions on the underlying Azure Data Storage Platform. Choosing 'User Id' guarantees that ALL of a given user's notification records are stored in the same 'logical' partition, and therefore in the same 'physical' partition.
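To make the logical-to-physical mapping concrete, here is a minimal simulation. It is only an illustration under stated assumptions: Cosmos DB's real hash-range assignment is internal to the service, so MD5 plus modulo is a stand-in, and the physical partition count of 10 and the item field names ("id", "userId") are hypothetical. It shows that every item sharing a user id lands on one physical partition while 175,000 logical keys spread across only a handful of physical ones.

```python
import hashlib
from collections import Counter

# Hypothetical physical partition count. In Cosmos DB the service decides this
# and splits physical partitions as storage and throughput grow.
NUM_PHYSICAL_PARTITIONS = 10


def physical_partition_for(partition_key: str,
                           num_partitions: int = NUM_PHYSICAL_PARTITIONS) -> int:
    """Map a logical partition key to a physical partition index.

    Illustrative only: MD5 + modulo stands in for Cosmos DB's internal
    hash-range assignment, which is not exposed as an API.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def place_notifications(notifications):
    """Return (physical partitions seen per user, item count per physical partition)."""
    partitions_by_user = {}
    items_per_physical = Counter()
    for item in notifications:
        user_id = item["userId"]  # the proposed partition key
        physical = physical_partition_for(user_id)
        partitions_by_user.setdefault(user_id, set()).add(physical)
        items_per_physical[physical] += 1
    return partitions_by_user, items_per_physical


if __name__ == "__main__":
    # 175,000 hypothetical users with 3 notifications each.
    items = [
        {"id": f"n-{u}-{k}", "userId": f"user-{u}"}
        for u in range(175_000)
        for k in range(3)
    ]
    by_user, per_physical = place_notifications(items)
    print(len(by_user), "logical partitions on", len(per_physical), "physical partitions")
    print("every user on exactly one physical partition:",
          all(len(p) == 1 for p in by_user.values()))
```

Because the mapping depends only on the partition key value, a query filtered on a single user id only needs to touch one physical partition, which is the property the reasoning above relies on. The real service also never splits one logical partition across physical partitions; when it rebalances, it moves whole key ranges.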

Am I wrong? Please confirm. If so, what is a better strategy?

Thanks.

Bharat

If it fits your queries, it shouldn't be an issue to have separate partitions for each entry. That implies that each time you read and write data, you know exactly which id you are dealing with. - 404

As a general rule of thumb, look at your high-volume operations and load-test your partition strategy to validate that it will scale. You don't need to be exhaustive; the Pareto principle applies here.

Based on the query you describe, though, your logic appears sound. Another thing to keep an eye on is storage: the maximum size of a logical partition is 20 GB. I have no idea what the payload size of each notification is, but it is worth measuring and working out, given X notifications per user, how long it will take to reach that 20 GB limit. Additionally, you will likely want to avoid querying an ever-growing partition at some point, so you may want to look at a TTL (time-to-live) strategy as well.
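A rough way to do the arithmetic above is sketched below. The item size and daily volume are made-up inputs you would replace with measured values, treating "20 GB" as 20 * 1024**3 bytes is an assumption, and the helper names are hypothetical. The "ttl" field (in seconds) is the Cosmos DB per-item time-to-live property, which is only honored when TTL is enabled on the container (for example, a default TTL of -1); the other item fields are illustrative.

```python
from dataclasses import dataclass

# Cosmos DB caps a single logical partition at 20 GB. Treating GB as
# 1024**3 bytes here is an assumption for the estimate.
LOGICAL_PARTITION_LIMIT_BYTES = 20 * 1024**3


@dataclass
class NotificationLoad:
    avg_item_bytes: int   # measured average size of one notification item (hypothetical input)
    items_per_day: float  # notifications a single heavy user receives per day (hypothetical input)


def days_until_limit(load: NotificationLoad,
                     limit_bytes: int = LOGICAL_PARTITION_LIMIT_BYTES) -> float:
    """Days until one user's logical partition hits the limit if nothing expires."""
    daily_bytes = load.avg_item_bytes * load.items_per_day
    if daily_bytes <= 0:
        return float("inf")
    return limit_bytes / daily_bytes


def steady_state_bytes(load: NotificationLoad, ttl_days: float) -> float:
    """Approximate partition size once a TTL keeps only the last ttl_days of items."""
    return load.avg_item_bytes * load.items_per_day * ttl_days


def make_notification(notification_id: str, user_id: str, message: str,
                      ttl_days: int) -> dict:
    """Build a notification item; 'ttl' (seconds) is the Cosmos DB per-item TTL field."""
    return {
        "id": notification_id,
        "userId": user_id,  # partition key path assumed to be /userId
        "message": message,  # hypothetical payload field
        "ttl": ttl_days * 24 * 60 * 60,
    }


if __name__ == "__main__":
    load = NotificationLoad(avg_item_bytes=2048, items_per_day=50)
    print(f"{days_until_limit(load):,.1f} days to reach 20 GB without TTL")
    print(f"{steady_state_bytes(load, ttl_days=90):,.0f} bytes kept with a 90-day TTL")
    print(make_notification("n-1", "user-42", "Your expense report was approved", 90))
```

With these made-up numbers (2 KB items, 50 per day) the limit is about 209,715 days away, and a 90-day TTL holds each partition at roughly 9.2 MB. Larger payloads or a few very noisy users shrink that margin quickly, which is why measuring the heaviest users matters more than the average.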

Hope this helps.