
We are evaluating Azure Cosmos DB as a MongoDB replacement. We have a large collection of 5 million documents, each about 20 KB in size. The total size of the collection in Mongo is around 50 GB, and we expect it to be about 15% larger in Cosmos DB because of the JSON representation. We also expect a yearly increase of about 1.6 million documents. Our throughput requirement is around 10,000 queries per second. A query can be for a single document or for a group of documents; a single-document query costs around 5 RU and a multi-document query around 10 to 20 RU. To get the required throughput, we need to partition the collection.
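
For reference, here is the rough back-of-the-envelope calculation behind those numbers. The 70/30 split between single-document and multi-document queries is only an assumed mix for illustration; the other figures come from the estimates above.

```python
# Back-of-the-envelope sizing using the figures quoted above. The query-mix
# split (70% single-document, 30% multi-document) is an assumption for
# illustration; substitute your own measured ratios.

MONGO_SIZE_GB = 50
COSMOS_OVERHEAD = 1.15        # ~15% larger as JSON in Cosmos DB
DOC_COUNT = 5_000_000
YEARLY_GROWTH_DOCS = 1_600_000

cosmos_gb_now = MONGO_SIZE_GB * COSMOS_OVERHEAD
cosmos_gb_year1 = cosmos_gb_now * (1 + YEARLY_GROWTH_DOCS / DOC_COUNT)

QUERIES_PER_SECOND = 10_000
SINGLE_DOC_RU = 5             # observed cost of a single-document query
MULTI_DOC_RU = 20             # upper end of the observed multi-document cost
SINGLE_DOC_FRACTION = 0.7     # assumed share of single-document queries

required_rus = QUERIES_PER_SECOND * (
    SINGLE_DOC_FRACTION * SINGLE_DOC_RU
    + (1 - SINGLE_DOC_FRACTION) * MULTI_DOC_RU
)

print(f"Estimated Cosmos DB storage today: ~{cosmos_gb_now:.1f} GB")     # ~57.5 GB
print(f"Estimated storage after one year:  ~{cosmos_gb_year1:.1f} GB")   # ~75.9 GB
print(f"Estimated provisioned throughput:  ~{required_rus:,.0f} RU/s")   # ~95,000 RU/s
```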

We would like to get answers to the questions below:

  1. How many physical partitions does Cosmos DB use internally? The portal metrics show only 10 partitions. Is this always the case?
  2. What is the maximum size of each physical partition? The portal metrics show it as 10 GB. How can we store more than 100 GB of data?
  3. What is the maximum RU per partition? Do we get throttled when a single partition becomes very hot?

These are the starting hurdles we want to overcome before making further headway with Cosmos DB adoption.


1 Answer

  1. The number of physical partitions is managed by the Cosmos DB service. Generally you start out with 10, but if more are required, the system will add them for you transparently.

  2. The maximum size of a physical partition shouldn't be a concern for your application. When you create a partitioned collection, you are dealing with "logical partitions," not physical ones. Cosmos DB ensures that all documents belonging to the same logical partition (i.e., sharing the same partition key value) are always placed together on one of the physical partitions. However, as indicated in part 1, Cosmos DB takes care of ensuring that you have an appropriate number of physical partitions to store your data. Put another way, any given physical partition will be home to many logical partitions, and these can be load balanced and moved around as needed.

  3. Maximum RU/s per physical partition is your total provisioned RU/s divided by the number of physical partitions. So if you have a 10,000 RU/s collection with 10 physical partitions, you are actually limited to 1,000 RU/s per physical partition (see the sketch after this list). For this reason it is important to pick an appropriate logical partition key for your documents. If you create hot spots, you can be throttled well below your total provisioned RU/s.
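
To make the arithmetic in point 3 concrete, here is a small sketch. The per-partition request distribution is invented purely to show what a hot partition looks like; it is not measured data.

```python
# Illustration of the per-physical-partition throughput limit described in
# point 3. The request distribution below is a made-up example of a skewed
# partition key, not measured data.

TOTAL_RU_PER_SEC = 10_000
PHYSICAL_PARTITIONS = 10
per_partition_limit = TOTAL_RU_PER_SEC / PHYSICAL_PARTITIONS  # 1,000 RU/s

# Hypothetical RU/s landing on each physical partition.
ru_per_partition = [3_500, 900, 800, 700, 700, 700, 700, 700, 700, 600]

for i, ru in enumerate(ru_per_partition):
    status = "THROTTLED (429s)" if ru > per_partition_limit else "ok"
    print(f"partition {i}: {ru:>5} RU/s -> {status}")

# Even though the total (10,000 RU/s) equals the provisioned throughput,
# partition 0 exceeds its 1,000 RU/s share, so requests against it are
# rate limited while the other partitions sit below their limit.
```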

I recommend that you spend some time reading about partitioning and scaling with Cosmos DB. The documentation and video available on this page are quite helpful. Here is some additional information copied directly from that page:

  • You provision a Cosmos DB container with T requests/s throughput
  • Behind the scenes, Cosmos DB provisions partitions needed to serve T requests/s. If T is higher than the maximum throughput per partition t, then Cosmos DB provisions N = T/t partitions
  • Cosmos DB allocates the key space of partition key hashes evenly across the N partitions. So, each partition (physical partition) hosts 1-N partition key values (logical partitions)
  • When a physical partition p reaches its storage limit, Cosmos DB seamlessly splits p into two new partitions p1 and p2 and distributes values corresponding to roughly half the keys to each of the partitions. This split operation is invisible to your application.
  • Similarly, when you provision throughput higher than t*N throughput, Cosmos DB splits one or more of your partitions to support the higher throughput
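
As a rough sketch of what those bullets describe, the snippet below hashes logical partition key values onto N physical partitions. The hash function, the value of t, and the provisioned throughput T are all illustrative assumptions; Cosmos DB uses its own internal hashing and chooses N for you.

```python
import hashlib

# Sketch of the hash-partitioning scheme the bullets above describe. The
# hash function and the values of T and t are illustrative only; Cosmos DB
# uses its own internal hashing and decides the partition count for you.

T = 50_000      # provisioned throughput (RU/s) requested for the container
t = 10_000      # assumed maximum throughput a single physical partition supports
N = -(-T // t)  # ceiling division: number of physical partitions, N = T/t

def physical_partition(partition_key: str, n: int = N) -> int:
    """Map a logical partition key value onto one of n physical partitions."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n

for key in ["customer-001", "customer-002", "customer-003", "customer-004"]:
    print(f"{key} -> physical partition {physical_partition(key)}")
```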