We are evaluating Azure Cosmos DB as a MongoDB replacement. We have a large collection of 5 million documents, each about 20 KB in size. The total size of the collection in Mongo is around 50 GB, and we expect it to be about 15% larger in Cosmos because of the JSON representation. We also anticipate a yearly increase of 1.6 million documents. Our throughput requirement is around 10,000 queries per second. A query can target either a single document or a group of documents; a single-document query costs around 5 RU, and a multi-document query around 10 to 20 RU. To get the required throughput, we need to partition the collection.
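To make the requirement concrete, here is our back-of-envelope sizing in plain Python. All figures come from the estimates above; the 10 GB-per-partition value is only what the portal metrics show us, which is exactly one of the things we want confirmed below:

```python
import math

# Figures from our estimates above.
DOC_COUNT = 5_000_000        # current document count
YEARLY_GROWTH = 1_600_000    # expected new documents per year
MONGO_SIZE_GB = 50           # current collection size in MongoDB
JSON_OVERHEAD = 1.15         # ~15% larger as JSON in Cosmos DB

QPS = 10_000                 # required queries per second
RU_SINGLE_DOC = 5            # observed cost of a single-document query
RU_MULTI_DOC = 20            # upper bound for a multi-document query

PARTITION_SIZE_GB = 10       # per the portal metrics (assumed, unconfirmed)

cosmos_size_gb = MONGO_SIZE_GB * JSON_OVERHEAD
after_one_year_gb = cosmos_size_gb * (1 + YEARLY_GROWTH / DOC_COUNT)

ru_low = QPS * RU_SINGLE_DOC   # if every query hit a single document
ru_high = QPS * RU_MULTI_DOC   # if every query were multi-document

partitions_for_storage = math.ceil(after_one_year_gb / PARTITION_SIZE_GB)

print(f"Cosmos size today:         ~{cosmos_size_gb:.1f} GB")
print(f"Cosmos size in one year:   ~{after_one_year_gb:.1f} GB")
print(f"Throughput to provision:   {ru_low:,} - {ru_high:,} RU/s")
print(f"Partitions (storage only): {partitions_for_storage}")
```

So we are looking at roughly 57.5 GB today (~76 GB after a year) and anywhere from 50,000 to 200,000 RU/s depending on the query mix.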
We would like to get answers to the questions below:
- How many physical partitions does Cosmos DB use internally? The portal metrics show only 10 partitions. Is this always the case?
- What is the maximum size of each physical partition? The portal metrics show it as 10 GB. How can we store more than 100 GB of data?
- What is the maximum RU per partition? Do we get throttled when a single partition becomes very hot? (See the load-test sketch after this list.)
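For the throttling question, our plan is to load-test a single logical partition and count HTTP 429 ("request rate too large") responses. Below is a minimal sketch using the azure-cosmos Python SDK; the endpoint, key, database/container names, and the pre-loaded documents `doc-0` … `doc-99` under one partition key are all placeholders of ours, not anything specific to the service:

```python
import time
from azure.cosmos import CosmosClient, exceptions

# Placeholder connection details -- substitute real values.
ENDPOINT = "https://<account>.documents.azure.com:443/"
KEY = "<primary-key>"

client = CosmosClient(ENDPOINT, credential=KEY)
container = client.get_database_client("evaldb").get_container_client("docs")

throttled = 0
for i in range(10_000):
    try:
        # Hammer one logical partition to see when it becomes hot.
        # Assumes documents doc-0..doc-99 exist under partition key "hot-pk".
        container.read_item(item=f"doc-{i % 100}", partition_key="hot-pk")
    except exceptions.CosmosHttpResponseError as e:
        if e.status_code == 429:  # throttled by the service
            throttled += 1
            # Crude fixed backoff; the 429 response also carries an
            # x-ms-retry-after-ms header that a real client should honor.
            time.sleep(0.1)
        else:
            raise

print(f"429s observed: {throttled} out of 10,000 requests")
```

If a single hot partition does get throttled independently of the overall provisioned throughput, that would strongly influence our partition key choice.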
These are the initial hurdles we want to clear before making further headway with Cosmos DB adoption.