
We are using the Azure Cosmos DB SQL API, and we have a collection XYZ with:

Size: Unlimited
Throughput: 50000 RU/s
PartitionKey: Hashed

We are inserting 200,000 records, each of size ~2.1 KB and all having the same value for the partition key column. To our knowledge, all documents with the same partition key value are stored in the same logical partition, and a logical partition cannot exceed the 10 GB limit, whether the collection is fixed or unlimited in size.

Clearly our total data is not even 0.5 GB. However, the metrics blade of Azure Cosmos DB (in the portal) says:

Collection XYZ has 5 partition key ranges. Provisioned throughput is evenly distributed across these partitions (10000 RU/s per partition).

This does not match what we have read so far in the Microsoft docs. Are we missing something? Why were these 5 partitions created?

Azure Cosmos DB Metrics

2 Answers

4 votes

When using the Unlimited collection size, by default you are provisioned 5 physical partition key ranges. This number can change, but as of May 2018, 5 is the default. You can think of each physical partition as a "server". So your data will be spread amongst 5 physical "servers". As your data size grows, your data will automatically be redistributed across more physical partitions. That's why getting the partition key correct upfront in your design is so important.
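As a rough illustration of that spreading (Cosmos DB's real hash and range assignment are internal, so the hash below is purely a stand-in), a single partition key value pins every document to one "server", while distinct values spread across all of them:

```python
import hashlib

NUM_PARTITIONS = 5  # default physical partition count for an Unlimited collection (as of 2018)

def partition_for(pk_value: str) -> int:
    """Toy stand-in for hash partitioning: map a partition key value
    to one of the physical partitions. Illustrative only."""
    digest = hashlib.md5(pk_value.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# All docs share one partition key value -> they all land on one partition.
same_pk = {partition_for("tenant-42") for _ in range(1000)}
print(len(same_pk))   # 1 -> a single hot partition

# Distinct values (e.g. the document id) spread across all partitions.
distinct = {partition_for(f"doc-{i}") for i in range(1000)}
print(len(distinct))  # 5 -> all partitions used
```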

The problem in your scenario of having the same partition key (PK) value for all 200k records is that you will have hot spots. You have 5 physical "servers", but only one will ever be used; the other 4 will sit idle, and the result is less performance for the same price. You're paying for 50k RU/s but will only ever be able to use 10k RU/s.

Change your PK to something more uniformly distributed. The right choice depends, of course, on how you read the data; if you give more detail about the docs you're storing, we may be able to give a recommendation. If you're simply doing point lookups (calling ReadDocumentAsync() by each document ID), then you can safely partition on the ID field of the document. This will spread all 200k of your docs across all 5 physical partitions, and your 50k RU/s throughput will be maximized.

Once you do this, you will probably find that you can reduce the provisioned RU to something much lower and save a ton of money. With only 200k records, each ~2.1 KB, you could probably go as low as 2,500 RU/s (1/20th of what you're paying now).
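To put rough numbers on that: assuming a ~2.1 KB insert costs on the order of 12 RU (an assumption for illustration only; the actual charge depends on indexing policy and consistency level), a back-of-the-envelope for ingesting all 200k documents at different provisioned throughputs:

```python
docs = 200_000
ru_per_write = 12               # assumed cost of one ~2.1 KB insert; actual RU charge varies
total_ru = docs * ru_per_write  # ~2.4 million RU for the whole load

for provisioned in (50_000, 10_000, 2_500):
    minutes = total_ru / provisioned / 60
    print(f"{provisioned:>6} RU/s -> ~{minutes:.0f} min to ingest")
```

Even at 2,500 RU/s the full load finishes in well under half an hour, which is why the lower setting can be such a large saving once the data is evenly spread.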

*"Server" is in quotes because each physical partition is actually a collection of many servers, load-balanced for high availability and throughput (depending on your consistency level).

3 votes

From "How does partitioning work":

In brief, here's how partitioning works in Azure Cosmos DB:

  • You provision a set of Azure Cosmos DB containers with T RU/s (requests per second) throughput.
  • Behind the scenes, Azure Cosmos DB provisions the physical partitions needed to serve T requests per second. If T is higher than the maximum throughput per physical partition t, then Azure Cosmos DB provisions N = T/t physical partitions. The value of the maximum throughput per partition, t, is configured by Azure Cosmos DB; it is assigned based on the total provisioned throughput and the hardware configuration used.

.. and more importantly:

When you provision throughput higher than t*N, Azure Cosmos DB splits one or more of your physical partitions to support the higher throughput.

So, it seems your requested throughput of 50k RU/s is higher than the per-partition maximum t mentioned above. Given the numbers, t appears to be ~10k RU/s here.
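Plugging in the question's numbers (with t taken as the inferred ~10,000 RU/s, which is not an officially documented figure):

```python
import math

T = 50_000   # provisioned throughput in RU/s
t = 10_000   # inferred max throughput per physical partition (not officially documented)

N = math.ceil(T / t)
print(N)     # 5 -> matches the "5 partition key ranges" in the metrics blade
```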

Regarding the actual value of t, CosmosDB team member Aravind Krishna R. has said in another SO post:

[---] the reason this value is not explicitly mentioned is because it will be changed (increased) as the Azure Cosmos DB team changes hardware, or rolls out hardware upgrades. The intent is to show that there is always a limit per partition (machine), and that partition keys will be distributed across these partitions.

You can discover the current value by saturating the writes for a single partition key at maximum throughput.