1
votes

I have a question similar to this one. Basically, I have been testing different ways to use the partition key, and have noticed that the more often the partition key is referenced in a query, the higher the RU charge. It is quite consistent, and it doesn't even matter how the partition key is used. So I narrowed it down to a few basic queries for testing.

To start, this database has about 850K documents, all more than 1KB in size. The partition key value is the numeric id modulo 100, the partition key path is set to /partitionKey, and the container uses the default indexing policy:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ]
}

Here is my basic query test:

SELECT c.id, c.partitionKey
FROM c
WHERE c.partitionKey = 99 AND c.id = '99999'
-- Yields One Document; Actual Request Charge: 2.95 RUs

SELECT c.id, c.partitionKey
FROM c
WHERE c.id = '99999'
-- Yields One Document; Actual Request Charge: 2.85 RUs

The Azure Cosmos documentation says that without the partition key, the query will "fan out" to all logical partitions. Therefore I would fully expect the first query to target a single partition and the second to target all of them, meaning the first one should have a lower RU charge. I suppose I am using the RU results as evidence of whether or not Cosmos is fanning out and scanning each partition, and comparing that to what the documentation says should happen.

I know these results differ by just 0.1 RUs. But my point is that the more complex the query, the bigger the difference. For example, here is another query that is ever so slightly more complex:

SELECT c.id, c.partitionKey
FROM c
WHERE (c.partitionKey = 98 OR c.partitionKey = 99) AND c.id = '99999'
-- Yields One Document; Actual Request Charge: 3.05 RUs

Notice that the RU charge continues to grow and pull away from the query that specifies no partition key at all. Instead, I would expect the above query to target only two partitions, compared to the query with no partition key check, which supposedly fans out to all partitions.

I am starting to suspect the partition key check is happening after the other filters (or inside each partition scan). For example, going back to the first query but changing the id to something which does not exist:

SELECT c.id, c.partitionKey
FROM c
WHERE c.partitionKey = 99 AND c.id = '99999x'
-- Yields Zero Documents; Actual Request Charge: 2.79 RUs

SELECT c.id, c.partitionKey
FROM c
WHERE c.id = '99999x'
-- Yields Zero Documents; Actual Request Charge: 2.79 RUs

Notice that the RU charges are exactly the same, and both (including the one with the partition filter) are lower than when a matching document exists. This looks like a symptom of the partition filter being applied to the results rather than restricting the fan-out. But this is not what the documentation says.

Why does Cosmos have higher RUs when a partition key is specified?

1
I suspect this is because the usual way to specify the partition key for a query is alongside the query as a header or parameter, not as part of the query WHERE clause. What do you see for the query metrics? docs.microsoft.com/en-us/azure/cosmos-db/… – Noah Stahl
It also might be a side effect of providing the id in the query. Curious what happens when the other filter is something other than id? Also, your data might actually be on a single physical partition if it is less than 50 GB: docs.microsoft.com/en-us/azure/cosmos-db/… – Noah Stahl

1 Answer

3
votes

As the comment says, if you are testing via the portal (or via code, but with the query you provided), it will become more expensive, because you are not querying a specific partition; you are querying everything and then applying an additional filter, which is more expensive.

What you should do instead is pass the partition key the proper way in code. My results were quite impressive: 3 RUs with the PK and 20.000 RUs without the PK, so I'm quite confident it works (I had a really large dataset).
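
A minimal sketch of what passing the partition key in code might look like, assuming the azure-cosmos Python SDK, whose `ContainerProxy.query_items` accepts a `partition_key` argument. The helper names `partition_key_for` and `find_by_id` are hypothetical, and the modulo scheme is taken from the question's description (it assumes every id is a numeric string):

```python
from typing import Any, Dict, List


def partition_key_for(doc_id: str) -> int:
    """Assumed scheme from the question: the numeric id modulo 100."""
    return int(doc_id) % 100


def find_by_id(container: Any, doc_id: str) -> List[Dict[str, Any]]:
    """Query by id, scoping the request to a single logical partition.

    `container` is assumed to be an azure.cosmos ContainerProxy, or any
    object exposing a compatible `query_items` method.
    """
    items = container.query_items(
        query="SELECT c.id, c.partitionKey FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": doc_id}],
        # The partition key is passed as a request option here,
        # not as a filter inside the WHERE clause.
        partition_key=partition_key_for(doc_id),
    )
    return list(items)
```

With the real SDK, the container would typically come from something like `CosmosClient(url, credential=key).get_database_client(db_name).get_container_client(container_name)`, where `url`, `key`, `db_name`, and `container_name` are placeholders for your own account. Comparing the request charge of this call against the same query run without `partition_key` is one way to check whether the fan-out is actually being avoided on your data.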