1 vote

I'm investigating why we're exhausting so many RUs in Cosmos. Our writes consume the expected amount of RUs, but our reads are through the roof - an order of magnitude more than our writes. I tried to strip it down to the simplest scenario: a single request querying on a partition with no results uses up 2000 RUs. Why is this so expensive?

// Single-partition query that is expected to return zero items for this partition key.
var query = new QueryDefinition(
        "SELECT * FROM c WHERE c.partitionKey = @partionKey ORDER BY c._ts ASC, c.id ASC")
    .WithParameter("@partionKey", id.Value);

using var queryResultSetIterator = container.GetItemQueryIterator<MyType>(query,
    requestOptions: new QueryRequestOptions
    {
        // Scope the query to a single partition.
        PartitionKey = new PartitionKey(id.Value.ToString()),
    });

while (queryResultSetIterator.HasMoreResults)
{
    foreach (var response in await queryResultSetIterator.ReadNextAsync())
    {
        yield return response.Data;
    }
}

The partition key of the collection is /partitionKey. The RU capacity is set directly on the container, not shared. We have a composite index matching the ORDER BY clause (_ts asc, id asc), although I'm not sure how that would make any difference when no records are returned.
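
For context, a rough sketch of how the container is configured (the database handle, container name and throughput value below are placeholders for illustration, not the real provisioning code):

using System.Collections.ObjectModel;
using Microsoft.Azure.Cosmos;

var properties = new ContainerProperties(id: "my-container", partitionKeyPath: "/partitionKey");

// Composite index matching the ORDER BY: _ts asc, id asc.
properties.IndexingPolicy.CompositeIndexes.Add(new Collection<CompositePath>
{
    new CompositePath { Path = "/_ts", Order = CompositePathSortOrder.Ascending },
    new CompositePath { Path = "/id", Order = CompositePathSortOrder.Ascending },
});

// Dedicated (non-shared) RU capacity set directly on the container.
Container container = await database.CreateContainerIfNotExistsAsync(properties, throughput: 1000);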

Unfortunately the SDK doesn't appear to give you the spent RUs when querying this way, so I've been using Azure Monitor to observe RU usage.
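
For reference, a minimal sketch of how the per-page charge could be captured if the SDK does surface it on the response. This assumes FeedResponse<MyType>.RequestCharge from the v3 .NET SDK and reuses the iterator from the snippet above:

// Sum the request charge reported on each page of the query
// (assumes FeedResponse<MyType>.RequestCharge, v3 .NET SDK).
double totalCharge = 0;

while (queryResultSetIterator.HasMoreResults)
{
    FeedResponse<MyType> page = await queryResultSetIterator.ReadNextAsync();
    totalCharge += page.RequestCharge;

    foreach (var item in page)
    {
        // ...handle item as before...
    }
}

Console.WriteLine($"Query consumed {totalCharge} RU");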

Is anyone able to shed any light on why this query, returning zero records and limited to a single partition, would take 2k RUs?

Update:

I just ran this query on another instance of the database in the same account, both configured identically. DB1 has 0 MB in it, DB2 has 44 MB. For the exact same operation returning no records, DB1 used 111 RUs and DB2 used 4730 RUs - over 40 times more for the same no-result query.

Adding some more detail: the consistency level is set to Consistent Prefix, and it's a single region.

Another Update:

I've replicated the issue by querying via the Azure Portal, and it's related to the number of records in the container. Looking at the query stats, it's as though every single document in the container is being loaded to search on the partition key. Is the partition key not the most performant way to search? Doesn't Cosmos know exactly where to find documents belonging to a partition key by design?

2445.38 RUs
Showing Results 0 - 0

Retrieved document count: 65671
Retrieved document size: 294343656 bytes
Output document count: 0
Output document size: 147 bytes
Index hit document count: 0
Index lookup time: 0 ms
Document load time: 8804.06 ms
Query engine execution time: 133.11 ms
System function execution time: 0 ms
User defined function execution time: 0 ms
Document write time: 0 ms

Does it change if you drop the ORDER BY? Maybe it's doing work in preparation for sorting that it winds up not needing because of no records? – Andy Lester

Just tried it and yes, surprisingly it does drop from 2000 RUs to 1500 RUs, which is interesting. It still seems exceptionally high though. I also noticed that 2 requests are now made for this zero result query. With the "order by" included 3 requests are made. – MrPanucci

I don't know what significance that has, but I hope it's a clue. On stumpers like this I like to change little things here and there and see what changes, and it often turns up clues that lead to the answer. Maybe try using a literal value instead of @partionKey and see what happens? Change the select * to select 'some literal'? Good luck. – Andy Lester

Which SDK version are you using? – Matias Quaranta

@MatiasQuaranta v3 dotnet – MrPanucci

1 Answer

0 votes

I eventually got to the bottom of the issue. To filter on the partition key in a query, the partition key property needs to be indexed, which strikes me as odd: the partition key is used to decide where a document is stored, so you'd think Cosmos would inherently know the location of every partition key's documents.

Including the partition key path in the indexing policy's included paths solved my problem. It also explains why performance degraded over time as the database grew: without that index, every query was scanning through every single document in the container.
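
For reference, a rough sketch of what that change looks like through the .NET v3 SDK (the container handle and exact path here are assumptions; the equivalent edit can also be made directly in the indexing policy JSON in the portal):

// Read the current container definition, add the partition key path to the
// included paths, and replace the container with the updated indexing policy.
ContainerResponse read = await container.ReadContainerAsync();
ContainerProperties properties = read.Resource;

// Lets queries filtering on c.partitionKey be served from the index.
properties.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/partitionKey/?" });

await container.ReplaceContainerAsync(properties);

Presumably the partition key in the request options only routes the query to the right physical partition; without the property indexed, the engine still has to load every document stored there to evaluate the filter, which matches the query stats above.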