1
votes

I am using Azure Cosmos to store customer data in my multi-tenant app. One of my customers started complaining about long wait times when querying their data. As a quick fix, I created a dedicated Cosmos instance for them and copied their data to that dedicated instance. Now I have two databases that contain exact copies of this customer's data. We'll call these databases db1 and db2. db1 contains all data for all customers, including this customer in question. db2 contains only data for this customer in question. Also, for both databases the partition key is an aggregate of tenant id and date, called ownerTime. Also, each database contains a single container named "call".

I then run this query in both databases:

select c.id,c.callTime,c.direction,c.action,c.result,c.duration,c.hasR,c.hasV,c.callersIndexed,c.callers,c.files,c.tags_s,c.ownerTime
from c
where
c.ownerTime = '352897067_202011'
and c.callTime>='2020-11-01T00:00:00'
and c.callTime<='2020-11-30T59:59:59'
and (CONTAINS(c.phoneNums_s, '7941521523'))

As you can see, I am isolating one partition (ownerTime: 352897067_202011). In this partition, there are about 50,000 records in each database.

In db1 (the database with all customer data), this uses 5116.38 RUs. In db2 (the dedicated instance), this query uses 65.8 RUs.

Why is there this discrepancy? The data in these two partitions is exactly the same across the two databases. The indexing policy is exactly the same as well. I suspect that db1 is trying to do a fan-out query. But why would it do that? I have the query set up so that it will only look in this one partition.

Here are the stats I retrieved for the above query after running on each database:

db1

Request Charge:                        5116.38 RUs
Retrieved document count:              8
Retrieved document size:               18168 bytes
Output document count:                 7
Output document size:                  11793 bytes
Index hit document count:              7
Index lookup time:                     5521.42 ms
Document load time:                    7.8100000000000005 ms
Query engine execution time:           0.23 ms
System function execution time:        0.01 ms
User defined function execution time:  0 ms
Document write time:                   0.07 ms
Round Trips:                           1

db2

Request Charge:                        65.8 RUs
Showing Results:                       1 - 7
Retrieved document count:              7
Retrieved document size:               16585 bytes
Output document count:                 7
Output document size:                  11744 bytes
Index hit document count:              7
Index lookup time:                     20.720000000000002 ms
Document load time:                    4.8099 ms
Query engine execution time:           0.2001 ms
System function execution time:        0.01 ms
User defined function execution time:  0 ms
Document write time:                   0.05 ms
Round Trips:                           1

The indexing policy for both databases is:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        },
        {
            "path": "/callers/*"
        },
        {
            "path": "/files/*"
        }
    ]
}

*Update: I recently updated this question with a clearer query, and the query stats returned.

1
Using array_contains on your partition key value seems odd to me here. These need to be equality predicates to work. Use Where = and OR the two pk vales. Also, why are you using contains on single values. Also use = on these as well. The cost is different because db1 has to scan across a ton more data and partitions than db2. Also callTime should be in a composite index.Mark Brown
The only documentation regarding how the partition should be placed is that the "where clause should contain the partition key". I would think that "array_contains" should work. Also, the text filters (contains) are partial values.Matt Spinks
To isolate the issue more, I have simplified the query and posted the RUs for the new simplified query.Matt Spinks
should be the same. There must be something different here. A way to check where RU is being spent is to take a look at query metrics. Here's a doc that shows how to get these in the response object. docs.microsoft.com/en-us/azure/cosmos-db/sql-api-query-metricsMark Brown
Thanks @MarkBrown. I just added the query stats (and updated the query itself to be clearer).Matt Spinks

1 Answers

0
votes

Following up from comments.

Since db1 has all customer’s data, the physical partition will have lot more unique values for callTime so the number of index pages scanned to evaluate callTime will be high. In case of db2, since only 1 customer data is there the logical and physical partition will be the same. So while this is not a fan-out, the query engine will still need to evaluate the range filter on callTime for all other other customer data.

To fix/improve the performance on db1 you should create a composite index on /ownerTime and /callTime, see below.

"compositeIndexes":[
         [
            {
               "path":"/ownerTime"
            },
            {
               "path":"/callTime"
            }
         ]
    ],

Thanks.