
I have enabled diagnostic logging of a Cosmos Account (SQL interface). The diagnostic log data is being sent to a storage account - and I can see that there is a new DataPlaneRequests blob created every 5 minutes. So far, so good.

I'm performing CRUD requests against a collection in the Cosmos account. I can see entries within the DataPlaneRequests logs that look like this ('*' used to protect the innocent):

{ "time": "2020-01-28T03:04:59.2606375Z", "resourceId": "/SUBSCRIPTIONS/****/RESOURCEGROUPS/****/PROVIDERS/MICROSOFT.DOCUMENTDB/DATABASEACCOUNTS/**********", "category": "DataPlaneRequests", "operationName": "Query", "properties": {"activityId": "38f497ee-7e37-435f-8b4a-a2f0d8d65d12","requestResourceType": "DocumentFeed","requestResourceId": "/dbs/****/colls/****/docs","collectionRid": "","databaseRid": "","statusCode": "200","duration": "4.588500","userAgent": "Windows/10.0.14393 documentdb-netcore-sdk/2.8.1","clientIpAddress": "52...***","requestCharge": "4.160000","requestLength": "278","responseLength": "5727","resourceTokenUserRid": "","region": "West US 2","partitionId": ""}}

Every entry in the DataPlaneRequests log has an empty partitionId property value. (The operationName property value in the log is either "Create" or "Query").

So my question is - why is this property empty?

Here is the documentation for DataPlaneRequests


What I'm actually trying to accomplish is to obtain information about the load being placed on the physical partitions of a collection. For example, I'd like to know that during the past 10 minutes, 10k Create operations were performed in physical partition "1", while 55k operations were performed in physical partition "3". That would give me much more insight into why a collection is experiencing throttling.
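To make the goal concrete, here is a minimal sketch of the kind of aggregation I have in mind: counting log entries per 10-minute window, partitionId and operationName. It assumes the blob has been downloaded and contains one JSON record per line (the blob layout is an assumption, adjust the parsing if yours differs); the function names are hypothetical, and the field names are taken from the log entry above.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def window_start(timestamp, minutes=10):
    """Round a log timestamp down to the start of its N-minute window."""
    # The logs carry 7 fractional digits, so keep only the seconds part.
    parsed = datetime.strptime(timestamp[:19], "%Y-%m-%dT%H:%M:%S")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.replace(minute=parsed.minute - parsed.minute % minutes, second=0)


def count_operations(lines, minutes=10):
    """Count records per (window, partitionId, operationName).

    Assumption: each non-empty line is one DataPlaneRequests JSON record.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        props = record.get("properties", {})
        # An empty partitionId is reported under a placeholder label.
        partition = props.get("partitionId") or "(empty)"
        key = (window_start(record["time"], minutes), partition, record.get("operationName"))
        counts[key] += 1
    return counts
```

With the logs as they are now, every entry ends up under the "(empty)" label, which is exactly the problem.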

Why do you want to know what physical partition the data was written to or queried from? Have you looked at the metrics tab in the Cosmos blade in the portal? There is a throughput tab at the top that can give you details on hot partitions. - Mark Brown
...because when throttling occurs, I'd like to know who's responsible for the load being placed on each physical partition. I'd like to know who to blame (from both a PartitionKey-value and a UserAgent perspective) and also which PartitionKey values (tenants) may be affected by the throttling. - user1793093
A hypothetical example: microservice 'A' makes a lot of requests that all map to the same physical partition '1', so the throttling in partition '1' can be blamed on 'A'. Microservice 'B' makes requests that map to all of the physical partitions, so the throttling in partition '1' (caused by 'A') will affect a subset of the requests performed by 'B'. So I'd like to be able to blame 'A' for the throttling in '1', and to inform 'B' which subset of its requests may experience degraded performance. - user1793093
I think the way to approach this is to find out which operations are consuming the most RU/s. Take a look at this article here. The first two queries, I think, will give you the data you're looking for. docs.microsoft.com/en-us/azure/cosmos-db/… Hope this helps. - Mark Brown
Thanks for the input @MarkBrown - I've raised a support ticket with Microsoft. I'll post whatever resolution comes of that. - user1793093
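As a rough offline illustration of the "which operations consume the most RUs" idea from the comments, the sketch below sums requestCharge per userAgent and operationName and counts throttled (status 429) responses. It is not the query from the linked article. It assumes one JSON record per line in the downloaded blob (an assumption about the blob layout); the function name is hypothetical, and the field names come from the log entry in the question.

```python
import json
from collections import defaultdict


def charge_by_client(lines):
    """Total requests, throttled requests and RUs per (userAgent, operationName).

    Assumption: each non-empty line is one DataPlaneRequests JSON record.
    """
    totals = defaultdict(lambda: {"requests": 0, "throttled": 0, "request_charge": 0.0})
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        props = record.get("properties", {})
        key = (props.get("userAgent", ""), record.get("operationName", ""))
        entry = totals[key]
        entry["requests"] += 1
        # The logs store numeric values as strings, e.g. "4.160000".
        entry["request_charge"] += float(props.get("requestCharge") or 0)
        # HTTP 429 is the status Cosmos DB returns for throttled requests.
        if props.get("statusCode") == "429":
            entry["throttled"] += 1
    return dict(totals)
```

This attributes load and throttling to a client, but without a usable partition identifier it cannot say which physical partition was affected.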

1 Answer


When you connect to Cosmos, there are two connection modes available: Gateway and Direct. It turns out that only Direct mode causes the partitionId to be included in the logs. (If you read up on how the two modes differ, that makes sense.)

However, it also turns out that the partitionId in the logs is not a reference to a physical partition of a collection, so I'm unable to use that data to solve the problem I was attempting to solve.

There is a physical partition id available in the logs, but it's also of limited use: it's only tracked for the 3 largest (logical) PartitionKey values of each physical partition, and only when a key value holds at least 1 GB of documents.