2
votes

I am using Azure Cosmos DB SQL API to try to achieve the following;

We have device data stored within a collection and would love to retrieve the latest event data per device serial effectively without having to do N queries for each device separately.

SELECT * 
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1') ORDER BY c.EventEnqueuedUtcTime DESC

Im assuming I would need to use Group By - https://docs.microsoft.com/en-us/azure/cosmos-db/sql-query-group-by

Any assistance would be greatly appreciated

Rough example of data :

[
    {
        "temperature": 25.22063251827873,
        "humidity": 71.54208429695204,
        "serial": "V55555555",
        "testid": 1,
        "location": {
            "type": "Point",
            "coordinates": [
                30.843687,
                -29.789895
            ]
        },
        "EventProcessedUtcTime": "2020-09-07T12:04:34.5861918Z",
        "PartitionId": 0,
        "EventEnqueuedUtcTime": "2020-09-07T12:04:34.4700000Z",
        "IoTHub": {
            "MessageId": null,
            "CorrelationId": null,
            "ConnectionDeviceId": "V55555555",
            "ConnectionDeviceGenerationId": "637323979596346475",
            "EnqueuedTime": "2020-09-07T12:04:34.0000000"
        },
        "Name": "admin",
        "id": "6dac491e-1f28-450d-bf97-3a15a0efaad8",
        "_rid": "i2UhAI7ofAo3AQAAAAAAAA==",
        "_self": "dbs/i2UhAA==/colls/i2UhAI7ofAo=/docs/i2UhAI7ofAo3AQAAAAAAAA==/",
        "_etag": "\"430131c1-0000-0100-0000-5f5621d80000\"",
        "_attachments": "attachments/",
        "_ts": 1599480280
    }
]

UPDATE: So doing the following returns the correct data but sadly you can only return data thats inside your group by or an aggregate function (i.e. cant do select *)

SELECT c.serial, MAX(c.EventProcessedUtcTime)
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1') 
GROUP BY c.serial

[
    {
        "serial": "synap-aim-g1",
        "$1": "2020-09-09T06:29:42.6812629Z"
    },
    {
        "serial": "V55555555",
        "$1": "2020-09-07T12:04:34.5861918Z"
    }
]
2
As per 'latest event data per device serial' you mean to say you can have multiple documents lets say for 'V55555555' with different 'EventProcessedUtcTime' and we need to select the latest one of it, and this has to be done for every device?AnuragSharma-MSFT
That is correct @AnuragSharma-MSFT we will have infinite documents per device serial (partition). I would like to get the latest document (by IoT Hub time as you stated correctly) efficientlyDavid
I am afraid there is no direct way to achieve it using a query in cosmos db. However you can refer to below link for the same topic. If you are using any sdk, this would help in achieving the desired functionality: docs.microsoft.com/en-us/answers/questions/38454/index.htmlAnuragSharma-MSFT
Ok cool, thought so. Thanks!David
Hi @David, really thanks for AnuragSharma-MSFT's help! We're glad that you resolved it. The update should be posted as the answer, I just help you post it. You can accept it as answer. This can be beneficial to other community members. Thank you.Leon Yue

2 Answers

1
votes

If the question is really about an efficient approach to this particular query scenario, we can consider denormalization in cases where the query language itself doesn't offer an efficient solution. This guide on partitioning and modeling has a relevant section on getting the latest items in a feed.

We just need to get the 100 most recent posts, without the need to paginate through the entire data set.

So to optimize this last request, we introduce a third container to our design, entirely dedicated to serving this request. We denormalize our posts to that new feed container.

Following this approach, you could create a "Feed" or "LatestEvent" container dedicated to the "latest" query which uses the device serial as id and having a single partition key in order to guarantee that there is only one (the most recent) event item per device, and that it can be fetched by the device serial or listed with least possible cost using a simple query:

SELECT *
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1')

The change feed could be used to upsert the latest event, such that the latest event is created/overwritten in the "LatestEvent" container as its source item is created in the main.

2
votes

Thanks for @AnuragSharma-MSFT's help:

We're glad that you resolved it in this way, thanks for sharing the update:

So doing the following returns the correct data but sadly you can only return data thats inside your group by or an aggregate function (i.e. cant do select *)

SELECT c.serial, MAX(c.EventProcessedUtcTime)
FROM c
WHERE c.serial IN ('V55555555','synap-aim-g1') 
GROUP BY c.serial

[
    {
        "serial": "synap-aim-g1",
        "$1": "2020-09-09T06:29:42.6812629Z"
    },
    {
        "serial": "V55555555",
        "$1": "2020-09-07T12:04:34.5861918Z"
    }
]