I have a Cosmos DB collection with around 28,000 documents, and I am using CreateDocumentQuery on DocumentClient with a where condition on properties of type 'T'. Depending on which of the two usages below I pick, I see a drastic difference in query latency.

Case 1:

    var docs2 = 
    _documentClient.CreateDocumentQuery<HeartRateDayRecordIdentifierData>(collectionUri).Where(x =>
           x.SubjectDeviceInformation.StudyId == "TestStudy"
           && x.SubjectDeviceInformation.SiteId == "Site_._Street_23"
           && x.SubjectDeviceInformation.SubjectId == "Subject3"
           && x.SubjectDeviceInformation.DeviceId == "Device1"
           && x.DaySplit == "20181112").AsEnumerable().FirstOrDefault(); 

Case 2: The same code and condition, but this time I am using a Func variable to declare the where condition.

Func<HeartRateDayRecordIdentifierData, bool> searchOptions = x =>
        x.SubjectDeviceInformation.StudyId == "TestStudy"
        && x.SubjectDeviceInformation.SiteId == "Site_._Street_23"
        && x.SubjectDeviceInformation.SubjectId == "Subject3"
        && x.SubjectDeviceInformation.DeviceId == "Device1"
        && x.DaySplit == "20181112";

var docs1 = _documentClient.CreateDocumentQuery<HeartRateDayRecordIdentifierData>(collectionUri)
                        .Where(searchOptions).AsEnumerable().FirstOrDefault();

Case 1, which has the inline where condition, returns the result in less than a second, whereas Case 2 takes around 20-30 seconds, which seems odd. I don't understand the difference between writing the where condition inline and passing it as a variable.

If anybody is interested, here is a sample Cosmos document:

{
    "id": "TestStudy_Site_._Street_21_Subject1_Device1_20181217",
    "AssemblyVersion": "1.2.3.0",
    "DataItemId": "20181217/TestStudy_Site_._Street_21_Subject1_Device1_20181217",
    "MessageType": "HeartRateDayDocumentIdentifier",
    "TimeStamp": "2018-12-14T00:00:00",
    "DaySplit": "20181217",
    "SubjectDeviceInformation": {
        "SubjectId": "Subject1",
        "DeviceId": "Device1",
        "StudyId": "TestStudy",
        "SiteId": "Site_._Street_21"
    }   
}

And here is the model used to deserialize the document:

internal class HeartRateDayRecordIdentifierData
{
    public string id { get; set; }

    public string AssemblyVersion { get; set; }

    public string DataItemId { get; set; }

    public string MessageType { get; set; }

    public DateTime TimeStamp { get; set; }

    public string DaySplit { get; set; }

    public SubjectDeviceInformation SubjectDeviceInformation { get; set; }
}

internal class SubjectDeviceInformation
{
    public string SubjectId { get; set; }

    public string DeviceId { get; set; }

    public string StudyId { get; set; }

    public string SiteId { get; set; }
}

Any suggestions on what I am doing wrong here?

1 Answer

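Both versions can be improved, but only Case 2 filters in memory.

You only want the first match, or null if there is none.

However, calling AsEnumerable().FirstOrDefault() makes a synchronous cross-partition query call, which blocks the calling thread while the pages are fetched.

More importantly, your where clause should be an Expression<Func<HeartRateDayRecordIdentifierData, bool>> instead of a Func.

With the inline lambda in Case 1, the compiler picks Queryable.Where, so the SDK can translate the condition into a query that runs on the server. In Case 2 the Func variable binds to Enumerable.Where, so first every document in the collection is returned from Cosmos DB AND THEN LINQ does the in-memory filtering. That is where the 20-30 seconds go.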
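
To make the Func versus Expression point concrete, here is a minimal sketch that needs nothing but System.Linq. It uses an in-memory list wrapped with AsQueryable() as a stand-in for the Cosmos query (that stand-in and the WhereOverloadDemo class are my own illustration, not part of the SDK) and checks whether the result of Where is still an IQueryable, meaning a provider could still translate it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class WhereOverloadDemo
{
    // Expression<Func<...>> binds to Queryable.Where, so the predicate is kept
    // as an expression tree and the result is still an IQueryable.
    public static bool StaysQueryableWithExpression(IQueryable<int> source)
    {
        Expression<Func<int, bool>> predicate = x => x > 2;
        return source.Where(predicate) is IQueryable<int>;
    }

    // A plain Func<...> cannot be converted to an expression tree, so the
    // compiler falls back to Enumerable.Where and the filter runs in memory.
    public static bool StaysQueryableWithFunc(IQueryable<int> source)
    {
        Func<int, bool> predicate = x => x > 2;
        return source.Where(predicate) is IQueryable<int>;
    }

    public static void Main()
    {
        // Hypothetical stand-in for CreateDocumentQuery<T>(collectionUri).
        IQueryable<int> source = new List<int> { 1, 2, 3, 4 }.AsQueryable();

        Console.WriteLine(StaysQueryableWithExpression(source)); // True
        Console.WriteLine(StaysQueryableWithFunc(source));       // False
    }
}
```

The same overload resolution applies to the IQueryable returned by CreateDocumentQuery, which is why changing only the variable's declared type changes where the filter runs.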

What you should do instead is use while (query.HasMoreResults) together with query.ExecuteNextAsync() to page through the results asynchronously.

Here is how your query should be:

public async Task<HeartRateDayRecordIdentifierData> GetSomethingAsync()
{
    var query = 
        _documentClient.CreateDocumentQuery<HeartRateDayRecordIdentifierData>(collectionUri).Where(x =>
               x.SubjectDeviceInformation.StudyId == "TestStudy"
               && x.SubjectDeviceInformation.SiteId == "Site_._Street_23"
               && x.SubjectDeviceInformation.SubjectId == "Subject3"
               && x.SubjectDeviceInformation.DeviceId == "Device1"
               && x.DaySplit == "20181112").AsDocumentQuery();

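    while (query.HasMoreResults)
    {
        // Ask for typed results so each page deserializes straight into the model.
        var results = await query.ExecuteNextAsync<HeartRateDayRecordIdentifierData>();
        if (results.Any())
            return results.First();
    }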

    return null;
}

That way the SDK will make only as many calls as it needs to find a match, instead of reading every document.
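
If you would rather keep the condition in a variable, as in Case 2, something along these lines should work. It is only a sketch: the CosmosQueryHelpers class and its method name are hypothetical, it assumes the Microsoft.Azure.DocumentDB package is referenced, and I am guessing that your collection is partitioned (hence EnableCrossPartitionQuery) and that a page size of 1 is acceptable since you only need one document.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

// Hypothetical helper, not part of the SDK.
public static class CosmosQueryHelpers
{
    public static async Task<T> FirstOrDefaultAsync<T>(
        DocumentClient client,
        Uri collectionUri,
        Expression<Func<T, bool>> predicate) where T : class
    {
        // Assumptions: cross-partition access is needed, and one item per page is enough.
        var options = new FeedOptions { EnableCrossPartitionQuery = true, MaxItemCount = 1 };

        // The Expression parameter binds to Queryable.Where, so the filter is sent to the server.
        using (var query = client.CreateDocumentQuery<T>(collectionUri, options)
            .Where(predicate)
            .AsDocumentQuery())
        {
            while (query.HasMoreResults)
            {
                var page = await query.ExecuteNextAsync<T>();
                var match = page.FirstOrDefault();
                if (match != null)
                    return match;
            }
        }

        // No page contained a matching document.
        return null;
    }
}
```

You would then declare searchOptions as Expression<Func<HeartRateDayRecordIdentifierData, bool>> and call await CosmosQueryHelpers.FirstOrDefaultAsync(_documentClient, collectionUri, searchOptions).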

Let me know if you need any further explanation because it's quite tricky and the samples don't really help on this one.

You can also abstract all of this and just use your objects and the .FirstOrDefaultAsync method if you use Cosmonaut. That way your whole code can change to this:

public async Task<HeartRateDayRecordIdentifierData> GetSomethingAsync()
{
    return await cosmosStore.Query().Where(x =>
                   x.SubjectDeviceInformation.StudyId == "TestStudy"
                   && x.SubjectDeviceInformation.SiteId == "Site_._Street_23"
                   && x.SubjectDeviceInformation.SubjectId == "Subject3"
                   && x.SubjectDeviceInformation.DeviceId == "Device1"
                   && x.DaySplit == "20181112").FirstOrDefaultAsync();
}

You can choose whichever approach suits you best. Disclaimer: I am the creator of Cosmonaut.