1
votes

In a project i'm currently working on, we have come to realise that we should not use DocumentDb collections as if they are the equivalent of a table in f.ex SQL Server. As a result, we are now persisting all of the entities, belonging to a single tenant in a single collection.

We already have lots of linq queries in our codebase which assume that each document type (aggregate root) is persisted in a dedicated collection. In an attempt to make the transition painless, i set out to refactor our data access object, so that its api continues to reason about aggregate roots, and deal with the single collection vs dedicated collections in it's implementation.

My approach is to wrap an aggregate root in an Resource<T> object, which derives from Resource and exposes a Model property as well as a Type property. I thought i would then be able to expose an IQueryable<T> to consuming code based on the following code:

        return _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
            .Where(x => x.Type == typeof(TModel).Name)
            .Select(x => x.Model);

Initial testing showed that this worked as planned and i confidently committed my changes. When doing functional testing however, we found that some queried models had all of their properties set to their default values (ie. null, 0, false, etc).

I can reproduce the problem with the following code:

        var wrong = _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
            .Where(x => x.Type == typeof(TModel).Name)
            .Select(x => x.Model)
            .Where(x => !x.IsDeleted)
            .ToArray();

        var correct = _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
            .Where(x => x.Type == typeof(TModel).Name)
            .Where(x => !x.Model.IsDeleted)
            .Select(x => x.Model)
            .ToArray();

The results of the above queries are not the same!!

  • Both queries return the same number of TModel instances.
  • Only the instances returned by the second example have their properties populated.

In order for my refactoring to be successful, i need wrong to be ... right :) Falling back to SQL is not an option as we value type safety of linq. Changing our approach to expose the Resource<T> objects would touch lots of code, as it requires all *.Property references to be substituted by *.Model.Property references.

It seems an issue with the linq provider that is part of the DocumentDb client.

We use Microsoft.Azure.DocumentDb version 1.4.1

Edit 2015-09-24

The generated SQL queries are:

  • correct: {"query":"SELECT VALUE root.Model FROM root WHERE ((root.Type = \"DocumentType\") AND (NOT root.Model.IsDeleted)) "}
  • wrong: {"query":"SELECT * FROM root WHERE ((root.Type = \"DocumentType\") AND (NOT root.Model.IsDeleted)) "}

Also, this issue has been reported (and picked up) on GitHub here: https://github.com/Azure/azure-documentdb-net/issues/58

1
What are the SQL queries generated in both cases? I'd bet the second creates a simple WHERE clause while the first creates a weird subquery. BTW, SQL is type safe. Parameterized SQL queries are also type safe. It's string concatenation that isn't type safe. - Panagiotis Kanavos
I do not know what the generated SQL looks like. Tbh i wouldn't know how to determine that. Let me look into that (later today). What do you mean with your second statement? - Geoffrey Braaf
Worst case, you can use Fiddler to capture the HTTP requests sent to the database - Panagiotis Kanavos
This has proven near impossible ... Fiddler does not show any traffic for queries. The code does not allow me to inspect generated SQL afaik. - Geoffrey Braaf
You can also inspect the generated SQL by calling toString()on the LINQ query. - Andrew Liu

1 Answers

0
votes

This has been confirmed as a problem with the SDK. a fix has been checked in and will ship with the next SDK drop.

in the interim you can use SQL, or change where you place the WHERE clauses.