In a project i'm currently working on, we have come to realise that we should not use DocumentDb collections as if they are the equivalent of a table in f.ex SQL Server. As a result, we are now persisting all of the entities, belonging to a single tenant in a single collection.
We already have lots of linq queries in our codebase which assume that each document type (aggregate root) is persisted in a dedicated collection. In an attempt to make the transition painless, i set out to refactor our data access object, so that its api continues to reason about aggregate roots, and deal with the single collection vs dedicated collections in it's implementation.
My approach is to wrap an aggregate root in an Resource<T> object, which derives from Resource and exposes a Model property as well as a Type property. I thought i would then be able to expose an IQueryable<T> to consuming code based on the following code:
return _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
.Where(x => x.Type == typeof(TModel).Name)
.Select(x => x.Model);
Initial testing showed that this worked as planned and i confidently committed my changes. When doing functional testing however, we found that some queried models had all of their properties set to their default values (ie. null, 0, false, etc).
I can reproduce the problem with the following code:
var wrong = _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
.Where(x => x.Type == typeof(TModel).Name)
.Select(x => x.Model)
.Where(x => !x.IsDeleted)
.ToArray();
var correct = _client.CreateDocumentQuery<Resource<TModel>>(_collection.DocumentsLink)
.Where(x => x.Type == typeof(TModel).Name)
.Where(x => !x.Model.IsDeleted)
.Select(x => x.Model)
.ToArray();
The results of the above queries are not the same!!
- Both queries return the same number of
TModelinstances. - Only the instances returned by the second example have their properties populated.
In order for my refactoring to be successful, i need wrong to be ... right :) Falling back to SQL is not an option as we value type safety of linq. Changing our approach to expose the Resource<T> objects would touch lots of code, as it requires all *.Property references to be substituted by *.Model.Property references.
It seems an issue with the linq provider that is part of the DocumentDb client.
We use Microsoft.Azure.DocumentDb version 1.4.1
Edit 2015-09-24
The generated SQL queries are:
- correct:
{"query":"SELECT VALUE root.Model FROM root WHERE ((root.Type = \"DocumentType\") AND (NOT root.Model.IsDeleted)) "} - wrong:
{"query":"SELECT * FROM root WHERE ((root.Type = \"DocumentType\") AND (NOT root.Model.IsDeleted)) "}
Also, this issue has been reported (and picked up) on GitHub here: https://github.com/Azure/azure-documentdb-net/issues/58
WHEREclause while the first creates a weird subquery. BTW, SQL is type safe. Parameterized SQL queries are also type safe. It's string concatenation that isn't type safe. - Panagiotis KanavostoString()on the LINQ query. - Andrew Liu