10
votes

I am currently using EFCore 1.1 (preview release) with SQL Server.

I am doing what I thought was a simple OUTER JOIN between an Order and OrderItem table.

      var orders = from order in ctx.Order
                   join orderItem in ctx.OrderItem

                   on order.OrderId equals orderItem.OrderId into tmp

                   from oi in tmp.DefaultIfEmpty()

                   select new
                   {
                       order.OrderDt,
                       Sku = (oi == null) ? null : oi.Sku,
                       Qty = (oi == null) ? (int?) null : oi.Qty
                   };

The actual data returned is correct (I know earlier versions had issues with OUTER JOINS not working at all). However the SQL is horrible and includes every column in Order and OrderItem which is problematic considering one of them is a large XML Blob.

SELECT [order].[OrderId], [order].[OrderStatusTypeId], [order].[OrderSummary], [order].[OrderTotal], [order].[OrderTypeId], [order].[ParentFSPId], [order].[ParentOrderId], [order].[PayPalECToken], [order].[PaymentFailureTypeId] ....

...[orderItem].[OrderId], [orderItem].[OrderItemType], [orderItem].[Qty], [orderItem].[SKU] FROM [Order] AS [order] LEFT JOIN [OrderItem] AS [orderItem] ON [order].[OrderId] = [orderItem].[OrderId] ORDER BY [order].[OrderId]

(There are many more columns not shown here.)

On the other hand - if I make it an INNER JOIN then the SQL is as expected with only the columns in my select clause:

SELECT [order].[OrderDt], [orderItem].[SKU], [orderItem].[Qty] FROM [Order] AS [order] INNER JOIN [OrderItem] AS [orderItem] ON [order].[OrderId] = [orderItem].[OrderId]

I tried reverting to EFCore 1.01, but got some horrible nuget package errors and gave up with that.

Not clear whether this is an actual regression issue or an incomplete feature in EFCore. But couldn't find any further information about this elsewhere.


Edit: EFCore 2.1 has addressed a lot of issues with grouping and also N+1 type issues where a separate query is made for every child entity. Very impressed with the performance in fact.

3/14/18 - 2.1 Preview 1 of EFCore isn't recommended because the GROUP BY SQL has some issues when using OrderBy() but it's fixed in nightly builds and Preview 2.

2
was hoping this would be fixed for 1.1 final version but unfortunately it wasnt :-( - Simon_Weaver
@IvanStoev really wish they had made it clearer it was essentially unfinished software. They call it 1.1 but it's really 0.9. Looking forward to 1.2 (or 1.0 as I like to call it) :-) - Simon_Weaver
Update: Seen some promising notes on github about progress with GroupBy for the next version :-) Including a promise that it fixes many issues like this :-) - Simon_Weaver
I fully agree with you. I've never seen MS even beta releasing something like this before. Looks like open source is opening new (bad) practices. But I'm sure EF Core developers know that. Most likely management is pushing (because of the ASP.NET Core stack) and of course marketing will never allow calling it "unfinished software". Yes, there are a few words in the Compare EF Core & EF6.x section of the documentation, in particular Which One Is Right for You... - Ivan Stoev
...section, but that's all, and does not provide enough detail for all problems and bugs one encounters once starts using EF Core. In my experience, currently 95+% of the posts in the entity-framework-core SO tag are caused by a bugs in the software. - Ivan Stoev

2 Answers

9
votes

The following applies to EF Core 1.1.0 (release).

Although shouldn't be doing such things, tried several alternative syntax queries (using navigation property instead of manual join, joining subqueries containing anonymous type projection, using let / intermediate Select, using Concat / Union to emulate left join, alternative left join syntax etc.) The result - either the same as in the post, and/or executing more than one query, and/or invalid SQL queries, and/or strange runtime exceptions like IndexOutOfRange, InvalidArgument etc.

What I can say based on tests is that most likely the problem is related to bug(s) (regression, incomplete implementation - does it really matter) in GroupJoin translation. For instance, #7003: Wrong SQL generated for query with group join on a subquery that is not present in the final projection or #6647 - Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling etc.

Until it get fixed (when?), as a (far from perfect) workaround I could suggest using the alternative left outer join syntax (from a in A from b in B.Where(b = b.Key == a.Key).DefaultIfEmpty()):

var orders = from o in ctx.Order
             from oi in ctx.OrderItem.Where(oi => oi.OrderId == o.OrderId).DefaultIfEmpty()
             select new
             {
                 OrderDt = o.OrderDt,
                 Sku = oi.Sku,
                 Qty = (int?)oi.Qty
             };

which produces the following SQL:

SELECT [o].[OrderDt], [t1].[Sku], [t1].[Qty]
FROM [Order] AS [o]
CROSS APPLY (
    SELECT [t0].*
    FROM (
        SELECT NULL AS [empty]
    ) AS [empty0]
    LEFT JOIN (
        SELECT [oi0].*
        FROM [OrderItem] AS [oi0]
        WHERE [oi0].[OrderId] = [o].[OrderId]
    ) AS [t0] ON 1 = 1
) AS [t1]

As you can see, the projection is ok, but instead of LEFT JOIN it uses strange CROSS APPLY which might introduce another performance issue.

Also note that you have to use casts for value types and nothing for strings when accessing the right joined table as shown above. If you use null checks as in the original query, you'll get ArgumentNullException at runtime (yet another bug).

-2
votes

Using "into" will create a temporary identifier to store the results.

Reference : MDSN: into (C# Reference)

So removing the "into tmp from oi in tmp.DefaultIfEmpty()" will result in the clean sql with the three columns.

var orders = from order in ctx.Order
               join orderItem in ctx.OrderItem
               on order.OrderId equals orderItem.OrderId
               select new
               {
                   order.OrderDt,
                   Sku = (oi == null) ? null : oi.Sku,
                   Qty = (oi == null) ? (int?) null : oi.Qty
               };