Optimizing queries in Entity Framework Core / LINQ for group by / count()

Question

I have been trying out different ways to write queries that include "group by" and "count()" using Entity Framework Core and LINQ. But the SQL that is generated simply returns all records, and the "group by" and "count()" are then done in memory.

Currently I am trying to write a query that is equivalent to the following SQL:

select 
    e.EngagementId,
    e.Name,
    c.ClientName,
    count(distinct ed.EngagementDocumentId) as DocumentCount,
    count(distinct es.EngagementSurveyId) as SurveyCount
from Engagement e
inner join Client c on c.ClientId = e.ClientId
left outer join EngagementDocument ed on ed.EngagementId = e.EngagementId
left outer join EngagementSurvey es on es.EngagementId = e.EngagementId
group by e.EngagementId, e.Name, c.ClientName

My question is: How can I write a query that generates similar SQL?

See How to get COUNT DISTINCT in translated SQL with EF Core — Ivan Stoev
Thanks! I will have a look. But my problem isn't particularly with "count distinct". Generally all queries using "group by" seem to be handled in memory. I just used the example above because if I can solve that then I will be able to solve simpler queries using "group by" also. — Thomas Aniss Ajjouri
If after GroupBy you add Select with only key / aggregates, the query should be translated to SQL. — Ivan Stoev

Gordon Linoff · Accepted Answer · 2020-05-07T19:20:40

I would recommend correlated subqueries or (equivalently) lateral joins:

select e.EngagementId, e.Name, c.ClientName,
       (select count(*)
        from EngagementDocument ed
        where ed.EngagementId = e.EngagementId
       ) as DocumentCount,
       (select count(*)
        from EngagementSurvey es
        where es.EngagementId = e.EngagementId
       ) as SurveyCount
from Engagement e inner join
     Client c 
     on c.ClientId = e.ClientId;

In addition, you want indexes on EngagementDocument(EngagementId) and EngagementSurvey(EngagementId).
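
As a rough sketch, those two indexes could be created as follows. The table and column names come from the question; the index names are hypothetical, and the exact syntax and options may vary slightly by database.

```sql
-- Index names are made up for illustration; pick whatever fits your naming scheme.
-- Each index lets the matching correlated subquery count rows per EngagementId
-- without reading the whole table.
create index IX_EngagementDocument_EngagementId
    on EngagementDocument (EngagementId);

create index IX_EngagementSurvey_EngagementId
    on EngagementSurvey (EngagementId);
```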

Your version creates a Cartesian product of documents and surveys for each engagement -- and then uses additional resources to aggregate and remove duplicates. The version above can compute each count just by scanning the corresponding index.
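
To see the row multiplication for yourself, a diagnostic query along these lines may help. It is only a sketch against the tables from the question; the JoinedRows column is a name introduced here for illustration.

```sql
-- JoinedRows counts the intermediate rows produced by the two left joins
-- before "distinct" collapses them again.
select e.EngagementId,
       count(*) as JoinedRows,
       count(distinct ed.EngagementDocumentId) as DocumentCount,
       count(distinct es.EngagementSurveyId) as SurveyCount
from Engagement e
left outer join EngagementDocument ed on ed.EngagementId = e.EngagementId
left outer join EngagementSurvey es on es.EngagementId = e.EngagementId
group by e.EngagementId;
```

For an engagement with 3 documents and 4 surveys, the joins produce 3 x 4 = 12 intermediate rows, so JoinedRows would be 12 while the distinct counts stay at 3 and 4.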
