none
Linq statement generates nested selects on group by query

    Question

  • Hi

    We are using EF to extract data and would like to extract records grouped by their creation date and perform some aggregate operation on them. However, the actual SQL query generated for our LINQ statement is using mulitple nested selects and projects to fullfill our LINQ statement. We cannot determine if we are using suboptimal LINQ syntax, if we can "tweak" to get a better SQL query generated or if EF cannot handle our specific scenario optimally.

    A simplified examples follows:

    Generate the data schema and fill in some data

    create table personT (
      person_id uniqueidentifier not null, 
      first_name nvarchar(100), 
      last_name nvarchar(100), 
      created datetime not null, 
      constraint pk_person primary key (person_id));
    
    create table orderT (
      order_id uniqueidentifier not null, 
      person_id uniqueidentifier not null, 
      name nvarchar(100), 
      created datetime not null, 
      price int, 
      constraint pk_order primary key (order_id));
    
    
    declare @person_no int;
    set @person_no = 1;
    
    while (@person_no < 1000)
    begin
    
    insert into personT (person_id, first_name, last_name, created)
    values (newid(), substring(convert(nvarchar(38), newid()), 1, 8), substring(convert(nvarchar(38), newid()), 1, 8), getdate());
    set @person_no = @person_no +1;
    end
    
    insert into personT (person_id, first_name, last_name, created) values ('1FC66E37-6EAF-4032-B374-E7B60FBD25EA', 'John', 'Smith', '2009-01-01');
    insert into personT (person_id, first_name, last_name, created) values ('3BE03566-5251-47c9-A0C6-35B6319616FF', 'John', 'Smith2', '2009-01-01');
    
    --the order table
    
    declare @order_no int;
    set @order_no = 1;
    while (@order_no < 20000)
    begin
    
    insert into orderT (order_id, person_id, name, created, price)
    values (newid(), '3BE03566-5251-47c9-A0C6-35B6319616FF',  substring(convert(nvarchar(38), newid()), 1, 8), getdate(), 1);
    set @order_no = @order_no +1;
    
    end
    
    insert into orderT (order_id, person_id, name, created, price) values (newid(), '1FC66E37-6EAF-4032-B374-E7B60FBD25EA', 'Software', getdate(), 1);
    
    Build an Entity Model from the above schema

    We want to group the orders by summarizing the prices per day for a given period and a given order name, so we want output in the form (Sum of order prices, the day of the summation, the name of the order). For this we have the following LINQ query:


    Guid person_id = new Guid("1FC66E37-6EAF-4032-B374-E7B60FBD25EA");
    DateTime fromTime = new DateTime(2008,4,23);
    DateTime toTime = new DateTime(2009,5,23);
    
    var list = from o in _entities.orderT
               where o.personT.person_id == person_id &&
                     o.personT.created >= fromTime &&
                     o.personT.created <= toTime
               group o by new { o.name, o.personT.created.Year, o.personT.created.Month, o.personT.created.Day } into g
               orderby g.Key.name
               select new { g.Key, count = g.Sum(x => x.price) };
    


    We where expecting a SQL query looking very similar, but instead get a query with several projections and nested select queries. For completeness I have pasted it below, feel free to skip it :)

    exec sp_executesql N'SELECT 
    [Project4].[C5] AS [C1], 
    [Project4].[C1] AS [C2], 
    [Project4].[name] AS [name], 
    [Project4].[C2] AS [C3], 
    [Project4].[C3] AS [C4], 
    [Project4].[C4] AS [C5], 
    [Project4].[C6] AS [C6]
    FROM ( SELECT 
    	[Project3].[name] AS [name], 
    	[Project3].[C1] AS [C1], 
    	[Project3].[C2] AS [C2], 
    	[Project3].[C3] AS [C3], 
    	[Project3].[C4] AS [C4], 
    	1 AS [C5], 
    	[Project3].[C5] AS [C6]
    	FROM ( SELECT 
    		[Project2].[name] AS [name], 
    		[Project2].[C1] AS [C1], 
    		[Project2].[C2] AS [C2], 
    		[Project2].[C3] AS [C3], 
    		[Project2].[C4] AS [C4], 
    		(SELECT 
    			SUM([Extent7].[price]) AS [A1]
    			FROM         [dbo].[orderT] AS [Extent7]
    			LEFT OUTER JOIN [dbo].[personT] AS [Extent8] ON [Extent7].[person_id] = [Extent8].[person_id]
    			INNER JOIN [dbo].[personT] AS [Extent9] ON [Extent7].[person_id] = [Extent9].[person_id]
    			LEFT OUTER JOIN [dbo].[personT] AS [Extent10] ON [Extent7].[person_id] = [Extent10].[person_id]
    			LEFT OUTER JOIN [dbo].[personT] AS [Extent11] ON [Extent7].[person_id] = [Extent11].[person_id]
    			LEFT OUTER JOIN [dbo].[personT] AS [Extent12] ON [Extent7].[person_id] = [Extent12].[person_id]
    			LEFT OUTER JOIN [dbo].[personT] AS [Extent13] ON [Extent7].[person_id] = [Extent13].[person_id]
    			LEFT OUTER JOIN [dbo].[personT] AS [Extent14] ON [Extent7].[person_id] = [Extent14].[person_id]
    			LEFT OUTER JOIN [dbo].[personT] AS [Extent15] ON [Extent7].[person_id] = [Extent15].[person_id]
    			WHERE ([Extent7].[person_id] = @p__linq__1) AND ([Extent8].[created] >= @p__linq__2) AND ([Extent9].[created] <= @p__linq__3) AND (([Extent7].[name] = [Project2].[name]) OR (([Extent7].[name] IS NULL) AND ([Project2].[name] IS NULL))) AND (((DATEPART (year, [Extent10].[created])) = [Project2].[C2]) OR ((DATEPART (year, [Extent11].[created]) IS NULL) AND ([Project2].[C2] IS NULL))) AND (((DATEPART (month, [Extent12].[created])) = [Project2].[C3]) OR ((DATEPART (month, [Extent13].[created]) IS NULL) AND ([Project2].[C3] IS NULL))) AND (((DATEPART (day, [Extent14].[created])) = [Project2].[C4]) OR ((DATEPART (day, [Extent15].[created]) IS NULL) AND ([Project2].[C4] IS NULL)))) AS [C5]
    		FROM ( SELECT 
    			[Distinct1].[name] AS [name], 
    			[Distinct1].[C1] AS [C1], 
    			[Distinct1].[C2] AS [C2], 
    			[Distinct1].[C3] AS [C3], 
    			[Distinct1].[C4] AS [C4]
    			FROM ( SELECT DISTINCT 
    				[Extent1].[name] AS [name], 
    				1 AS [C1], 
    				DATEPART (year, [Extent4].[created]) AS [C2], 
    				DATEPART (month, [Extent5].[created]) AS [C3], 
    				DATEPART (day, [Extent6].[created]) AS [C4]
    				FROM      [dbo].[orderT] AS [Extent1]
    				LEFT OUTER JOIN [dbo].[personT] AS [Extent2] ON [Extent1].[person_id] = [Extent2].[person_id]
    				INNER JOIN [dbo].[personT] AS [Extent3] ON [Extent1].[person_id] = [Extent3].[person_id]
    				LEFT OUTER JOIN [dbo].[personT] AS [Extent4] ON [Extent1].[person_id] = [Extent4].[person_id]
    				LEFT OUTER JOIN [dbo].[personT] AS [Extent5] ON [Extent1].[person_id] = [Extent5].[person_id]
    				LEFT OUTER JOIN [dbo].[personT] AS [Extent6] ON [Extent1].[person_id] = [Extent6].[person_id]
    				WHERE ([Extent1].[person_id] = @p__linq__1) AND ([Extent2].[created] >= @p__linq__2) AND ([Extent3].[created] <= @p__linq__3)
    			)  AS [Distinct1]
    		)  AS [Project2]
    	)  AS [Project3]
    )  AS [Project4]
    ORDER BY [Project4].[name] ASC',N'@p__linq__1 uniqueidentifier,@p__linq__2 datetime2(7),@p__linq__3 datetime2(7)',@p__linq__1='1FC66E37-6EAF-4032-B374-E7B60FBD25EA',@p__linq__2='2008-04-23 00:00:00',@p__linq__3='2009-04-23 00:00:00'
    

    So, our question is:

    1) Are we doing something wrong in our LINQ statement?
    2) Can we tweak our LINQ statement or somewhere else to get actual group by operations in the LINQ statement instead of nested subqueries
    3) Can EF be used for our scenario at all or does our problem (or db schema?) present a problem outside the EF domain

    Any input would be much appreciated, as we are on the brink of discarding EF for another technology...
    Monday, April 27, 2009 9:23 AM

All replies

  • A couple of ideas:

    1) You can try dropping the association between OrderT and PersonT. (Although your SQL-DDL don't reflect it I am assuming there is a foreign key constraint (and association) from OrderT.person_id to PersonT.person_id..?). Without the association you can expose PersonID as a scalar (value) property on OrderT. This gives you more flexibility in query composition so you can eliminate some of the (10!) joins to PersonT from the query. E.g.:

    from o in ee.orderT
    join p in ee.personT on o.person_id equals p.person_id 
    where p.person_id == person_id &&
         p.created >= fromTime &&
         p.created <= toTime
    group o by new { o.name, p.created.Year, p.created.Month, p.created.Day } into g
    orderby g.Key.name
    select new { g.Key, count = g.Sum(x => x.price) };
    

    The above example will still generate a monster of a query but with fewer redundant joins so it may be something to play around with.


    2) Consider using Linq-to-SQL for now, and re-evaluate when the next version of EF comes out. The linq-to-sql generated query for your example is:

    SELECT [t3].[name], [t3].[value2] AS [Year], [t3].[value22] AS [Month], [t3].[value3] AS [Day], [t3].[value] AS [count]
    FROM (
        SELECT SUM([t2].[price]) AS [value], [t2].[name], [t2].[value] AS [value2], [t2].[value2] AS [value22], [t2].[value3]
        FROM (
            SELECT [t0].[name], DATEPART(Year, [t1].[created]) AS [value], DATEPART(Month, [t1].[created]) AS [value2], DATEPART(Day, [t1].[created]) AS [value3], [t1].[person_id], [t1].[created], [t0].[price]
            FROM [dbo].[orderT] AS [t0]
            INNER JOIN [dbo].[personT] AS [t1] ON [t1].[person_id] = [t0].[person_id]
            ) AS [t2]
        WHERE ([t2].[person_id] = @p0) AND ([t2].[created] >= @p1) AND ([t2].[created] <= @p2)
        GROUP BY [t2].[name], [t2].[value], [t2].[value2], [t2].[value3]
        ) AS [t3]
    ORDER BY [t3].[name]
    ...which should generate a fairly optimal execution plan... (Although it also has nesting, nested subqueries is not bad if they can be expanded by SQL server...)
    Kristofer - Huagati Systems Co., Ltd. - www.huagati.com - Cool tools for Linq-to-SQL and Entity Framework: www.huagati.com/dbmltools
    Monday, April 27, 2009 3:57 PM
  • ...or...

    3) Use Entity-SQL instead of Linq-to-Entities against your model. The ESQL equivalent of your query is roughly:

    select og.name, og.year, og.month, og.day, sum(og.price) as totalprice
    from (
      select o.name, Year(p.created) as year, Month(p.created) as month, Day(p.created) as day, o.price
      from enemaerkeEntities.orderT as o
      inner join enemaerkeEntities.personT as p on o.personT = p
      where p.person_id = @person_id
       and p.created >= @fromTime
       and p.created <= @toTime
    ) as og
    group by og.name, og.year, og.month, og.day
    order by og.name
    
    

    ...and it generates a T-SQL query that is somewhere between the L2E and L2S generated queries:

    SELECT 
    [Project3].[C5] AS [C1], 
    [Project3].[name] AS [name], 
    [Project3].[C2] AS [C2], 
    [Project3].[C3] AS [C3], 
    [Project3].[C4] AS [C4], 
    [Project3].[C1] AS [C5]
    FROM ( SELECT 
    	[GroupBy1].[A1] AS [C1], 
    	[GroupBy1].[K1] AS [name], 
    	[GroupBy1].[K2] AS [C2], 
    	[GroupBy1].[K3] AS [C3], 
    	[GroupBy1].[K4] AS [C4], 
    	1 AS [C5]
    	FROM ( SELECT 
    		[Project2].[name] AS [K1], 
    		[Project2].[C1] AS [K2], 
    		[Project2].[C2] AS [K3], 
    		[Project2].[C3] AS [K4], 
    		SUM([Project2].[price]) AS [A1]
    		FROM ( SELECT 
    			[Extent1].[name] AS [name], 
    			[Extent1].[price] AS [price], 
    			DATEPART (year, [Extent2].[created]) AS [C1], 
    			DATEPART (month, [Extent2].[created]) AS [C2], 
    			DATEPART (day, [Extent2].[created]) AS [C3]
    			FROM  [dbo].[orderT] AS [Extent1]
    			INNER JOIN [dbo].[personT] AS [Extent2] ON  EXISTS (SELECT 
    				cast(1 as bit) AS [C1]
    				FROM   ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
    				INNER JOIN [dbo].[personT] AS [Extent3] ON 1 = 1
    				WHERE ([Extent1].[person_id] = [Extent3].[person_id]) AND ([Extent3].[person_id] = [Extent2].[person_id])
    			)
    			WHERE ([Extent2].[person_id] = @person_id) AND ([Extent2].[created] >= @fromTime) AND ([Extent2].[created] <= @toTime)
    		)  AS [Project2]
    		GROUP BY [Project2].[name], [Project2].[C1], [Project2].[C2], [Project2].[C3]
    	)  AS [GroupBy1]
    )  AS [Project3]
    ORDER BY [Project3].[name] ASC
    
    
    


    4) Or just put it in a stored procedure, import and call that stored proc from your EF model:

    create procedure GetPersonOrderSummary(@personid uniqueidentifier, @fromDate datetime, @untilDate datetime) as
    select o.name, year(p.created) as year, month(p.created) as month, day(p.created) as day, sum(o.price) as price
    from persont as p
    inner join ordert as o on o.person_id = p.person_id
    where p.person_id = @personid
      and p.created >= @fromDate
      and p.created <= @untilDate
    group by o.name, year(p.created), month(p.created), day(p.created)
    order by o.name
    
    
    

    Kristofer - Huagati Systems Co., Ltd. - www.huagati.com - Cool tools for Linq-to-SQL and Entity Framework: www.huagati.com/dbmltools
    Tuesday, April 28, 2009 4:34 AM
  • As a FYI, I tested this one and a pile of similar queries with VS2010 beta 2 - it looks a lot better. Although it still generates a couple of levels of redundant sub-query nesting [which has virtually no impact on performance under normal circumstances], the final execution plans are as good as can be... Well done, Kati et. al.
    Kristofer - Huagati Systems Co., Ltd.
    Cool tools for Linq-to-SQL and Entity Framework:
    huagati.com/dbmltools (add-in with new features for Visual Studio 2008's L2S and EF designers)
    huagati.com/L2SProfiler (Query profiler for Linq-to-SQL and LLBLGen Pro)
    Sunday, October 25, 2009 12:01 PM
  • Hi Kristofer,

    I'm really surprised when I edit the SQL generated by Linq to Entities...

    I try to write the simple following query :

    SELECT
    v.IdElementMaitrise,
    SUM(v.IdNiveauValidation),
    MAX(em.libelle01)
    FROM ElementMaitrise AS em
    INNER JOIN Validation AS v ON v.IdElementMaitrise = em.IdElementMaitrise
    GROUP BY v.IdElementMaitrise

    So I started with the code :

                var q = from b in ObjectContext.ElementMaitrise
                        join val in ObjectContext.Validation on b.IdElementMaitrise equals val.IdElementMaitrise
                        group val by new {val.IdElementMaitrise} into g
                        select new ElementMaitriseList
                        {
                            IdElementMaitrise = g.Key.IdElementMaitrise,
                            NiveauMaitrise = (int)(g.Sum(o => o.IdNiveauValidation)),
                        };
     which give me the following SQL Statement :

    SELECT
    [GroupBy1].[K1] AS [IdElementMaitrise],
    [GroupBy1].[A1] AS [C1],
    N'Description of domain' AS [C2]
    FROM ( SELECT
     [Extent2].[IdElementMaitrise] AS [K1],
     SUM([Extent2].[IdNiveauValidation]) AS [A1]
     FROM  [dbo].[ElementMaitrise] AS [Extent1]
     INNER JOIN [dbo].[Validation] AS [Extent2] ON [Extent1].[IdElementMaitrise] = [Extent2].[IdElementMaitrise]
     GROUP BY [Extent2].[IdElementMaitrise]
    )  AS [GroupBy1]

    Ok not really so bad... there's a not necessary select in another select but I can suppose that the optimizer understand what I want...

    So I tried do add the Max function to retrieve a column description of my main table as follow :

                var q = from b in ObjectContext.ElementMaitrise
                        join val in ObjectContext.Validation on b.IdElementMaitrise equals val.IdElementMaitrise
                        group val by new {val.IdElementMaitrise} into g
                        select new ElementMaitriseList
                        {
                            IdElementMaitrise = g.Key.IdElementMaitrise,
                            NiveauMaitrise = (int)(g.Sum(o => o.IdNiveauValidation)),
                            Libelle01Domaine = g.Max(o => o.ElementMaitrise.Domaine.Libelle01)
                        };

     And what I've got look like this :

    SELECT
    [GroupBy1].[K1] AS [IdElementMaitrise],
    [GroupBy1].[A1] AS [C1],
    (SELECT
     MAX([Extent4].[Libelle01]) AS [A1])
     FROM   [dbo].[Validation] AS [Extent2]
     INNER JOIN [dbo].[ElementMaitrise] AS [Extent3] ON [Extent2].[IdElementMaitrise] = [Extent3].[IdElementMaitrise]
     LEFT OUTER JOIN [dbo].[Domaine] AS [Extent4] ON [Extent3].[IdDomaineApp] = [Extent4].[IdDomaineApp]
     WHERE [GroupBy1].[K1] = [Extent2].[IdElementMaitrise]) AS [C2]
    FROM ( SELECT
     [Extent1].[IdElementMaitrise] AS [K1],
     SUM([Extent1].[IdNiveauValidation]) AS [A1]
     FROM [dbo].[Validation] AS [Extent1]
     GROUP BY [Extent1].[IdElementMaitrise]
    )  AS [GroupBy1]

    I really don't understand how the Linq To Entities works...

    Ok, I can make a stored proc to resolve this, but I have some problem to imagine that the Linq to Entities objects translate in that way such as simple query.

    What can I do to workaround ? Is there anayway to execute direct SQL Query in a dataset or datareader and populate ObjectSet<Entity> by code ?

    I know that's not a good way but in my mind, I can't let my application to do such queries...

    Thanks for response.

    Ety.

     

     

    Thursday, December 09, 2010 4:42 PM