none
Performance of Group By

    Question

  • I am accessing data view a view of about 20,000 rows.  I simply dragged the view onto the designer to create by DataContexct.  When I run a simply query, it runs instantaneous.  I can sort and filter with change in performance.  I am printing the data onto a web page using a foreach loop.

    When I use a GroupBy clause it takes several seconds.  In the Query Analyzer it is instantaneous.  When I first convert my data to a List using ToList() and then to the GroupBy on the List the performance is good.  My code is below.  This sounds counter intuitive.  I do not want to have pull into memory 20,000 rows to do a group by.

    Help!!

    Jay


    ============

    This is very slow:

    AnalyticsDataContext db = new AnalyticsDataContext();

    var query =

    from H in db.HeadCounts

    group H by H.Age;

    foreach (var group in query) {Response.Write(group.Key + "<br/>");}


    This is very fast - instantaneous:

    AnalyticsDataContext db = new AnalyticsDataContext();

    var MyList = db.HeadCounts.ToList();

    var query = MyList.GroupBy(G => G.Age);

    foreach (var group in query) {Response.Write(group.Key + "<br/>");}



    Monday, February 23, 2009 5:39 PM

All replies

  • I messed up the wording of my original post so here is another try.

    I am accessing data via a view of about 20,000 rows.  I simply dragged the view onto the designer to create my DataContexct.  When I run a simple query, it runs instantaneously.  I can sort and filter with no change in performance.  I am displaying the data onto a web page using a foreach loop.

    When I use a GroupBy clause it takes several seconds.  In the Query Analyzer it is instantaneous.  When I first convert my linq query to a List using ToList(), and then do the GroupBy on the List, the performance is good. 

    My code is below.  I do not want to have pull into memory 20,000 rows to do a group by operation.

    Help!!

    Jay


    ============

    This is very slow:

    AnalyticsDataContext db = new AnalyticsDataContext();

    var query =

    from H in db.HeadCounts

    group H by H.Age;

    foreach (var group in query) {Response.Write(group.Key + "<br/>");}


    This is very fast - instantaneous:

    AnalyticsDataContext db = new AnalyticsDataContext();

    var MyList = db.HeadCounts.ToList();

    var query = MyList.GroupBy(G => G.Age);

    foreach (var group in query) {Response.Write(group.Key + "<br/>");}
    Monday, February 23, 2009 6:18 PM
  • Here is some more information.

    When I look at the SQL Profiler it does the following:

    exec sp_reset_connection

    SELECT [t0].[Age] AS [Key]
    FROM [dbo].[Head] AS [t0]
    GROUP BY [t0].[Age]

    exec sp_executesql N'SELECT [t0].[EMPLID], [t0].[EMPLName], [t0].[Age]
    FROM [dbo].[Head] AS [t0]
    WHERE ((@x1 IS NULL) AND ([t0].[Age] IS NULL)) OR ((@x1 IS NOT NULL) AND ([t0].[Age] IS NOT NULL) AND (@x1 = [t0].[Age]))', N'@x1 int', @x1 = 23

    exec sp_reset_connection

    exec sp_executesql N'SELECT [t0].[EMPLID], [t0].[EMPLName], [t0].[Age]
    FROM [dbo].[Head] AS [t0]
    WHERE ((@x1 IS NULL) AND ([t0].[Age] IS NULL)) OR ((@x1 IS NOT NULL) AND ([t0].[Age] IS NOT NULL) AND (@x1 = [t0].[Age]))', N'@x1 int', @x1 = 46

    Etc.

    -----

    It looks like it isn't even using the SQL Group By clause.  It is using several where classes for each value of Age.

    HELP!!

    Jay

    Monday, February 23, 2009 7:17 PM
  • Some more information:

    If I put an index on Age it speeds it up.  That still does explain why the need for those many extra SQL statements that show up in the Profiler using exec sp_executesql and why all the reconnections.  What is really happening with network traffic for this simple group by example.

    Help!!

    Jay

    Monday, February 23, 2009 7:32 PM
  • A LINQ group-by is not the same thing as a SQL group-by.   A SQL group-by only produces the grouping keys as the result.  A LINQ group-by is a heirarchical query that returns a sequence of groups, where each group contains the key and all the elements (rows) that made up the group.  It appears that L2S is using the N+1 query approach to retrieve the heirarchy; which gives you a separate query for each group.

    When you run the slow query you are retrieving all 20,000 rows, just in groups and via many actual SQL queries.  When you do the group-by on a local list, it doesn't have to return all 20,000 rows from the database since you already have them.

    You can turn a LINQ group-by into a SQL group-by by selecting out only the keys; the heirarchy will automatically be dropped.

    var query =
       from H in db.HeadCounts
       group H by H.Age into g
       select g.Key;

    foreach (var key in query) { Response.Write(key + "<br/>"); }


    Wayward LINQ Lacky
    Sunday, March 1, 2009 6:05 PM
    Moderator