none
Need help converting SQL statement with group by and sum into LINQ to Entities RRS feed

  • Question

  • Hi Guys,

    I have the following SQL statement that I am trying to convert into LINQ to Entities, but the correct solution is alluding me at the moment:

     

    select op.orders_products_id,
    op.products_id,
    op.products_model,
    op.products_name,
    op.products_price,
    sum(op.products_quantity) as quantity,
    sum(op.count_quantity) as count_quantity
    from orders_products op
    where orders_id = 574214
    group by op.products_id
    order by op.products_model, op.orders_products_id
    In this particular SQL statement, the orders_products_id is the primary key for the table, so when we group by the products_id value, it will collapse any duplicate lines in the table for the same product_id, and the quantity column result will be the total quantity for all the columns. Given the ordering statement, the op.orders_products_id should end up being the first line ID for each particular product that has multiple entries in the same table.

     

    To select the lines using LINQ to Entities without doing the grouping, I would use the following code:

     

    using (var uow = new StoreEntities()) {
     var query = from op in uow.OrderProducts
        where op.OrderID == 574214
        orderby op.Model, op.OrderProductID
        select new {
         ID = op.OrderProductID,
         ProductID = op.ProductID,
         Model = op.Model,
         Name = op.Name,
         SalePrice = op.SalePrice,
         Qty = op.Qty,
         CountedQty = op.CountedQty,
        };
     var list = query.ToList();
    }
    
    But I am completely lost at the moment as to how I would go about doing the grouping to get the same result as my original SQL? When I do grouping in LINQ to Entities, the result of the grouping is a new sequence with a key and the details of the items for that key. How do I flatten that out, so that I can get the results out similar to how I wrote the original SQL query?

     


    Wednesday, June 22, 2011 5:13 PM

Answers

  • One problem I ran into is that (at least with MySQL) if you have too many g.FirstOrDefault().whatever's in the select statement, it will end up producing SQL with a ridiculous number of sub-selects in it. In fact it was so bad with MySQL that the DB server barfed on the result and said it contained too many nested select statements. It seems that the number of sub-selects generated ends up growing exponentially with the number of g.FirstOrDefault().whatever's you use!

    The solution to the problem that I came up with is to do the initial grouping query and to ONLY select out the row ID for the first row in the set that we want to return the rest of the data for. And then querying that result set and joining it to the rest of the query. The end result is a nice, elegant SQL query that does exactly what I needed:

    // Build the grouping query first
    var groupQuery =
      from op in UOW.Cached.OrderProducts
      where op.OrderID == orderID
      orderby op.Model, op.OrderProductID
      group op by op.ProductID into g
      select new {
        OrderProductID = g.FirstOrDefault().OrderProductID,
        Qty = g.Sum(col => col.Qty),
        CountedQty = g.Sum(col => col.CountedQty),
      };
    
    // Build the main selection query based on the grouping query above driving joins to the rest of the data
    query =
      from g in groupQuery
      join op in UOW.Cached.OrderProducts on g.OrderProductID equals op.OrderProductID
      join p in UOW.Cached.Products on op.ProductID equals p.ProductID into ps
      from p in ps.DefaultIfEmpty()
      join pl in UOW.Cached.Locations on p.LocationID equals pl.LocationID into pls
      from pl in pls.DefaultIfEmpty()
      orderby op.Model
      select new DTO.OrderLine {
        ID = op.OrderProductID,
        ProductID = op.ProductID,
        Model = op.Model,
        Name = op.Name,
        Barcode = p.Barcode,
        SalePrice = op.SalePrice,
        FinalPrice = op.FinalPrice,
        CostPrice = op.CostPrice,
        RefundablePrice = op.RefundablePrice,
        TaxRate = op.TaxRate,
        Qty = g.Qty,
        CountedQty = g.CountedQty,
        StockOnHand = op.QtyOnHand,
        BackOrderDate = op.BackOrderDate,
        Location = pl.Barcode ?? "",
        ShippingBoxID = op.ShippingBoxID,
        LastModified = op.LastModified,
      };
    
    


    Tuesday, June 28, 2011 5:03 PM

All replies

  • Hi,

    Normally a group by statement should contain all elements in the select, or an error is issued. But, in MySql (which I suspect you are using due to earlier posts) it's possible to skip this. BUT, you should be warned that this will give you incorrect data.

    Eg, a table with the following columns and data

    Col1    Col2    Amount
    a         b        10
    a         c        20

    and a query like this

    select col1, col2, sum(amount) as amount from Table group by col1
    

    Will give you this answer:

    Col1    Col2    Amount
    a         b        30

    As you can see, where is the c?

    So, since it's normally limited in the SQL and GROUP BY, it's also like this in EF. To do what you want you need to group by each of the columns.

     

    Hope this helps!

     


    --Rune
    Wednesday, June 22, 2011 5:33 PM
  • Can you post how I might do this in LINQ to Entities?
    Wednesday, June 22, 2011 5:53 PM
  • I can :)

    var query = from op in uow.OrderProducts
        where op.OrderID == 574214
        orderby op.Model, op.OrderProductID
        group i by new { 
         op.OrderProductID, 
         op.ProductID, 
         op.Model, 
         op.Name,
         op.SalePrice } into g
        select new {
         ID = g.Key.OrderProductID,
         ProductID = g.Key.ProductID,
         Model = g.Key.Model,
         Name = g.Key.Name,
         SalePrice = g.Key.SalePrice,
         Qty = g.Sum(col => col.Qty),
         CountedQty = g.Sum(col => col.CountedQty)
        };
    
    

    Something like that should be it!

     


    --Rune
    Wednesday, June 22, 2011 7:46 PM
  • Oh, I see. So you put everything that is supposed to be the same within multiple lines into the group by, and then it will actually group by all of those items in the SQL and you can select them out. Nice.

    But I see one problem. Including op.OrderProductID in the grouping will mean that none of the lines will collapse, because those will always be unique for each entry in the database (it is the primary key for the table). But I need to select that out, so seeing how you did this above, I think this would work:

    var query = from op in uow.OrderProducts
      where op.OrderID == 574214
      orderby op.Model, op.OrderProductID
      group i by new { 
       op.ProductID, 
       op.Model, 
       op.Name,
       op.SalePrice } into g
      select new {
       ID = g.FirstOrDefault().OrderProductID,
       ProductID = g.Key.ProductID,
       Model = g.Key.Model,
       Name = g.Key.Name,
       SalePrice = g.Key.SalePrice,
       Qty = g.Sum(col => col.Qty),
       CountedQty = g.Sum(col => col.CountedQty)
      };
    
    
    I will have to give that a try, once I finally find a working MySQL connector that can handle grouping.

    Wednesday, June 22, 2011 8:21 PM
  • One problem I ran into is that (at least with MySQL) if you have too many g.FirstOrDefault().whatever's in the select statement, it will end up producing SQL with a ridiculous number of sub-selects in it. In fact it was so bad with MySQL that the DB server barfed on the result and said it contained too many nested select statements. It seems that the number of sub-selects generated ends up growing exponentially with the number of g.FirstOrDefault().whatever's you use!

    The solution to the problem that I came up with is to do the initial grouping query and to ONLY select out the row ID for the first row in the set that we want to return the rest of the data for. And then querying that result set and joining it to the rest of the query. The end result is a nice, elegant SQL query that does exactly what I needed:

    // Build the grouping query first
    var groupQuery =
      from op in UOW.Cached.OrderProducts
      where op.OrderID == orderID
      orderby op.Model, op.OrderProductID
      group op by op.ProductID into g
      select new {
        OrderProductID = g.FirstOrDefault().OrderProductID,
        Qty = g.Sum(col => col.Qty),
        CountedQty = g.Sum(col => col.CountedQty),
      };
    
    // Build the main selection query based on the grouping query above driving joins to the rest of the data
    query =
      from g in groupQuery
      join op in UOW.Cached.OrderProducts on g.OrderProductID equals op.OrderProductID
      join p in UOW.Cached.Products on op.ProductID equals p.ProductID into ps
      from p in ps.DefaultIfEmpty()
      join pl in UOW.Cached.Locations on p.LocationID equals pl.LocationID into pls
      from pl in pls.DefaultIfEmpty()
      orderby op.Model
      select new DTO.OrderLine {
        ID = op.OrderProductID,
        ProductID = op.ProductID,
        Model = op.Model,
        Name = op.Name,
        Barcode = p.Barcode,
        SalePrice = op.SalePrice,
        FinalPrice = op.FinalPrice,
        CostPrice = op.CostPrice,
        RefundablePrice = op.RefundablePrice,
        TaxRate = op.TaxRate,
        Qty = g.Qty,
        CountedQty = g.CountedQty,
        StockOnHand = op.QtyOnHand,
        BackOrderDate = op.BackOrderDate,
        Location = pl.Barcode ?? "",
        ShippingBoxID = op.ShippingBoxID,
        LastModified = op.LastModified,
      };
    
    


    Tuesday, June 28, 2011 5:03 PM