none
DefaultIfEmpty on join? (Basically an outer join) RRS feed

  • Question

  • I have a query:

                            var orders = from o in orderHistory
                                         join c in categoryDictionary on o.Sku equals c.Key
                                         group o by o.Sku into skuList
                                         select new
                                         {
                                             Sku = skuList.Key,
                                             Summary = from h in skuList
                                                       group h by h.Date.Year * 100 + ((h.Date.Day > 15) ? h.Date.Month * 2 : h.Date.Month * 2 - 1) into skuHistory
                                                       orderby skuHistory.Key
                                                       select new { Period = skuHistory.Key, Quantity = skuHistory.Sum(s => s.Quantity) }
                                         };
    

    Which works OK but instead of the join producing a category string If the part number (SKU) is not in the categoryDictionary I would like the query to return something like DefaultIfEmpty("Unknown"). I have not hit on the right syntax yet that will allow me to compile this query.

    Any suggestions?

    Thank you.

    Kevin
    Tuesday, May 19, 2009 1:12 PM

Answers

  • You need to join after grouping:


      from o in orderHistory
      group o by o.Sku into skuList
      join c in categoryDictionary on o.Key equals c.Key into categories
      select new
      {
         PartNumber = categories.Any() ? categories.First().PartNumber : "Unknown",
         Summary = from h in skuList ...
      }


    Alternatively (although this will be less efficient if you're querying a local collection rather than a database):


      from o in orderHistory
      group o by o.Sku into skuList
      let category = categoryDictionary.FirstOrDefault (c => c.Key == skuList.Key)
      select new
      {
        PartNumber = category == null ? "Unknown" : category.PartNumber,
        Summary = ...
      }
     

    Joe

    Write LINQ queries interactively - www.linqpad.net
    Tuesday, May 19, 2009 1:36 PM
    Answerer
  • OK - in that case, we have to go back to joining and then grouping. Only we'll need to do a GroupJoin instead of an ordinary join:


     var orders = from o in orderHistory
       join c in categoryDictionary on o.Sku equals c.Key into cats
       group o by cats.Any() ? cats.First().Key : "Unknown" into skuList
       select new
       {
         Sku = skuList.Key,
         Summary = from h in skuList
           group h by h.Date.Year * 100 + ((h.Date.Day > 15) ? h.Date.Month * 2 : h.Date.Month * 2 - 1) into skuHistory
           orderby skuHistory.Key
           select new { Period = skuHistory.Key, Quantity = skuHistory.Sum(s => s.Quantity)
       }


    Joe

    Write LINQ queries interactively - www.linqpad.net
    • Marked as answer by KevinBurton Wednesday, May 20, 2009 2:31 PM
    Wednesday, May 20, 2009 2:30 PM
    Answerer

All replies

  • You need to join after grouping:


      from o in orderHistory
      group o by o.Sku into skuList
      join c in categoryDictionary on o.Key equals c.Key into categories
      select new
      {
         PartNumber = categories.Any() ? categories.First().PartNumber : "Unknown",
         Summary = from h in skuList ...
      }


    Alternatively (although this will be less efficient if you're querying a local collection rather than a database):


      from o in orderHistory
      group o by o.Sku into skuList
      let category = categoryDictionary.FirstOrDefault (c => c.Key == skuList.Key)
      select new
      {
        PartNumber = category == null ? "Unknown" : category.PartNumber,
        Summary = ...
      }
     

    Joe

    Write LINQ queries interactively - www.linqpad.net
    Tuesday, May 19, 2009 1:36 PM
    Answerer
  • I'm a bit blinded by detail here and not really sure of the semantics of your query. I think I know what you're trying to achieve.

    from o in orderHistory
    join c in categoryDictionary on o.Sku equals c.Key into skuList
    from h in skuList.DefaultIfEmpty()
    select new {
    	Sku = skuList.Key,
    	Summary = h	// filter, group, project with sub-query as required, e.g. Summary = from item in h ...
    }
    Tuesday, May 19, 2009 1:44 PM
    Answerer
  • Sorry about the complexity. Basically the categoryDictionary contains a list of strings (categories) for each part number (SKU). So the declaration of the categoryDictionary looks like Dictionary<int, List<string>> so one SKU can be a "member" of multiple categories. The second grouping is to group all of the SKUs that have the same category into "sales periods". Here I have choosent to have a sales period of twice a month. You have choosen to remove the grouping before the sub-query. Is that not necessary? My query with the default would look like:

                            var orders = from o in orderHistory
                                         join c in categoryDictionary on o.Sku equals c.Key into categoryList
                                         from c in categoryList.DefaultIfEmpty()
                                         group o by o.Sku into skuList
                                         select new
                                         {
                                             Sku = skuList.Key,
                                             Summary = from h in skuList
                                                       group h by h.Date.Year * 100 + ((h.Date.Day > 15) ? h.Date.Month * 2 : h.Date.Month * 2 - 1) into skuHistory
                                                       orderby skuHistory.Key
                                                       select new { Period = skuHistory.Key, Quantity = skuHistory.Sum(s => s.Quantity) }
                                         };
    

    THank you for your suggestions.

    Kevin

    Tuesday, May 19, 2009 2:14 PM
  • It is probably a typo on my part but this fails to complie:

                            var orders = from o in orderHistory
                                         join c in categoryDictionary on o.Sku equals c.Key into skuList
                                         from h in skuList.DefaultIfEmpty()
                                         select new
                                         {
                                             Sku = skuList.Key,
                                             Summary = from h in skuList
                                                       group h by h.Date.Year * 100 + ((h.Date.Day > 15) ? h.Date.Month * 2 : h.Date.Month * 2 - 1) into skuHistory
                                                       orderby skuHistory.Key
                                                       select new { Period = skuHistory.Key, Quantity = skuHistory.Sum(s => s.Quantity) }
                                         };
    


    I get a few errors but the first is on the skuList.Key

    	'System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<int,System.Collections.Generic.List<Utilities.TestHelperFunctions.CategoryDetail>>>' does not contain a definition for 'Key' and no extension method 'Key' accepting a first argument of type 'System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<int,System.Collections.Generic.List<Utilities.TestHelperFunctions.CategoryDetail>>>' could be found (are you missing a using directive or an assembly reference?)
    Kevin
    Tuesday, May 19, 2009 2:23 PM
  • Based on what you've just said Joe has a good approach but I could only get it to compile with the following tweaks. Note I've simplified the grouping by date for my own debugging.

    from o in orderHistory
    group o by o.Sku into skuList
    join c in categoryDictionary on skuList.Key equals c.Key into categories
    select new
    {
      PartNumber = categories.Any() ? categories.First().Key.ToString() : "Unknown",
      Summary = from h in skuList
                group h by h.Date.Year into skuHistory
                orderby skuHistory.Key
                select new { Period = skuHistory.Key, Quantity = skuHistory.Sum(s => s.Quantity) }
    };
    Tuesday, May 19, 2009 3:20 PM
    Answerer
  • On second though this isn't exactly what I wanted. I probably didn't explain it well enough.

    The orderHistotry collection is a query that gives me a list of orders by date and quantity of parts (SKUs). The categoryDictionary has mapping from sku to category name. So what I am trying to get is a summary of sales by category. Say for example the order history indicates that I sold 2 trucks on Jan. 22nd, 10 dolls on March 10th,  6 lamps on May 5th, and 8 glasses on May 18th. Further lets say that trucks and dolls are in the category "Toys" and there isn't a category for lamps or glasses. So elimintaing the sub query I would expect the output to have four rows:

    Toys        Jan. 22   2
    Toys        Mar. 10 10
    Unknown  May   5   6
    Unknown  May 18   8

    The sub query just groups the sales into "sales periods" I have choosen here to use a sales-period of bi-monthly so the ultimate output would be

    Toys        2           2
    Toys        5         10
    Unknown  9           6
    Unknown 10          8

    Thanks again.

    Kevin
    Tuesday, May 19, 2009 3:21 PM
  • Thank you for your help. I think I am almost there. I am not sure the reason but all of the "unknowns" don't seem to be grouped together. Thus it makes me wonder if the rest of the categories are also not properly grouped. Here is the code that I am using:

                            var orders = from o in orderHistory
                                         group o by o.Sku into skuList
                                         join c in categoryDictionary on skuList.Key equals c.Key into categories
                                         select new
                                         {
                                             Category = categories.Any() ? categories.First().Value[0].Category : "Unknown",
                                             Summary = from h in skuList
                                                       group h by h.Date.Year * 100 + ((h.Date.Day > 15) ? h.Date.Month * 2 : h.Date.Month * 2 - 1) into skuHistory
                                                       orderby skuHistory.Key
                                                       select new { Period = skuHistory.Key, Quantity = skuHistory.Sum(s => s.Quantity) }
                                         };
                            writer.WriteLine("Category,Period,Quantity");
                            foreach (var order in orders)
                            {
                                foreach(var skuSummary in order.Summary)
                                {
                                    writer.WriteLine(string.Format("{0},{1},{2}", order.Category, skuSummary.Period, skuSummary.Quantity));
                                }
                            }
    



    Thanks again.

    Kevin

    Tuesday, May 19, 2009 5:19 PM
  • The reason for the "unknowns" not being together is that orderHistory.Sku most likely has a range of non-null values that are not present in categories.

    You can verify this by changing the 6th line in the query as follows:

      Category = categories.Any() ? categories.First().Value[0].Category : "Unknown - key=" + c.Key ,

    If so, do you want to group all these categories together?



    Write LINQ queries interactively - www.linqpad.net
    Wednesday, May 20, 2009 5:13 AM
    Answerer
  • Thank you for your interest.

    Yes, I would like the unknowns grouped together. I thought that the join would group all of the "unknowns" together but it seems I was mistaken.

     

    Kevin

    Wednesday, May 20, 2009 1:59 PM
  • OK - in that case, we have to go back to joining and then grouping. Only we'll need to do a GroupJoin instead of an ordinary join:


     var orders = from o in orderHistory
       join c in categoryDictionary on o.Sku equals c.Key into cats
       group o by cats.Any() ? cats.First().Key : "Unknown" into skuList
       select new
       {
         Sku = skuList.Key,
         Summary = from h in skuList
           group h by h.Date.Year * 100 + ((h.Date.Day > 15) ? h.Date.Month * 2 : h.Date.Month * 2 - 1) into skuHistory
           orderby skuHistory.Key
           select new { Period = skuHistory.Key, Quantity = skuHistory.Sum(s => s.Quantity)
       }


    Joe

    Write LINQ queries interactively - www.linqpad.net
    • Marked as answer by KevinBurton Wednesday, May 20, 2009 2:31 PM
    Wednesday, May 20, 2009 2:30 PM
    Answerer