none
Why "DefaultIfEmpty()" causes the problem of duplicated?! RRS feed

  • Question

  • Hi,

    namespace CSharp
    {
        using System;
        using System.Collections.Generic;
        using System.Linq;
        using System.Text;
        using System.Text.RegularExpressions;
        using System.Threading;

        class Employee
        {
            public int EmployeeId { get; set; }
            public string EmployeeName { get; set; }
            public int? ManagerId { get; set; }
        }

        class Program
        {
            static void Main(string[] args)
            {
                List<Employee> employees = new List<Employee>()
                {
                    new Employee{EmployeeId=1,EmployeeName="E1",ManagerId=null},
                    new Employee{EmployeeId=2,EmployeeName="E1",ManagerId=1},
                    new Employee{EmployeeId=3,EmployeeName="E1",ManagerId=1},
                    new Employee{EmployeeId=4,EmployeeName="E1",ManagerId=2},
                    new Employee{EmployeeId=5,EmployeeName="E1",ManagerId=2},
                    new Employee{EmployeeId=6,EmployeeName="E1",ManagerId=2},
                    new Employee{EmployeeId=7,EmployeeName="E1",ManagerId=2}
                };

                var result = from e in employees
                             join f in employees on e.EmployeeId equals f.ManagerId into temp
                             select new
                             {
                                 EId=e.EmployeeId,
                                 Employees =temp
                             };

                foreach (var item in result)
                {
                    Console.WriteLine(item.EId+ "\nSub Items:");
                    foreach (var sitem in item.Employees)
                    {
                        Console.WriteLine(sitem.EmployeeId + "<===>" + sitem.EmployeeName + "<===>" + sitem.ManagerId);
                    }
                    Console.WriteLine("========");
                }
            }
        }
    }

    If I use——

    var result = from e in employees
    join f in employees on e.EmployeeId equals f.ManagerId into temp
    from t in temp.DefaultIfEmpty()
    select new
    {
    EId=e.EmployeeId,
    Employees =temp
    };

    This will cause the problem of duplicated!Why?

    Saturday, August 18, 2012 8:45 AM

Answers

  •   The reason is what I posted, but I should have given a better explanation. 

     You are creating a cartesian join, that is resulting in a cartesian product. A cartesian join is when you join every row of one table to every row of another table.
    You can also get one by joining every row of a table to every row of itself.

      In this case the data is duplicated from the first query that you put into temp, you just don't see it until you actually select it out from temp.

    select * from employees

    Now your first query creates this in temp, but doesn't select from it...

    Now when you join employees to temp and select from temp you get...

       I hope this gives a better explanation.


    Chris Wigley MCPD


    • Edited by Chris Wigley Friday, August 24, 2012 2:55 AM Re-linked images
    • Proposed as answer by Mike FengModerator Friday, August 24, 2012 6:17 AM
    • Marked as answer by TimoYang Sunday, August 26, 2012 3:31 AM
    Friday, August 24, 2012 2:21 AM
  • Hi Timo,

    I agree with Chris. I will try his opinion in another way.

    And @Chris, if I misunderstood you, please feel free to follow up to let me know, too.

    1. 

                              from e in employees
                              join f in employees on e.EmployeeId equals f.ManagerId into temp

    This will produce a cartesian, it is 4 clo(I just take the EmployeeId and ManagerId for simple.) * 49 row table. Is here clear? And there only 6 rows is fit the conditions. This code will help you to see the 6 rows:

                var tempSet = from e in employees
                              join f in employees on e.EmployeeId equals f.ManagerId into temp
                              select temp;
    
                foreach (var item in tempSet)
                {
                    Console.WriteLine("item count: " + item.Count());
                    foreach (var itemi in item)
                        if (itemi == null)
                            Console.WriteLine("EmployeeId: null");
                        else
                            Console.WriteLine("EmployeeId: " + itemi.EmployeeId);
                }

    2. If without "from t in temp.DefaultIfEmpty()"/"from t in temp", the final result will be from about 6 rows. Here, acturally, DefaultIfEmpty is not the cause about produce duplicate records. And it is the result you expected.

    3. If there is such statement. the final result will be from the cartesian of employees and temp, and this cartesian table has 7*6 = 42 rows. The condition is e = temp.e

    I hope this picture will help you to understand it:

    Best regards,


    Mike Feng
    MSDN Community Support | Feedback to us
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    • Marked as answer by TimoYang Sunday, August 26, 2012 3:31 AM
    Friday, August 24, 2012 7:05 AM
    Moderator

All replies

  • Your first query is selecting directly from employees and your second it selecting from temp resulting in a cartesian. If I understand your intent correctly I believe the query you are looking for is

    var result = from e in employees
    join f in employees on e.EmployeeId equals f.ManagerId into temp
    //from t in temp.DefaultIfEmpty()
    select new{	EId=e.EmployeeId,Employees =temp.DefaultIfEmpty()};


    Chris Wigley MCPD

    • Proposed as answer by Chris Wigley Thursday, August 23, 2012 10:16 PM
    • Unproposed as answer by TimoYang Friday, August 24, 2012 12:57 AM
    Tuesday, August 21, 2012 1:36 PM
  • Yes……I know how to do that……But Why duplicated?Can u give me the reason?

    Friday, August 24, 2012 12:57 AM
  •   The reason is what I posted, but I should have given a better explanation. 

     You are creating a cartesian join, that is resulting in a cartesian product. A cartesian join is when you join every row of one table to every row of another table.
    You can also get one by joining every row of a table to every row of itself.

      In this case the data is duplicated from the first query that you put into temp, you just don't see it until you actually select it out from temp.

    select * from employees

    Now your first query creates this in temp, but doesn't select from it...

    Now when you join employees to temp and select from temp you get...

       I hope this gives a better explanation.


    Chris Wigley MCPD


    • Edited by Chris Wigley Friday, August 24, 2012 2:55 AM Re-linked images
    • Proposed as answer by Mike FengModerator Friday, August 24, 2012 6:17 AM
    • Marked as answer by TimoYang Sunday, August 26, 2012 3:31 AM
    Friday, August 24, 2012 2:21 AM
  • I cannot understand, either……:(
    Friday, August 24, 2012 2:41 AM
  • Hi Timo,

    I agree with Chris. I will try his opinion in another way.

    And @Chris, if I misunderstood you, please feel free to follow up to let me know, too.

    1. 

                              from e in employees
                              join f in employees on e.EmployeeId equals f.ManagerId into temp

    This will produce a cartesian, it is 4 clo(I just take the EmployeeId and ManagerId for simple.) * 49 row table. Is here clear? And there only 6 rows is fit the conditions. This code will help you to see the 6 rows:

                var tempSet = from e in employees
                              join f in employees on e.EmployeeId equals f.ManagerId into temp
                              select temp;
    
                foreach (var item in tempSet)
                {
                    Console.WriteLine("item count: " + item.Count());
                    foreach (var itemi in item)
                        if (itemi == null)
                            Console.WriteLine("EmployeeId: null");
                        else
                            Console.WriteLine("EmployeeId: " + itemi.EmployeeId);
                }

    2. If without "from t in temp.DefaultIfEmpty()"/"from t in temp", the final result will be from about 6 rows. Here, acturally, DefaultIfEmpty is not the cause about produce duplicate records. And it is the result you expected.

    3. If there is such statement. the final result will be from the cartesian of employees and temp, and this cartesian table has 7*6 = 42 rows. The condition is e = temp.e

    I hope this picture will help you to understand it:

    Best regards,


    Mike Feng
    MSDN Community Support | Feedback to us
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    • Marked as answer by TimoYang Sunday, August 26, 2012 3:31 AM
    Friday, August 24, 2012 7:05 AM
    Moderator