Finding out duplicates in the list using LINQ

    Question

  • Hi All,

    I have a list of customers.

    For example:

    Id    Name    Email      Address
    1     A       a@a.com    abc
    2     B       b@b.com    abc
    3     C       c@c.com    abc
    4     D       d@d.com    abc
    1     A       a@a.com    abc

    What I need to do is separate the unique items from the duplicate items using LINQ, i.e. I need to prepare two lists of customers as shown below.

    Duplicate list:

    Id    Name    Email      Address
    1     A       a@a.com    abc
    1     A       a@a.com    abc

    Unique list:

    Id    Name    Email      Address
    2     B       b@b.com    abc
    3     C       c@c.com    abc
    4     D       d@d.com    abc

    How can I achieve this using LINQ?

    Please help.

    Thanks in advance.

    Monday, September 27, 2010 12:52 PM

All replies

  • You could achieve this by using the Distinct method in combination with the Except method:

    http://msdn.microsoft.com/en-us/library/system.linq.enumerable.distinct.aspx

    http://msdn.microsoft.com/en-us/library/bb300779.aspx

    However, there is a performance issue...

    As an alternative, create two lists, A and B (B starts out holding all the elements). Order B by the field that repeats and iterate over it; whenever a value differs from its neighbours, move that record to A. At the end, A holds the non-repeats and B holds the repeats. Off the top of my head that is the cleanest thing I can think of; perhaps there is something better.
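
The sort-and-scan idea could be sketched roughly as follows. This is only an illustration on plain integers; the class name SortScanSketch and the method Split are made up for the example, and a real Customer list would need to be ordered and compared on whichever fields define a duplicate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original thread.
public static class SortScanSketch
{
    // Orders the values, then compares each one with its neighbours.
    public static void Split(IEnumerable<int> source,
                             out List<int> nonRepeats, out List<int> repeats)
    {
        List<int> sorted = source.OrderBy(x => x).ToList();
        nonRepeats = new List<int>();
        repeats = new List<int>();

        for (int i = 0; i < sorted.Count; i++)
        {
            // After sorting, equal values sit next to each other.
            bool sameAsPrevious = i > 0 && sorted[i] == sorted[i - 1];
            bool sameAsNext = i + 1 < sorted.Count && sorted[i] == sorted[i + 1];

            if (sameAsPrevious || sameAsNext)
                repeats.Add(sorted[i]);
            else
                nonRepeats.Add(sorted[i]);
        }
    }

    public static void Main()
    {
        List<int> nonRepeats, repeats;
        Split(new[] { 1, 2, 3, 4, 1 }, out nonRepeats, out repeats);

        Console.WriteLine(string.Join(", ", repeats));     // 1, 1
        Console.WriteLine(string.Join(", ", nonRepeats));  // 2, 3, 4
    }
}
```

Note that each value is checked against both the previous and the next element; checking only the next one would misclassify the last item of a run of repeats.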

    Regards

    Monday, September 27, 2010 1:27 PM
  • The Except method is not working properly. Can anyone please demonstrate how to achieve the goal described in my earlier question?
    Tuesday, September 28, 2010 4:48 AM
  • What do you mean, not working properly?

    The idea was to use Distinct first, which gives you a list without repeats, and then use Except with that against the original list, which tells you which items are repeats and which are not.

    The other thing that comes to mind that could satisfy what you want is a lookup table:

    http://msdn.microsoft.com/en-us/library/bb292716.aspx

    Use an appropriate key, and the only thing left is to iterate the collection looking for keys that have more than one value (those are repeats, the others are not).
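
A minimal sketch of the lookup-table idea, assuming plain integers where the value itself is the key. The class name LookupSketch and the method Split are invented for the illustration; for customers the key would have to cover every field that takes part in the comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original thread.
public static class LookupSketch
{
    // Builds a lookup keyed by value, then splits it by how many items share each key.
    public static void Split(IEnumerable<int> source,
                             out List<int> repeats, out List<int> uniques)
    {
        ILookup<int, int> lookup = source.ToLookup(x => x);

        // Keys with more than one value are repeats; every occurrence is kept.
        repeats = lookup.Where(g => g.Count() > 1).SelectMany(g => g).ToList();

        // Keys with exactly one value are the unique items.
        uniques = lookup.Where(g => g.Count() == 1).SelectMany(g => g).ToList();
    }

    public static void Main()
    {
        List<int> repeats, uniques;
        Split(new[] { 1, 2, 3, 4, 1 }, out repeats, out uniques);

        Console.WriteLine(string.Join(", ", repeats));  // 1, 1
        Console.WriteLine(string.Join(", ", uniques));  // 2, 3, 4
    }
}
```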

    Regards

     

     

    Tuesday, September 28, 2010 12:50 PM
  • I implemented the IEquatable interface in my Customer class and the Except method is now working properly. But I still have some issues with it.

    Initially I have a list of 5 customers, as mentioned in my post above:

    Id    Name    Email      Address
    1     A       a@a.com    abc
    2     B       b@b.com    abc
    3     C       c@c.com    abc
    4     D       d@d.com    abc
    1     A       a@a.com    abc

    Among these 5 customers I extract the 3 unique customers using a group by clause:

    Unique list:

    Id    Name    Email      Address
    2     B       b@b.com    abc
    3     C       c@c.com    abc
    4     D       d@d.com    abc

    But when I apply Except to my base customer list against the unique list, it returns only one customer instead of the 2 duplicate customers. How can I solve this?

    Expected:

    Duplicate list:

    Id    Name    Email      Address
    1     A       a@a.com    abc
    1     A       a@a.com    abc

    Actual result:

    Duplicate list:

    Id    Name    Email      Address
    1     A       a@a.com    abc

    Please help, thanks in advance.

    Tuesday, September 28, 2010 12:53 PM
  • That is odd, you should get two results. A small test I ran produced two 2s, not one. How did you implement the IEquatable interface?

    using System;
    using System.Linq;

    public class SampleClass
    {
        public static void Main()
        {
            int[] test1 = { 2, 3, 2 };
            int[] test2 = { 3 };

            // Elements of test1 that do not appear in test2.
            var result = test1.Except(test2);

            foreach (int x in result) Console.WriteLine(x);
            Console.Read();
        }
    }
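
For comparison, Microsoft's documentation describes Enumerable.Except as producing the set difference of two sequences, so on that implementation a repeated element is returned only once; other runtimes of the time may have behaved differently. The sketch below contrasts Except with a plain filter that keeps every occurrence. The class name ExceptSketch and the method ExceptKeepingRepeats are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original thread.
public static class ExceptSketch
{
    // Filters 'first' by membership in 'second' without collapsing repeated elements.
    public static List<int> ExceptKeepingRepeats(IEnumerable<int> first, IEnumerable<int> second)
    {
        var excluded = new HashSet<int>(second);
        return first.Where(x => !excluded.Contains(x)).ToList();
    }

    public static void Main()
    {
        int[] test1 = { 2, 3, 2 };
        int[] test2 = { 3 };

        // Set difference: the repeated 2 is reported once.
        Console.WriteLine(string.Join(", ", test1.Except(test2)));                 // 2

        // Plain filter: both occurrences of 2 survive.
        Console.WriteLine(string.Join(", ", ExceptKeepingRepeats(test1, test2)));  // 2, 2
    }
}
```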

    Regards

    Tuesday, September 28, 2010 4:06 PM
  • Hi Serquey,

    I ran your sample code, but it also returns only one 2 in the result. Did you check that?

    Wednesday, September 29, 2010 9:48 AM
  • http://ideone.com/KQ3Wz

    I try to test things before posting code, so yes, I did. See the link above: it is an online compiler running that code sample, and it does give me two 2s, so...

    Regards

     

    Wednesday, September 29, 2010 12:01 PM
  • I have implemented IEquatable on the Customer class like this:

     

    #region IEquatable<Customer> Members

    public bool Equals(Customer other)
    {
        // Check whether the compared object is null.
        if (Object.ReferenceEquals(other, null)) return false;

        // Check whether the compared object references the same data.
        if (Object.ReferenceEquals(this, other)) return true;

        // Check whether the objects’ properties are equal.
        // The static string.Equals is null-safe, matching the null checks in GetHashCode.
        return string.Equals(Name, other.Name) &&
               string.Equals(Address, other.Address) &&
               string.Equals(Email, other.Email) &&
               Id.Equals(other.Id);
    }

    #endregion

    // If Equals returns true for a pair of objects,
    // GetHashCode must return the same value for these objects.
    public override int GetHashCode()
    {
        // Get the hash code for the Name field if it is not null.
        int hashName = Name == null ? 0 : Name.GetHashCode();

        // Get the hash code for the Id field.
        int hashId = Id.GetHashCode();

        // Get the hash codes for the Address and Email fields if they are not null.
        int hashAddress = Address == null ? 0 : Address.GetHashCode();
        int hashEmail = Email == null ? 0 : Email.GetHashCode();

        // Calculate the hash code for the object.
        return hashId ^ hashName ^ hashAddress ^ hashEmail;
    }

     

     

     

    The Except method of the LINQ library doesn't return multiple identical items; it returns only one. Can anyone help?

     
    Friday, October 01, 2010 6:18 AM
  • Can you post the query that uses Except and the code you are using to display the results?

    Thanks

    Friday, October 01, 2010 3:26 PM
  • Here is the code:

    List<Customer> _cusList = new List<Customer>();
    
    _cusList.Add(new Customer { Name = "A", Email = "a@a.com", Address = "asda", Id = 1 });
    _cusList.Add(new Customer { Name = "B", Email = "b@b.com", Address = "asda", Id = 2 });
    _cusList.Add(new Customer { Name = "C", Email = "c@c.com", Address = "asda", Id = 3 });
    _cusList.Add(new Customer { Name = "D", Email = "d@d.com", Address = "asda", Id = 4 });
    _cusList.Add(new Customer { Name = "A", Email = "a@a.com", Address = "asda", Id = 1 });
     
    List<Customer> _uniqueCust = new List<Customer>();
    List<Customer> _duplicateCust = new List<Customer>();
    
    var _unique = from c in _cusList
                 group c.Name by
                 new { c.Name, c.Id, c.Email, c.Address } into g
                 where g.Count() == 1
                 select g.Key;
    
    //converting into customer list
    _uniqueCust = _unique.Select(c => new Customer() { Id = c.Id, Name = c.Name, Address = c.Address, Email = c.Email }).ToList();
    
    _duplicateCust = _cusList.Except(_uniqueCust).ToList();
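
One way to get both lists without Except is to group the Customer objects themselves (rather than only c.Name) and flatten the groups again. The following is a sketch under assumptions: the Customer class is a simplified stand-in for the one in the question, and SplitSketch.Split is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the Customer class from the question.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    public string Address { get; set; }
}

// Hypothetical helper, not part of the original thread.
public static class SplitSketch
{
    public static void Split(IEnumerable<Customer> customers,
                             out List<Customer> unique, out List<Customer> duplicates)
    {
        // Group whole Customer objects by all four fields; anonymous types compare by value.
        var groups = customers
            .GroupBy(c => new { c.Id, c.Name, c.Email, c.Address })
            .ToList();

        // Groups of one are unique; larger groups contribute every occurrence.
        unique = groups.Where(g => g.Count() == 1).SelectMany(g => g).ToList();
        duplicates = groups.Where(g => g.Count() > 1).SelectMany(g => g).ToList();
    }

    public static void Main()
    {
        var list = new List<Customer>
        {
            new Customer { Id = 1, Name = "A", Email = "a@a.com", Address = "asda" },
            new Customer { Id = 2, Name = "B", Email = "b@b.com", Address = "asda" },
            new Customer { Id = 3, Name = "C", Email = "c@c.com", Address = "asda" },
            new Customer { Id = 4, Name = "D", Email = "d@d.com", Address = "asda" },
            new Customer { Id = 1, Name = "A", Email = "a@a.com", Address = "asda" }
        };

        List<Customer> unique, duplicates;
        Split(list, out unique, out duplicates);

        Console.WriteLine(unique.Count);      // 3
        Console.WriteLine(duplicates.Count);  // 2
    }
}
```

Because the groups hold the original objects, no conversion back into Customer instances is needed and no equality implementation on Customer is required.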
    Monday, October 04, 2010 10:41 AM
  • Hi,

    What about using GroupBy?

    To get duplicates, you can perform something like this:

    int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
    var duplicates = from item in listOfItems 
        group item by item into g 
        let count = g.Count() 
        select new {Value = g.Key, Count = count}; 
    
    

    To remove duplicates, you can use Contains, much as you would use "not in" in T-SQL:

    var uniques = from item in listOfItems
                  where !duplicates.Select(d => d.Value).Contains(item)
                  select item;
    

    Hope this can help you,

    JAReyes.


    Please remember to Vote & "Mark As Answer" if this post is helpful to you.
    Monday, October 04, 2010 12:42 PM
  • The above code doesn't seem to work, sorry to say.
    Tuesday, October 05, 2010 4:32 AM
  • Hi,

    Sorry, I missed the where condition. The duplicates sample should look like this:

    int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
    var duplicates = from item in listOfItems
                     group item by item into g
                     let count = g.Count()
                     where count > 1
                     select new { Value = g.Key, Count = count };
    

    This will give you each duplicated key and the number of items with that key. If you want to compare on more than one field, you must add all of them to the GroupBy key.

    Then change the parameters to match your own list types.

    Hope this can help you,

    JAReyes.


    Tuesday, October 05, 2010 8:32 AM
  • Hi JAReyes,

    I understood what you said, but that is not my point. If you look at my post before your reply, I included the code I am using to get the unique items from the list and then using the Except method of LINQ to get all the duplicate items in the list.

    The problem is that the Except method returns each duplicate item only once. For example, your list contains duplicates of 4 and 3, and the Except method will return 4 and 3 only once each instead of twice.

    Wednesday, October 06, 2010 4:55 AM
  • Hi again,

    This query in LINQPad returns 4 and 3 twice:

    void Main()
    {
        int[] listOfItems = new[] { 4, 2, 3, 1, 6, 4, 3 };
        var duplicates = from item1 in listOfItems
                         group item1 by item1 into g
                         where g.Count() > 1
                         select g;
        foreach (var d in duplicates)
            Console.WriteLine(d);
    }
    

    Best regards,

    JAReyes.


    Wednesday, October 06, 2010 7:59 AM
  • This is what I get when I run the above code in a console application:

    System.Linq.Lookup`2+Grouping[System.Int32,System.Int32]
    System.Linq.Lookup`2+Grouping[System.Int32,System.Int32]
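
The output above is the type name of each group, which is what Console.WriteLine prints for an object that does not override ToString. To see the duplicated values in a console application, the elements of each group have to be enumerated. A minimal sketch, with PrintGroupsSketch and Describe being names invented for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original thread.
public static class PrintGroupsSketch
{
    // Formats each duplicated group as "key: v1, v2, ...".
    public static List<string> Describe(IEnumerable<int> items)
    {
        var duplicates = from item in items
                         group item by item into g
                         where g.Count() > 1
                         select g;

        // Each group is itself a sequence of its members, so it can be joined directly.
        return duplicates.Select(g => g.Key + ": " + string.Join(", ", g)).ToList();
    }

    public static void Main()
    {
        foreach (string line in Describe(new[] { 4, 2, 3, 1, 6, 4, 3 }))
            Console.WriteLine(line);
        // 4: 4, 4
        // 3: 3, 3
    }
}
```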

    Wednesday, October 06, 2010 8:52 AM
  • Hi,

    This code is just a sample to suggest how your problem can be solved, and it works fine in LINQPad.

    Please try to adapt this sample code to your own scenario and classes.

    Best regards,

    JAReyes.

    Wednesday, October 06, 2010 11:09 AM
  • I think you have not understood what I wrote before your post. I have tried all these things.
    Thursday, October 07, 2010 4:26 AM
  • Is this a Linq-to-SQL or a Linq-to-Objects scenario?

    Assuming this is a linq-to-sql query (since it is in the linq-to-sql forum), and that the IDs of the duplicated records are different I'd go with a self-joining query, e.g.:

    var duplicates =
      from cust in dc.Customers
      join c2 in dc.Customers on new { cust.Name, cust.Email, cust.Address } equals new { c2.Name, c2.Email, c2.Address }
      where cust.ID != c2.ID
      select cust;

    var nonDuplicates =
      from cust in dc.Customers
      from c2 in (
        from c2 in dc.Customers
        where c2.Name == cust.Name
          && c2.Email == cust.Email
          && c2.Address == cust.Address
          && c2.ID != cust.ID
        select c2
      ).DefaultIfEmpty()
      where c2 == null
      select cust;


     
       Cool tools for Linq-to-SQL and Entity Framework 4:
     huagati.com/dbmltools - Rule based class and property naming, Compare and Sync model <=> DB, Sync SSDL <=> CSDL (EF4)
     huagati.com/L2SProfiler - Query profiler for Linq-to-SQL and Entity Framework v4
    Thursday, October 07, 2010 6:05 AM
    Answerer
  • Hi kristofer,

    The above code also doesn't solve my problem.. :(

    Thursday, October 07, 2010 8:40 AM
  • "Hi kristofer,

    The above code also doesn't solve my problem.. :("

    In that case I think you need to explain once again, in more detail, what you're trying to achieve, because three of us have now tried and failed to help you solve the problem.
     
    Friday, October 08, 2010 7:07 AM
    Answerer
  • Yes, please do. Also, could something be done about the spam? It is annoying.

    Regards

    Friday, October 08, 2010 1:00 PM
  • Hi,

    Here is the XAML:

    <Grid Name="Grid1">
        <StackPanel Orientation="Vertical" HorizontalAlignment="Center" VerticalAlignment="Center">
            <DataGrid Name="dgdupl" AutoGenerateColumns="True"/>
            <DataGrid Name="dgnodupl" AutoGenerateColumns="True"/>
        </StackPanel>
    </Grid>

     

    Here is the code:

    private void Window_Loaded(object sender, RoutedEventArgs e)
    {
        List<cust> lcust = new List<cust>();
        lcust.Add(new cust() { Id = "1", Name = "A", Email = "a@a.com", Addr = "abc" });
        lcust.Add(new cust() { Id = "2", Name = "B", Email = "b@b.com", Addr = "abc" });
        lcust.Add(new cust() { Id = "3", Name = "C", Email = "c@c.com", Addr = "abc" });
        lcust.Add(new cust() { Id = "4", Name = "D", Email = "d@d.com", Addr = "abc" });
        lcust.Add(new cust() { Id = "1", Name = "A", Email = "a@a.com", Addr = "abc" });

        // Keys that occur more than once (each duplicated key is listed once).
        var dupl = from m in lcust
                   group m.Id by
                   new { m.Id, m.Name, m.Email, m.Addr } into g
                   where g.Count() > 1
                   select g.Key;

        dgdupl.ItemsSource = dupl.ToList();

        // Keys that occur exactly once.
        var nodupl = from m in lcust
                     group m.Id by
                     new { m.Id, m.Name, m.Email, m.Addr } into g
                     where g.Count() == 1
                     select g.Key;

        dgnodupl.ItemsSource = nodupl.ToList();
    }

    private class cust
    {
        public string Id
        {
            get { return id; }
            set { id = value; }
        }

        public string Name
        {
            get { return name; }
            set { name = value; }
        }

        public string Email
        {
            get { return email; }
            set { email = value; }
        }

        public string Addr
        {
            get { return addr; }
            set { addr = value; }
        }

        private string id;
        private string name;
        private string email;
        private string addr;
    }
    

     

     


    Please remember to mark the replies as answers if they help and unmark them if they provide no help. Regards, Alireza
    Wednesday, October 13, 2010 7:45 AM