none
How do i remove duplicate rows in data table using C# Linq RRS feed

  • Question

  • I have below data table and it contains many duplicate's over there.

    I just want to compare the two columns in data table, if any duplicate records found then we should delete that rows and maintain original same in the data table..

    How do i remove or delete using C# Linq (the unique column is MaterialNo and Name)

    sample data below

    MaterialNo     Name      Quantity     Company

    EA1011         Saritan     10000        SAN Pharma

    EA2012         Anacin      20000        Rey Pharma

    AB2020         Merin        20200        Cue Pharma

    NoCode         Savin        20000         SAX Pharma

    NoCode         Savin        20000         SAX Pharma

    NoCode         Metacin     10500         Med Pharma

    NoCode         Metacin      10500        Med Pharma

    Note : Unique column is MateiralNo and Name and data maintain in same data table.

    Wednesday, January 8, 2020 12:30 AM

Answers

  • Hi Gani,

    Thank you for posting here.

    LINQ is a good technique, but it is not suitable for all scenarios. At least on this issue, I think this method is simpler.

            static void Main(string[] args)
            {
                DataTable dt = GetDataTable();
                DoWork(dt);
            }
            static void DoWork(DataTable dataTable)
            {
                Hashtable hTable = new Hashtable();
                ArrayList duplicateList = new ArrayList();
                foreach (DataRow drow in dataTable.Rows)
                {
                    if (hTable.Contains(drow["MaterialNo"] +""+ drow["Name"]))
                        duplicateList.Add(drow);
                    else
                        hTable.Add(drow["MaterialNo"]+""+drow["Name"],String.Empty);
                }
                foreach (DataRow dRow in duplicateList)
                    dataTable.Rows.Remove(dRow);
            }
            static DataTable GetDataTable()
            {
                DataTable dataTable = new DataTable();
                dataTable.Columns.Add("MaterialNo", typeof(String));
                dataTable.Columns.Add("Name", typeof(String));
                dataTable.Columns.Add("Quantity", typeof(int));
                dataTable.Columns.Add("Company", typeof(String));
    
                dataTable.Rows.Add("EA1011", "Saritan", 10000, "SAN Pharma");
                dataTable.Rows.Add("EA2012", "Anacin", 20000, "Rey Pharma");
                dataTable.Rows.Add("AB2020", "Merin", 20200, "Cue Pharma");
                dataTable.Rows.Add("NoCode", "Savin", 20000, "SAX Pharma");
                dataTable.Rows.Add("NoCode", "Savin", 20000, "SAX Pharma");
                dataTable.Rows.Add("NoCode", "Metacin", 10500, "Med Pharma");
                dataTable.Rows.Add("NoCode", "Metacin", 10500, "Med Pharma");
                return dataTable;
            }

    Result:

    If you have to use linq for any reason, fell free and let me know and I will modify the code.

    Hope this could be helpful.

    Best Regards,

    Timon 


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, January 8, 2020 5:56 AM

All replies

  • Here's what I do.

    I create a new table and check the table before the addition.

    That would be something to see... Creating the table minus the duplicates!


    George Frias - AWWshop, Wiki1978

    Wednesday, January 8, 2020 12:41 AM
  • Thanks. is it possible to make it using linq C#...?

    sot the final output will be

    MaterialNo     Name      Quantity     Company

    EA1011         Saritan     10000        SAN Pharma

    EA2012         Anacin      20000        Rey Pharma

    AB2020         Merin        20200        Cue Pharma

    NoCode         Savin        20000         SAX Pharma

    NoCode         Metacin     10500         Med Pharma

    Wednesday, January 8, 2020 1:10 AM
  • Hello,

    This is for SQL-Server.

    The following uses a Builder pattern to allow the user to select a database followed by a table then the columns to work against. Easy to configure e.g. set the connection string to your SQL-Server.

    Results

    How to use

    https://social.technet.microsoft.com/wiki/contents/articles/51628.sql-server-c-find-duplicate-record-with-identity.aspx

    Full source code

    https://github.com/karenpayneoregon/SqlServerFindDuplicateRecords

    Another option is via SQL, here there are six key columns, CompanyName, ContactName, ContactTitle, Address, City and PostalCode. Of course this works with less columns and any table configured correctly which is basically the template for the first method.

    SELECT A.*
    FROM Customers A
    INNER JOIN
        (
        SELECT 
            CompanyName,
            ContactName,
            ContactTitle,
            Address,
            City,
            PostalCode
        FROM 
            Customers
        GROUP BY 
            CompanyName,
            ContactName,
            ContactTitle,
            Address,
            City,
            PostalCode
        HAVING COUNT(*) > 1
        ) B
    ON
    A.CompanyName = B.CompanyName AND
    A.ContactName = B.ContactName AND
    A.ContactTitle = B.ContactTitle AND
    A.Address = B.Address AND
    A.City = B.City AND
    A.PostalCode = B.PostalCode
    ORDER BY 
        A.CompanyName
    Anytime you can perform the work on the database is best rather than the client side operation as major databases do this natively.

     


    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverFlow
    profile for Karen Payne on Stack Exchange

    Wednesday, January 8, 2020 1:13 AM
    Moderator
  • Thanks. But, this is not sql server.

    This is C# win form data table which we need to handle inside the C# Code using linq...?

    Wednesday, January 8, 2020 1:17 AM
  • Thanks. But, this is not sql server.

    This is C# win form data table which we need to handle inside the C# Code using linq...?

    The idea is the same, in this sample works off a DataTable, same logic applies as in my first example but this is hard coded to specific columns which you can modify as needed.

    using System.Data;
    using System.Linq;
    
    namespace DuplicateForDataTable
    {
        /// <summary>
        /// Sample how to get duplicate records/datarows in a DataTable
        /// via more than one column and include the primary key so we
        /// can have options to a) shows data to users b) allow user
        /// to delete records as we need a key to do this.
        /// </summary>
        public class DataTableDuplicateFinder
        {
            public DataTable GetDuplicates()
            {
                DataAccess da = new DataAccess();
                DataTable dt = da.AllCustomersDataDataTable;
    
                DataTable dtNew = dt.Clone();
                dtNew.Columns["Identifier"].AutoIncrement = false;
    
                var duplicates = from row in dt.AsEnumerable()
                                 select
                                 new
                                 {
                                     Identifier = row[0],
                                     CompanyName = row[1],
                                     ContactName = row[2],
                                     ContactTitle = row[3],
                                     Address = row[4],
                                     City = row[5],
                                     PostalCode = row[6]
                                 }
                                     into temp
                                     group temp by
                                     new
                                     {
                                         CompanyName = temp.CompanyName,
                                         ContactName = temp.ContactName,
                                         ContactTitle = temp.ContactTitle
                                     }
                                         into grouped
                                         where grouped.Count() > 1
                                         select grouped.Select(g =>
                                         new
                                         {
                                             g.Identifier,
                                             g.CompanyName,
                                             g.ContactName,
                                             g.ContactTitle,
                                             g.Address,
                                             g.City,
                                             g.PostalCode
                                         }
                                     );
    
                foreach (var Item in duplicates)
                {
                    foreach (var row in Item)
                    {
                        dtNew.Rows.Add(new object[]
                            {
                                (int)row.Identifier,
                                row.CompanyName,
                                row.ContactName,
                                row.ContactTitle,
                                row.Address,
                                row.City,
                                row.PostalCode
                            }
                        );
                    }
                }
    
                return dtNew;
            }
        }
    }


    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverFlow
    profile for Karen Payne on Stack Exchange

    Wednesday, January 8, 2020 1:47 AM
    Moderator
  • Hi Gani,

    Thank you for posting here.

    LINQ is a good technique, but it is not suitable for all scenarios. At least on this issue, I think this method is simpler.

            static void Main(string[] args)
            {
                DataTable dt = GetDataTable();
                DoWork(dt);
            }
            static void DoWork(DataTable dataTable)
            {
                Hashtable hTable = new Hashtable();
                ArrayList duplicateList = new ArrayList();
                foreach (DataRow drow in dataTable.Rows)
                {
                    if (hTable.Contains(drow["MaterialNo"] +""+ drow["Name"]))
                        duplicateList.Add(drow);
                    else
                        hTable.Add(drow["MaterialNo"]+""+drow["Name"],String.Empty);
                }
                foreach (DataRow dRow in duplicateList)
                    dataTable.Rows.Remove(dRow);
            }
            static DataTable GetDataTable()
            {
                DataTable dataTable = new DataTable();
                dataTable.Columns.Add("MaterialNo", typeof(String));
                dataTable.Columns.Add("Name", typeof(String));
                dataTable.Columns.Add("Quantity", typeof(int));
                dataTable.Columns.Add("Company", typeof(String));
    
                dataTable.Rows.Add("EA1011", "Saritan", 10000, "SAN Pharma");
                dataTable.Rows.Add("EA2012", "Anacin", 20000, "Rey Pharma");
                dataTable.Rows.Add("AB2020", "Merin", 20200, "Cue Pharma");
                dataTable.Rows.Add("NoCode", "Savin", 20000, "SAX Pharma");
                dataTable.Rows.Add("NoCode", "Savin", 20000, "SAX Pharma");
                dataTable.Rows.Add("NoCode", "Metacin", 10500, "Med Pharma");
                dataTable.Rows.Add("NoCode", "Metacin", 10500, "Med Pharma");
                return dataTable;
            }

    Result:

    If you have to use linq for any reason, fell free and let me know and I will modify the code.

    Hope this could be helpful.

    Best Regards,

    Timon 


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Wednesday, January 8, 2020 5:56 AM
  • Hi Gani,

    Has your issue been resolved?

    If so, please click "Mark as answer" to the appropriate answer, so that it will help other members to find the solution quickly if they face a similar issue.

    Best Regards,

    Timon


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, January 14, 2020 8:37 AM
  • And if you have vigor like creating a regex, use a lambda expression.

    My understanding of cross-platform (latest/greatest being Wasm) says lead with C Sharp.

    Good fortune!


    George Frias - AWWshop, Wiki1978

    Tuesday, January 14, 2020 8:49 AM
  • Thanks for your solution..it's perfect...
    Monday, January 20, 2020 4:43 AM