none
Select Distinct Rows from Data Table RRS feed

  • Question

  • Hi,

      I have a huge DataTable more than 20,000 rows having around 100 columns. (Believe me, I tried to convince many people not to have this.) I need to have distinct rows of data with distinct values for selected columns, based on user selected columns. I am using DataView.ToTable(true, columnnames) is working good, but had a huge performance hit. It is taking around 60 seconds for some columns.

    Is there any way I can make DataViwe.ToTable(distinct, ColumnNames) using linq query on DataTable, (I want to add my columns dynamically to LINQ query) to have a better performance.

     

    Thanks,

     Krishna.


    --Krishna
    Friday, September 9, 2011 6:43 PM

Answers

  • Hi You can use row filter in data view see the code snippet below

      dvwItemNames.RowFilter() = "SELECT DISTINCT Name"

    for having deep understanding refer to article

    http://www.developerfusion.com/article/4703/filtering-and-sorting-in-adonet/4/

     

    second thing which you can do

    Actually, ADO.NET 2.0 added a(n apparently not well known) feature that allows you to create a new table containing the distinct entries from an existing table.  Here's how it works:

     

    First, create a DataView for your table, and apply any filtering and sorting in that dataview.  If you have no filtering or sorting, you can just use originalTable.DefaultView.

     

    Second, call ToTable() on the dataview.  ToTable() has an overload which lets you specify whether or not to return only distinct values, and a params string[] argument to specify which columns you want in the new table.  If you specify true as the first argument, only rows unique within the columns that you specify in the params argument will be returned.  If you want all of the columns from the original table, you can simply specify the boolean argument "true"; if you don't specify any columns, all columns are returned.

     

    Thus, the easiest way to return a new table that has only the unique values from the original table is as follows:

     

    DataTable distinctTable = originalTable.DefaultView.ToTable( /*distinct*/ true);

     

    3rd step 

    The methods mentioned previously should all work. In addition, if you're using .NET 3.5, you can use LINQ to do this very easily.

     

    For instance:

     

    IEnumerable<DataRow> results = myDataTable.AsEnumerable().Distinct();

     

    If you need your results as a DataTable, you could do:

     

    DataTable results = myDataTable.AsEnumerable().Distinct().CopyToDataTable();

     

    Lastly, if your distinct logic needs to be more complicated than the default implementation, you can easily use your own equality comparer.

     4th step 

    look this one

    http://weblogs.asp.net/eporter/archive/2005/02/10/370548.aspx

    Saturday, September 10, 2011 6:19 AM
  • @rahuliet -- Wow, I guess you didn't really read the question. Your 2nd suggestion was to use the DataView.ToTable(true), which Krishna stated was already being used and had poor performance.

    @Krishna -- I haven't tested this for performance, but this LINQ solution will at least correctly give you DISTINCT values:

    dtLinq = dt.AsEnumerable().Select(row =>
        {
          DataRow newRow = dt.NewRow();
          newRow["code"] = row.Field<string>("code");
          return newRow;
        }).Distinct(DataRowComparer.Default).CopyToDataTable();
    
    

    Now, I know this isn't exactly what you asked about (since the column names need to be user-defined) ... but perhaps you can take the above example and tweak it.

    Or -- look at that last link that @rahuliet posted ... it looks promising.


    ~~Bonnie Berent [C# MVP]

    geek-goddess-bonnie.blogspot.com
    Sunday, September 11, 2011 12:00 AM
  •  Thanks all. Apologies for late reply.

     As my understanding distinct operation on DataView (DataViwe.ToTable(distinct, ColumnNames)  on huge dataviews is taking longer time, compared to distinct on smaller tables. I tweaked my logic to apply distinct on table, with less # of columns and it is performing better than with more # of columns.  I found threads similar to this.  Seems to be an issue with DataView.ToTable() operation.

    Similar Thread

    -- Krishna


    --Krishna
    Tuesday, September 20, 2011 3:15 PM

All replies

  • Hi You can use row filter in data view see the code snippet below

      dvwItemNames.RowFilter() = "SELECT DISTINCT Name"

    for having deep understanding refer to article

    http://www.developerfusion.com/article/4703/filtering-and-sorting-in-adonet/4/

     

    second thing which you can do

    Actually, ADO.NET 2.0 added a(n apparently not well known) feature that allows you to create a new table containing the distinct entries from an existing table.  Here's how it works:

     

    First, create a DataView for your table, and apply any filtering and sorting in that dataview.  If you have no filtering or sorting, you can just use originalTable.DefaultView.

     

    Second, call ToTable() on the dataview.  ToTable() has an overload which lets you specify whether or not to return only distinct values, and a params string[] argument to specify which columns you want in the new table.  If you specify true as the first argument, only rows unique within the columns that you specify in the params argument will be returned.  If you want all of the columns from the original table, you can simply specify the boolean argument "true"; if you don't specify any columns, all columns are returned.

     

    Thus, the easiest way to return a new table that has only the unique values from the original table is as follows:

     

    DataTable distinctTable = originalTable.DefaultView.ToTable( /*distinct*/ true);

     

    3rd step 

    The methods mentioned previously should all work. In addition, if you're using .NET 3.5, you can use LINQ to do this very easily.

     

    For instance:

     

    IEnumerable<DataRow> results = myDataTable.AsEnumerable().Distinct();

     

    If you need your results as a DataTable, you could do:

     

    DataTable results = myDataTable.AsEnumerable().Distinct().CopyToDataTable();

     

    Lastly, if your distinct logic needs to be more complicated than the default implementation, you can easily use your own equality comparer.

     4th step 

    look this one

    http://weblogs.asp.net/eporter/archive/2005/02/10/370548.aspx

    Saturday, September 10, 2011 6:19 AM
  • @rahuliet -- Wow, I guess you didn't really read the question. Your 2nd suggestion was to use the DataView.ToTable(true), which Krishna stated was already being used and had poor performance.

    @Krishna -- I haven't tested this for performance, but this LINQ solution will at least correctly give you DISTINCT values:

    dtLinq = dt.AsEnumerable().Select(row =>
        {
          DataRow newRow = dt.NewRow();
          newRow["code"] = row.Field<string>("code");
          return newRow;
        }).Distinct(DataRowComparer.Default).CopyToDataTable();
    
    

    Now, I know this isn't exactly what you asked about (since the column names need to be user-defined) ... but perhaps you can take the above example and tweak it.

    Or -- look at that last link that @rahuliet posted ... it looks promising.


    ~~Bonnie Berent [C# MVP]

    geek-goddess-bonnie.blogspot.com
    Sunday, September 11, 2011 12:00 AM
  • Hi,

    I am writing to check the status of the issue on your side. Would you mind letting us know the result of the suggestions?

    If you need further assistance, please feel free to let me know. I will be more than happy to be of assistance.

    Have a nice day.


    Alan Chen[MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Monday, September 19, 2011 7:40 AM
    Moderator
  •  Thanks all. Apologies for late reply.

     As my understanding distinct operation on DataView (DataViwe.ToTable(distinct, ColumnNames)  on huge dataviews is taking longer time, compared to distinct on smaller tables. I tweaked my logic to apply distinct on table, with less # of columns and it is performing better than with more # of columns.  I found threads similar to this.  Seems to be an issue with DataView.ToTable() operation.

    Similar Thread

    -- Krishna


    --Krishna
    Tuesday, September 20, 2011 3:15 PM