none
Performance on Distinct DataTable RRS feed

  • Question

  • i would like to ask whether the code below would take long time to process with 100k of data in a datatable?

    DataTable distinctDataTable = sqlDataTable.DefaultView.ToTable(true);

    i've tried this code with 100k of data, it seems that it stopped by at the line of code above and won't proceed to the next line when i tried to step into code. initially i thought it would have difficulty and longer time on the adapter.fill(). but somehow it works fine, the problem right now is at the code where i want to get distinct value from a selection. and it wouldn't work as it should be and stuck forever. is there any alternative to replace the code above for better performance on 100k of data.



    • Edited by Jim6123 Monday, September 12, 2011 7:45 AM
    Monday, September 12, 2011 7:07 AM

All replies

  • Hello,

    Thank you for posting.

    From your description, it's hard to me to provide more helpful suggestions to you. Generally, I suggest you can provide more code snippets or demo application to us. v-xugong@microsoft.com

    Plus, I suggest you use visual studio performance tool to measure and target performance-related issues in your code. http://msdn.microsoft.com/en-US/library/z9z62c29(v=VS.100).aspx

    Please feel free to let us know if you have any finding.

    Thanks,


    Larcolais Gong[MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Wednesday, September 14, 2011 2:22 AM
  • DataTable distinctDataTable = sqlDataTable.DefaultView.ToTable(true);

    from the code above, is it true that i can make the all the rows on all columns in the table to be distinct? right?

    but compared to a query statement using DataAdapter that performs the same operation, the code above takes a very long time to be executed whereas using query statement "Select DISCTINCT column1, column2... from Table" is way faster.

    any reason for the low performance using the code rather than query statement?

    Wednesday, September 14, 2011 3:13 AM
  • Hello again,
    Honesty speaking, I didn't reproduce your question on my side. Normally, if you specify true as parameter into ToTable(), only rows unique within the columns that you specify in the params will be returned. This was my simple test on my side.
                DataTable dt = new DataTable();
                DataSet1 ds = new DataSet1();
                dt = ds.Tables[0].DefaultView.ToTable(true);
    

    It looks that your question was related to the large data. But I also suggest you can send the necessary information to us to analyze your question such as sql script, data, demo application or others.
    Thanks,

    Larcolais Gong[MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    • Proposed as answer by Werewolfs Monday, September 19, 2011 11:18 AM
    Thursday, September 15, 2011 7:26 AM
  • Hi,

    oh ya, i said that i do faced problem with large data instead of small data. so, can i say the code you mentioned is not recommended on large data? i did a comparison on the code you mentioned and query statement->("Select column From Table"). Somehow the query statement did execellent job by getting the result way faster than the code. that's why i posted this question to verify that am i the only one facing this problem, or the code itself could not support large data?

     

    Thursday, September 15, 2011 11:48 AM
  • That's because the query is being executed on the database server, which is optimized for that sort of thing. I don't know why you'd want to bring down 100K+ rows to an in-memory DataSet anyway. Use database queries that bring down smaller result sets.
    ~~Bonnie Berent [C# MVP]

    geek-goddess-bonnie.blogspot.com
    Thursday, September 15, 2011 2:13 PM
  • When you're using the ToTable(true),      1) It first need to Store the resultant as a DataTable, 2) Getting the DataView from the respective DataTable in-memory object and makes a reference for that. 3) And finally the third operation is to get the distinct records and to convert them from DataView to DataTable(given that we know the burdon of casting)..!

    When it comes to querying DISTINCT,     The only operation here is to qury the raw data source to give you the required set of records. In the mean while you'll copy only the needed records into the memory at a time..!


    - <BooM>
    • Proposed as answer by Werewolfs Monday, September 19, 2011 11:18 AM
    Monday, September 19, 2011 6:53 AM
  • ok, now i get it. thanks for the info.
    Monday, September 19, 2011 10:04 AM
  • then besides than using sql query statement, what better method i can use to do the same function from what i've mentioned?
    Tuesday, September 20, 2011 12:19 AM
  • If you're wanting all the columns to be distinct, then no, there is no better way to do this with the DataTable and therefore you're better off having the database return the DISTINCT rows for you with the database query.

    If you didn't want all the columns (for example, you only wanted to have distinct name and account number), then you could use LINQ, which would be faster than the .ToTable(true, --with list of columns--). Up to a point anyway ... LINQ slows down a lot as you add more columns and at some point it ends up being a lot slower. Keep in mind that in either case, you would only be returning the columns that you want to be distinct into the new DataTable and not the rest of the columns from the original DataTable ... I'm guessing that this is NOT what you're after.

    Can I ask why you want to return ALL rows into a DataTable and then only use the DISTINCT rows? Or do you need to be able to utilize two different views of your data ... one containing all rows and one containing distinct rows? If that's the case, you're still better off doing two different queries to the database and filling two different DataTables. It will still be much quicker.


    ~~Bonnie Berent [C# MVP]

    geek-goddess-bonnie.blogspot.com
    Tuesday, September 20, 2011 4:41 AM
  • well, i have a program that will filter out distinct data for all columns from a database when it performs the search to get required data. therefore, i need a method that can help perform better to retrieve in shortest time instead of using database query, then i came across with the method .ToTable() where it can perform distinct operation, too bad it shows slow performance. that's why i need a better method that i can integrated into my program where i used dataset to deal with the rest of my operation such as insert and select.

    now, i get another data adapter to perform the distinct operation to get required result. hopefully there is a new method that can replace this conventional method in future to ease programming.

    Tuesday, September 20, 2011 5:51 AM
  • Sometimes workarounds are the best you can do ... unfortunately.
    ~~Bonnie Berent [C# MVP]

    geek-goddess-bonnie.blogspot.com
    Tuesday, September 20, 2011 6:10 PM