C# fastest way to iterate through List or DataTable to validate each row

  • Question

  • Hi All,

    I have been working on a project in which I need to iterate through a very large DataTable containing 1.5M records and validate some fields on each record, but it is taking too much time, and while the program is running the CPU spikes up to 100% usage.

    Appreciate if anyone can help. 

    Note: I have tried "Parallel.ForEach"
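
    For reference, the loop is roughly along these lines (the table, column names, and checks are simplified placeholders):

    // using System.Data; using System.Linq; using System.Threading.Tasks;
    // AsEnumerable() needs a reference to System.Data.DataSetExtensions
    Parallel.ForEach(bigTable.AsEnumerable(), row =>
    {
        // placeholder checks - the real code validates several fields per row
        if (row.IsNull("CustomerId") || string.IsNullOrWhiteSpace(row.Field<string>("Email")))
        {
            // flag the row as invalid (read-only here; DataRow writes are not thread-safe)
        }
    });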


    • Edited by BeingJaved Tuesday, September 17, 2019 3:29 AM
    Tuesday, September 17, 2019 3:16 AM

Answers

  • Hi BeingJaved,

    Thank you for posting here.

    According to your description, I suggest that you put your list or DataTable into the database and use a DataReader to validate each row.

    The DataReader is a good choice when you're retrieving large amounts of data because the data is not cached in memory.

    You could refer to the following link to know more about it.

    Retrieve data using a DataReader
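
    A minimal sketch of that approach, assuming SQL Server; the connection string, table, and validation rule below are only placeholders:

    // using System.Data.SqlClient;
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = new SqlCommand("SELECT Id, Email FROM dbo.BigTable", connection))
    {
        connection.Open();
        using (SqlDataReader reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                // Validate each row as it streams in; nothing is buffered in a DataTable
                string email = reader.IsDBNull(1) ? null : reader.GetString(1);
                if (string.IsNullOrWhiteSpace(email))
                {
                    int invalidId = reader.GetInt32(0);   // collect or log the offending key
                }
            }
        }
    }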

    Best Regards,

    Jack


    MSDN Community Support
    Please remember to click "Mark as Answer" on the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    • Marked as answer by BeingJaved Wednesday, September 18, 2019 6:02 AM
    Tuesday, September 17, 2019 5:47 AM
    Moderator

All replies

  • IMO, you should be using a List of custom objects, like a DTO.

    https://dzone.com/articles/reasons-move-datatables
    https://www.codingblocks.net/programming/boxing-and-unboxing-7-deadly-sins/
    http://lauteikkehn.blogspot.com/2012/03/datatable-vs-list.html?m=1
    https://www.codeproject.com/Articles/1050468/Data-Transfer-Object-Design-Pattern-in-Csharp#IntroductionandDefinition


    Use a for loop to iterate over the list.
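
    A rough sketch of that idea; the DTO shape, loader, and rule below are made up for illustration:

    public class CustomerDto
    {
        public int Id { get; set; }
        public string Email { get; set; }
    }

    // Plain for loop over a List<CustomerDto>; no DataTable indexer overhead or boxing
    List<CustomerDto> customers = LoadCustomers();   // hypothetical loader
    var invalidIds = new List<int>();
    for (int i = 0; i < customers.Count; i++)
    {
        if (string.IsNullOrWhiteSpace(customers[i].Email))
        {
            invalidIds.Add(customers[i].Id);
        }
    }
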
    • Edited by DA924x Tuesday, September 17, 2019 4:52 AM
    Tuesday, September 17, 2019 4:33 AM
  • Please give more details about what your code does, because just iterating through a few million data rows in a DataTable is not usually slow. There are several possible reasons for this: maybe there is not enough memory, or maybe your validation does something time-consuming for each row in the DataTable. Have you measured why or where the time goes?

    Tuesday, September 17, 2019 8:15 AM
  • If you only have field validation, use a LINQ Any statement on that field with your condition. If all the data is correct, Any returns false; if it returns true, select just those particular data rows and apply your correction. Don't worry about the rows that are already correct. :)
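
    A rough sketch of that approach; the column name and rule are placeholders, and AsEnumerable() needs a reference to System.Data.DataSetExtensions:

    // using System.Data; using System.Linq;
    bool hasInvalidRows = table.AsEnumerable()
        .Any(r => string.IsNullOrWhiteSpace(r.Field<string>("Email")));

    if (hasInvalidRows)
    {
        // touch only the offending rows; rows that are already correct are left alone
        foreach (DataRow row in table.AsEnumerable()
            .Where(r => string.IsNullOrWhiteSpace(r.Field<string>("Email"))))
        {
            row.SetField("Email", "fixme@example.com");   // placeholder correction
        }
    }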

    Tuesday, September 17, 2019 8:41 AM
  • Hello,

    With that many records, performance will be poor.

    Option 1

    Instead, create either a permanent table or a temp table and do a bulk insert.

    Example of a bulk insert to a temp table, where localTempTable is the DataTable with 1.5M records:

    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "##GiveNameHere";
        bulkCopy.WriteToServer(localTempTable);
    }

    Once done, use SQL to validate the records.
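
    For example, the validation could then run server-side against the temp table; the query below is only a placeholder rule, assuming an Email column:

    // Reuse the same open connection; ##GiveNameHere lives as long as that session does
    using (SqlCommand cmd = new SqlCommand(
        "SELECT COUNT(*) FROM ##GiveNameHere WHERE Email IS NULL OR LTRIM(RTRIM(Email)) = ''",
        connection))
    {
        int invalidCount = (int)cmd.ExecuteScalar();   // rows failing the placeholder rule
    }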

    Option 2

    Stick with the current method and run it as a night-time process triggered by a Windows event.

    Side note 1: In the big picture, when the rows of data exceed physical capacity, organizations will process the data after hours.

    Side note 2: DataTable containers are considered heavy when it comes to processing. There are always alternatives, whether after hours or during office hours, such as reading rows one by one with a data reader instead of a DataTable container (though using temp tables is best), or, if the data comes from a file, reading it line by line.

    Side note 3: In some cases asynchronous methods can be used; they may increase total processing time but should not peg the CPU.

    Side note 4: Run on a server that has few or no other processes running.


    Please remember to mark the replies as answers if they help and unmark them if they provide no help; this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile, but I will not answer coding questions on either.

    NuGet BaseConnectionLibrary for database connections.

    Stack Overflow: profile for Karen Payne on Stack Exchange

    Tuesday, September 17, 2019 10:11 AM
    Moderator
  • Is the data coming from a database?  If so, then the right answer is to write an SQL query to let the server do the validation.

    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Tuesday, September 17, 2019 5:15 PM
  • Hi Jack,

    Thanks for the reply. The solution works like a charm. :)

    Wednesday, September 18, 2019 6:02 AM