DataReader vs. DataTable

    Question

  • I don't know .NET very well. Recently, when I talked to my friend, he described a scenario he ran into at his office.
    He has to deal with 1 million records. First he used a DataTable filled with those 1 million records and looped through them
    to check some business logic. A DataTable with a million records costs memory, and looping through it also takes time. To sort this out he switched to a DataReader, and now the program works fine. My doubts are:
    Is the DataReader connection oriented? Does it fetch only one record at a time from the database? If so, for a million records it must make a million round trips; is that really optimal?
    Alternatively, couldn't I take the first 10,000 (whatever the feasible number is) records into memory, process them, then take the next 10,000, and so on? Would that be bad logic?
    Tuesday, June 19, 2007 8:22 AM

All replies

  • Logically, with a good application design, you would apply the business logic in a batch SQL statement (or statements) to avoid retrieving the data at all, or retrieve only the rows that match a specific condition in a WHERE clause rather than all of them. In some cases that is possible, in others it is not; it depends on what you need to achieve. As for the DataReader: yes, it keeps the connection open and provides a fast, read-only, forward-only cursor, and I believe it buffers some data unless you specify the SingleRow option when you call ExecuteReader to create it. But you do not have much control over this cursor.
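    A rough sketch of that idea in VB.NET (the table, column names, and connection string here are made up for illustration): the business condition lives in the WHERE clause, and the reader just streams whatever survives it.

        Imports System.Data.SqlClient

        Module FilteredReadSketch
            Sub Main()
                ' Hypothetical connection string and table/column names.
                Using conn As New SqlConnection("Data Source=.;Initial Catalog=Sales;Integrated Security=True")
                    ' The business condition is pushed into the WHERE clause, so only
                    ' matching rows ever leave the server.
                    Dim cmd As New SqlCommand( _
                        "SELECT OrderId, Amount FROM Orders WHERE Amount > @minAmount", conn)
                    cmd.Parameters.AddWithValue("@minAmount", 1000)
                    conn.Open()
                    Using reader As SqlDataReader = cmd.ExecuteReader()
                        While reader.Read()
                            ' Any per-row logic that could not be expressed in SQL goes here.
                            Console.WriteLine("{0}: {1}", reader.GetInt32(0), reader.GetDecimal(1))
                        End While
                    End Using
                End Using
            End Sub
        End Module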

    Tuesday, June 19, 2007 10:44 AM
  • Hi,

     

    The DataReader is connection oriented: the connection remains open for the duration of the reader and is released only when the Close() method is called on it. It effectively hands your code one record at a time, although the data is streamed from the server in network packets rather than fetched with a separate round trip per row.

     

    Basically, 1 million records take up a certain amount of memory, and that cannot be helped: to process the records in the application, that amount of data has to be transferred from the database. The difference between the two approaches is that with the DataTable the whole chunk is pulled across the network and held in memory in one large hit, which takes time to transfer, so the application experiences lag. With the DataReader, the same total amount of information is moved, but it is consumed one row at a time as it streams across, so memory use stays flat and the application can start processing immediately. Same amount of data, handled in smaller, faster chunks.
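    A minimal sketch of the two approaches side by side (VB.NET, with a hypothetical Orders table and made-up connection string):

        Imports System.Data
        Imports System.Data.SqlClient

        Module LoadVsStreamSketch
            Sub Main()
                Dim connStr As String = "Data Source=.;Initial Catalog=Sales;Integrated Security=True"
                Dim sql As String = "SELECT OrderId, Amount FROM Orders"

                ' Approach 1: DataTable - the whole result set is pulled into memory at once.
                Dim table As New DataTable()
                Using adapter As New SqlDataAdapter(sql, connStr)
                    adapter.Fill(table)
                End Using
                For Each row As DataRow In table.Rows
                    ' business logic against the in-memory row
                Next

                ' Approach 2: DataReader - rows are streamed and processed one at a time,
                ' so memory use stays flat regardless of how many rows there are.
                Using conn As New SqlConnection(connStr)
                    Dim cmd As New SqlCommand(sql, conn)
                    conn.Open()
                    Using reader As SqlDataReader = cmd.ExecuteReader()
                        While reader.Read()
                            ' business logic against the current row only
                        End While
                    End Using
                End Using
            End Sub
        End Module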

     

    It does put a little strain on the database, since the connection stays open longer, but databases have generally been optimised to cope with this as best they can.

     

    There is also less memory involved: when you load data into a DataTable there is some additional processing (indexing) to help speed up access to the table, which again takes time and uses a lot of memory. For this kind of one-pass processing, the DataReader is the better choice.

     

    Yes, your idea of processing the data in batches is pretty much the happy medium. It wouldn't be bad logic, but it depends on the project and what you intend to do with the data. For example, you might want to pull a batch down, work somewhere the database isn't accessible, and then update the database once you come back.
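    If you do go the batch route, one common pattern is key-based paging: fetch the next N rows past the last key you processed. A rough sketch, assuming a hypothetical Orders table with an OrderId key (SQL Server 2005 syntax for TOP with a parameter):

        Imports System.Data.SqlClient

        Module BatchedReadSketch
            Sub Main()
                Dim connStr As String = "Data Source=.;Initial Catalog=Sales;Integrated Security=True"
                Dim batchSize As Integer = 10000
                Dim lastId As Integer = 0

                Using conn As New SqlConnection(connStr)
                    conn.Open()
                    Do
                        Dim rowsInBatch As Integer = 0
                        Dim cmd As New SqlCommand( _
                            "SELECT TOP (@batch) OrderId, Amount FROM Orders " & _
                            "WHERE OrderId > @lastId ORDER BY OrderId", conn)
                        cmd.Parameters.AddWithValue("@batch", batchSize)
                        cmd.Parameters.AddWithValue("@lastId", lastId)
                        Using reader As SqlDataReader = cmd.ExecuteReader()
                            While reader.Read()
                                lastId = reader.GetInt32(0)
                                rowsInBatch += 1
                                ' process the current row here
                            End While
                        End Using
                        ' A short batch means we have reached the end of the table.
                        If rowsInBatch < batchSize Then Exit Do
                    Loop
                End Using
            End Sub
        End Module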

     

    Hope that was helpful mate.

    • Proposed as answer by Tiya01 Monday, January 09, 2012 10:44 AM
    Tuesday, June 19, 2007 10:54 AM
  • Considering the DataTable method puts most of the burden on the client and gets all the data in one fell swoop, it is probably the more efficient option here.

     

    I would recommend reading something like http://msdn.microsoft.com/msdnmag/issues/05/01/MemoryOptimization/ and trying to optimize the processing and memory management as much as possible once your data is read. I'm not sure how it is in your friend's office, but around here client resources are MUCH more readily available than server resources, so doing heavy calculation like this on the client is not bad as long as it is reasonably optimized. I originally had a ~15-second load process with our aging DB server; with a lot of optimization, multithreading, and an upgrade to our DB server, the reload process now appears to take less than half a second.

     

    I think that getting all, or at least a large amount, of the data in one pass may be more efficient, especially since it seems he needs all the data at once. Maybe work with the data in blocks and retrieve the next block on a separate thread while you are working on the current one. Again, in the office here almost no apps use multiple CPUs/cores properly, so you have a lot of untapped resources. You could fire off a thread to retrieve block 2 as you begin processing block 1, then wait for the retrieval of block 2 to finish before you process the next block.
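    A rough sketch of that overlap pattern in VB.NET 2005 (no lambdas there, so a small helper class carries the result back from the worker thread; LoadBlock is a made-up placeholder for whatever paged query you use):

        Imports System.Collections.Generic
        Imports System.Threading

        Module PrefetchSketch

            ' Hypothetical loader: in a real app this would run a paged query
            ' against the database and return one block of rows.
            Function LoadBlock(ByVal blockIndex As Integer) As List(Of String)
                Return New List(Of String)()
            End Function

            ' Helper so the worker thread can hand its result back.
            Class BlockFetcher
                Public BlockIndex As Integer
                Public Result As List(Of String)
                Public Sub Run()
                    Result = LoadBlock(BlockIndex)
                End Sub
            End Class

            Sub Main()
                Dim blockCount As Integer = 10
                Dim currentBlock As List(Of String) = LoadBlock(0)

                For i As Integer = 0 To blockCount - 1
                    ' Start fetching the next block in the background...
                    ' (a real version would skip the prefetch on the last iteration)
                    Dim fetcher As New BlockFetcher()
                    fetcher.BlockIndex = i + 1
                    Dim worker As New Thread(AddressOf fetcher.Run)
                    worker.Start()

                    ' ...while this thread processes the block we already have.
                    For Each row As String In currentBlock
                        ' business logic per row
                    Next

                    ' Wait for the prefetch to finish before moving on.
                    worker.Join()
                    currentBlock = fetcher.Result
                Next
            End Sub
        End Module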

     

    Does his calculation actually use all one million records? You should also look into breaking that data down into more useful partitions, creating summary data, and so on. Just a thought!

     

    Tuesday, June 19, 2007 5:17 PM
  • Thanks for your replies; they are really helpful. Sorry for extending my question. My friend gets data from multiple tables using joins and processes it with some business logic; the rows that satisfy the business logic are written to a CSV file. He said he is not sure how many rows he fetches from the database, but the number satisfying the business logic is around 1 million, so in the end he has 1 million records written to the CSV file. He has a class for writing the CSV file that takes the DataReader records and adds them to an ArrayList, and the data in the ArrayList is then written in bulk to the CSV file. Because he had memory overflow issues, he adds only 500 records at a time to the ArrayList, writes them to the CSV file, clears the ArrayList, processes the next 500, and so on. The program now takes about 7 minutes to complete. I'm just wondering whether 7 minutes is an acceptable time in this scenario, or whether it can be optimized for better performance.
    I hope I explained the scenario in an understandable way. If you have any doubts, please point them out to me. Thanks for your help.
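    For what it's worth, here is a sketch of what writing the CSV straight from the reader could look like (VB.NET; the query, columns, and connection string are invented, and real code would need proper CSV quoting and escaping). Because each row is written as soon as it is read, there is no ArrayList to grow and clear:

        Imports System.Data.SqlClient
        Imports System.IO

        Module CsvExportSketch
            Sub Main()
                Dim connStr As String = "Data Source=.;Initial Catalog=Sales;Integrated Security=True"
                Dim sql As String = "SELECT OrderId, CustomerName, Amount FROM Orders WHERE Amount > 1000"

                Using conn As New SqlConnection(connStr)
                    Dim cmd As New SqlCommand(sql, conn)
                    conn.Open()
                    Using reader As SqlDataReader = cmd.ExecuteReader()
                        Using writer As New StreamWriter("export.csv", False)
                            While reader.Read()
                                ' Write each row as soon as it is read; nothing is buffered
                                ' beyond the StreamWriter's own small internal buffer.
                                writer.WriteLine("{0},{1},{2}", _
                                    reader.GetInt32(0), reader.GetString(1), reader.GetDecimal(2))
                            End While
                        End Using
                    End Using
                End Using
            End Sub
        End Module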
    Wednesday, June 20, 2007 6:54 AM
  • SQL Server has DTS (Data Transformation Services), which allows fast export of data from one data source to another. For example, you could export data from multiple tables in SQL Server to one CSV file and apply some logic during the transformation. Maybe this is the way it should be done, rather than through looping.
    Wednesday, June 20, 2007 10:42 AM
  •  Matthew Galligan wrote:

     You could fire off a thread to retrieve block 2 as you begin processing block 1, then wait for the retrieving of block 2 to be finished before you process the next block.

     

    How would one go about doing this? I am fairly new to the whole threading scene. I had tried to implement threading in a project of mine, but it just slowed it down.

     

    Visual Studio 2005, .NET Framework 2.0, VB.NET

    Tuesday, July 03, 2007 2:48 PM
  • Threading does not help in all cases and can actually slow down the whole process. I believe multithreading is not the solution in your specific case. Have you tried using DTS?

    Thursday, July 05, 2007 10:35 AM
  • I am not too familiar with DTS, but doesn't it move data from one server to another? That is not what I need done. The following outlines the process.

     

    I am pulling the data from two different servers in two different databases:

     

    Server 1: Database A : Table a.1

    Server 2: Database B : Table b.1, Table b.2

     

    and processing on the client side. I have the functionality done; the processing time is just too slow. I have been reading the forums and the MSDN library, and I have increased performance, just not enough.

     

    Let's say I have 50,000 records for any given day. All three tables have a record referencing each other, and the order in each table might not be equivalent, so that is 150,000 records for me to look at. If you can follow this, maybe you can help.

     

     

    Process:

     

    STEP 1: Connect to database A and pull fields 1, 2, 3, 4. From that data create a string strX: strX = String.Format("{0} {1} - {2} - {3}", field1, field2, field3, field4)

     

    STEP 2: Connect to database B, table b.1, pull a record, and compare it to strX. Loop until found. (I have tried querying the server to pull just the matching record, but that takes twice as long.) Once found, get field_idnum.

     

    STEP 3: Connect to database B, table b.2, pull a record, and compare its fields to field_idnum. Loop until found. (I have tried querying; still slower.) If a match is found, then:

     

    STEP 4: Compare DatabaseA.Tablea.1.Field_Count to DatabaseB.Tableb.2.Field Count;
    if equal, set DatabaseA.Tablea.1.Field_X = "SET", else Field_X = "NOT_SET".

     

    Then after that I exit all the loops so it takes less time. Also, since I am using DataTables for the second two, I remove the records I found to shrink the data I need to search through, hopefully making it faster. Anyway, back to the point: processing the 150,000 records takes 1 hour 20 minutes at the fastest I have found.

     

    I have run it in release mode. I use a DataReader for database A, table a.1, and DataTableReaders for the other two. I am not asking you to do my project, but if you have any tips or documents on how I can make it faster, that would help. I might be able to adjust some database settings, but I am not the DBA, so I would need some documentation on why a change needs to be made. I am using VB.NET with VS 2005 and .NET 2.0.
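    One client-side idea, sketched below with made-up table and column names: load table b.1 (and likewise b.2) once into a Dictionary keyed on the value being compared, so steps 2 and 3 become hash lookups instead of loops over DataTables.

        Imports System.Collections.Generic
        Imports System.Data.SqlClient

        Module HashLookupSketch
            Sub Main()
                Dim connStrB As String = "Data Source=Server2;Initial Catalog=DatabaseB;Integrated Security=True"

                ' Build a dictionary from table b.1 once: key string -> field_idnum.
                Dim keyToIdnum As New Dictionary(Of String, Integer)
                Using conn As New SqlConnection(connStrB)
                    Dim cmd As New SqlCommand("SELECT KeyField, field_idnum FROM Table_b1", conn)
                    conn.Open()
                    Using reader As SqlDataReader = cmd.ExecuteReader()
                        While reader.Read()
                            keyToIdnum(reader.GetString(0)) = reader.GetInt32(1)
                        End While
                    End Using
                End Using

                ' Later, for each row from database A (step 2), the scan becomes a hash lookup:
                Dim strX As String = "1 2 - 3 - 4"   ' built from fields 1..4 as in step 1
                Dim idnum As Integer
                If keyToIdnum.TryGetValue(strX, idnum) Then
                    ' step 3: do the same thing with a second dictionary keyed on field_idnum
                End If
            End Sub
        End Module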

     

     

     

    Thursday, July 05, 2007 1:50 PM
  • Can somebody give a general idea of whether I should use a DataReader or a DataTable in the following situation?

    I need to read data from a database, apply some business logic and validations on top of it, and send it back to a caller (through a WCF interface).

    Regards,
    Kangkan
    http://www.geekays.net/

    Saturday, August 30, 2008 10:12 AM
  • In this case you would need to use a DataTable. It allows you to cache the data in your application and manipulate it. With a DataReader you have no way to keep the data unless you copy it out of the reader into some sort of storage.
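    A rough sketch of that shape (the table, columns, and connection string are made up; in a real WCF service this function would be exposed through a service contract interface):

        Imports System.Data
        Imports System.Data.SqlClient

        Public Class CustomerDataService
            ' Hypothetical operation: fill a DataTable, apply validation, return it to the caller.
            Public Function GetValidatedCustomers() As DataTable
                Dim table As New DataTable("Customers")
                Using adapter As New SqlDataAdapter( _
                        "SELECT CustomerId, Name, Balance FROM Customers", _
                        "Data Source=.;Initial Catalog=Sales;Integrated Security=True")
                    adapter.Fill(table)
                End Using

                ' Business logic / validation against the cached rows.
                For Each row As DataRow In table.Rows
                    If row.IsNull("Balance") Then
                        row("Balance") = 0D
                    End If
                Next

                Return table
            End Function
        End Class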

     

    Sunday, August 31, 2008 12:38 PM
  • While I am not trying to reawaken this thread, the answer to this query should be recorded for posterity.

    Step 2: In normal circumstances a query should be faster than a loop, simply because a DB is optimised for data processing. The problem here is most likely that the database needs an index on the field being compared to strX.

    Step 3: Similarly, the record fields being compared to field_idnum should be indexed. A stored procedure may be a better option here if field_idnum is being compared to multiple fields on the table.

    Using SQL with the proper indexes applied, rather than loops, should have this taking 10 minutes or so, rather than the current 1h 20m.
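    To illustrate, a hedged sketch with made-up table and column names: create the index once (normally a DBA task, shown here only for completeness), and each per-row lookup then becomes an index seek via a parameterized query rather than a client-side scan.

        Imports System.Data.SqlClient

        Module IndexedLookupSketch
            Sub Main()
                Dim connStrB As String = "Data Source=Server2;Initial Catalog=DatabaseB;Integrated Security=True"

                Using conn As New SqlConnection(connStrB)
                    conn.Open()

                    ' One-off: index the column that strX is compared against.
                    Dim createIndex As New SqlCommand( _
                        "IF NOT EXISTS (SELECT * FROM sys.indexes WHERE name = 'IX_Table_b1_KeyField') " & _
                        "CREATE INDEX IX_Table_b1_KeyField ON Table_b1 (KeyField)", conn)
                    createIndex.ExecuteNonQuery()

                    ' With the index in place, the per-row lookup is a seek, not a scan.
                    Dim lookup As New SqlCommand( _
                        "SELECT field_idnum FROM Table_b1 WHERE KeyField = @strX", conn)
                    lookup.Parameters.AddWithValue("@strX", "1 2 - 3 - 4")
                    Dim idnum As Object = lookup.ExecuteScalar()
                End Using
            End Sub
        End Module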
    Friday, September 11, 2009 9:11 AM
  • Performance-wise, a DataReader performs better than a DataSet.

    http://www.dotnetquestionanswers.com
    Tuesday, January 24, 2012 1:51 AM
  • Logically, with a good application design, you would apply the business logic in a batch SQL statement (or statements) to avoid retrieving the data at all, or retrieve only the rows that match a specific condition in a WHERE clause rather than all of them. In some cases that is possible, in others it is not; it depends on what you need to achieve. As for the DataReader: yes, it keeps the connection open and provides a fast, read-only, forward-only cursor, and I believe it buffers some data unless you specify the SingleRow option when you call ExecuteReader to create it. But you do not have much control over this cursor.

    By "you do not have much control over the cursor", did you mean that you cannot move the cursor backwards?
    Wednesday, January 22, 2014 5:52 PM
  • In this case you would need to use a DataTable. It allows you to cache the data in your application and manipulate it. With a DataReader you have no way to keep the data unless you copy it out of the reader into some sort of storage.

     

    What would be a good way to store the data from a DataReader for processing?
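    Two common options, sketched here with made-up table and column names: copy the rows into a typed list as you read (keeping only the columns you need), or let DataTable.Load copy everything from the reader in one call.

        Imports System.Collections.Generic
        Imports System.Data
        Imports System.Data.SqlClient

        Module ReaderStorageSketch
            ' Hypothetical row type: hold only the fields you actually need.
            Class OrderRow
                Public OrderId As Integer
                Public Amount As Decimal
            End Class

            Sub Main()
                Dim connStr As String = "Data Source=.;Initial Catalog=Sales;Integrated Security=True"
                Using conn As New SqlConnection(connStr)
                    Dim cmd As New SqlCommand("SELECT OrderId, Amount FROM Orders", conn)
                    conn.Open()

                    ' Option 1: copy rows into a typed list (lightweight).
                    Using reader As SqlDataReader = cmd.ExecuteReader()
                        Dim orders As New List(Of OrderRow)
                        While reader.Read()
                            Dim row As New OrderRow()
                            row.OrderId = reader.GetInt32(0)
                            row.Amount = reader.GetDecimal(1)
                            orders.Add(row)
                        End While
                    End Using

                    ' Option 2: let a DataTable copy everything from a fresh reader in one call.
                    Using reader2 As SqlDataReader = cmd.ExecuteReader()
                        Dim table As New DataTable()
                        table.Load(reader2)
                    End Using
                End Using
            End Sub
        End Module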
    Wednesday, January 22, 2014 6:10 PM