Out of memory exception with ADO.NET DataSet

  • Question

  • I have isolated an out of memory exception as an ADO.NET issue rather than a problem in my own code.
    But I'm still not convinced that it's a hard limit; it could be a bug in ADO.NET.

    I did review the article that discussed ADO.NET memory use and the ability to page to disk. http://articles.techrepublic.com.com/5100-10878_11-1050422.html
     
    Unfortunately, either the author's test parameters were flawed or he simply didn't do enough testing.

    Some paging does occur, but I don’t have to wait hours to fill up the memory; I can do it in a matter of minutes, with or without large page files.

    Initially, I couldn't isolate the line of code in my application that was causing the out of memory exception. Once I suspected it to be the ADO.NET DataSet.Fill method, I decided to build a new project that was very crude and basic: it just dumps data into a DataSet using the Fill method.

    All you have to do is provide a connection string and a SQL statement. I’ve found some interesting results.

    Not surprisingly, the number of DataColumns makes a big difference in the number of DataRows you can load into the DataSet's DataTable.


    I track this number in terms of "DataElements" (DataRows x DataColumns).

    In general, you can get about:
     
    75 DataColumns x 500,000 DataRows
    50 DataColumns x 750,000 DataRows
    25 DataColumns x 1,500,000 DataRows
    10 DataColumns x 3,750,000 DataRows

    OR about 37,500,000 DataElements of varying data types before getting an out of memory exception.

    You can push it further than that, but that’s about as far as you can go and have the application remain stable. I wouldn’t recommend going much higher.
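    As a rough rule of thumb from these numbers, you can estimate the row budget for a given schema width by dividing the observed ~37,500,000-element ceiling by the column count. A minimal sketch (the constant is my measured figure, not a documented ADO.NET limit):

    ' Empirical capacity estimate. 37,500,000 is the ceiling observed in my
    ' tests above; it is NOT a documented ADO.NET limit.
    Private Const ObservedElementCeiling As Long = 37500000

    Private Function EstimateMaxRows(ByVal columnCount As Integer) As Long
        ' DataElements = rows x columns, so the row budget is the element
        ' ceiling divided by the number of columns.
        Return ObservedElementCeiling \ columnCount
    End Function

    ' EstimateMaxRows(75) -> 500,000; EstimateMaxRows(10) -> 3,750,000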

    I was able to get close to 50,000,000 unique elements during a few runs, but I was using very basic data types and no long strings.

    The stats for the upper-limit runs, the "closest to the out of memory exception without going over", AKA "The Price is Right limit", were:

    1,775,031 rows x 28 columns = 49,700,868 data elements (mainly integer data types with no long strings)
    1,480,000 rows x 29 columns = 42,920,000 data elements (swapped out some columns with long string data, such as account descriptions and segment descriptions)

    So you can see the variations begin to crop up with just a few changes to the schema.

    The ADO.NET DataTable documentation states, "The maximum number of rows that a DataTable can store is 16,777,216."
    This is why I think this might be a bug in ADO.NET, unless that number is arbitrary (it is exactly 2^24, so it's more likely an internal index width) or assumes having only two to three DataColumns in the DataTable.

    I am going to try changing the project to target the .NET Framework 4.0. My understanding is that the updated ADO.NET in 4.0 might have some performance gains and possibly some bug fixes, so it's worth a shot to see if this exception is resolved.

    There's plenty of page file and, in some cases, plenty of memory left on these boxes, so I can't see why this error would be normal. I've tested this sample application on Windows XP, Windows Vista, and Windows 7 with varying memory sizes (1024, 2048, and 4096 MB), and with the exception of the 1024 MB configuration, the amount of memory used by the sample application never rises above 1.41 - 1.44 GB.

    Of course, the limits I have described at this point are just for one DataSet containing one DataTable; I haven't checked what happens when you add multiple DataTables that encroach on the limits described above.

    In my test on a box with 4 GB of memory, almost 25% of the physical memory was still free when the out of memory exception occurred, so there's a chance the limit is per DataSet object instance, or just a per-DataTable limit. In addition, over 3500 MB of page file was left (2500/6038 MB utilized). A quick sketch for probing this follows.
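    A minimal sketch of how one might probe whether the ceiling is per DataSet or per DataTable: fill two tables in the same DataSet and see whether the combined element count can exceed the single-table ceiling. (Reusing the same query for both tables is just for illustration; use whatever statement returns roughly half of your single-table limit.)

    ' Hypothetical probe: if the ceiling is per DataSet, the two tables together
    ' should hit OOM near the single-table element count; if it is per DataTable,
    ' each table should be able to approach the full ceiling on its own.
    Dim probe As New DataSet()
    Using cnn As New SqlConnection(strDataConnection)
        Dim da As New SqlDataAdapter(strSQL, cnn)
        da.Fill(probe, "TableA")   ' first table
        da.Fill(probe, "TableB")   ' second table, same size again
    End Using

    Dim total As Long = 0
    For Each t As DataTable In probe.Tables
        total += CLng(t.Rows.Count) * t.Columns.Count
    Next
    MsgBox("Loaded " & total & " DataElements across " & probe.Tables.Count & " tables.")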

    The code is very basic and I have included it in this post. The only thing missing is that you need to create a Windows Form with a textbox for your connection string and one for your SQL. If you have a connection string and can craft a SQL statement that returns over 50,000,000 unique elements, you should be able to reproduce the out of memory exception on just about any 32-bit box with 4 GB of RAM or less (I haven't tried 64-bit yet).

    I know the question will come up: "Why are you even trying to do this?" Trust me, I have very good reasons for trying to push these limits.
    Also, there should be some documentation that clearly defines the performance and size parameters of an ADO.NET DataSet's capabilities, so why not start now?

    Imports System
    Imports System.Data
    Imports System.Data.SqlClient

    Public Class Form1

      ' Held at class level so the cleanup experiments in Button1_Click can
      ' release it after the fill completes.
      Private ds As DataSet

      Public Sub CreateDataSet(ByVal strDataConnection As String, ByVal strSQL As String)

        Try
          Dim lcnn As New SqlConnection(strDataConnection)
          Dim lda As New SqlDataAdapter()

          ' Create the command from the connection instead of newing up a
          ' separate SqlCommand that would immediately be overwritten.
          Dim lcmd As SqlCommand = lcnn.CreateCommand()
          lcmd.CommandText = strSQL
          lcmd.CommandTimeout = 60
          lda.SelectCommand = lcmd

          ds = New DataSet()
          lcnn.Open()
          lda.Fill(ds, "TempDS")   ' the out of memory exception surfaces here
          lcnn.Close()

          Dim count As Integer = ds.Tables(0).Rows.Count * ds.Tables(0).Columns.Count

          MsgBox("I did it! There are " & ds.Tables(0).Rows.Count & " rows and " & ds.Tables(0).Columns.Count & " columns in me for a total of " & count & " unique elements in memory!!", MsgBoxStyle.Exclamation)

        Catch ex As Exception
          MsgBox("I failed :( with: " & ex.Message, MsgBoxStyle.Critical)
        End Try
      End Sub

      Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        CreateDataSet(Conn.Text, SQL.Text)

        'doesn't clean up any memory
        'ds.Clear()
        'GC.Collect()

        'doesn't clean up any memory
        'ds.Dispose()
        'GC.Collect()

        'this one works!
        ds = Nothing
        GC.Collect()

        MsgBox("I'm all cleaned up!", MsgBoxStyle.Exclamation)
      End Sub
    End Class
    
    • Edited by Jordan B Friday, May 21, 2010 7:59 AM code got reformatted after edits
    Friday, May 21, 2010 7:42 AM

All replies

  •  

    Just another correction to my above post: the Fill method is part of the SqlDataAdapter class, not the DataSet class. Which makes me wonder: if I were to add 16,777,216 rows by adding DataRows to a DataTable directly, instead of using the SqlDataAdapter's Fill method, would that make a difference?
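    Something like this minimal sketch is what I have in mind (the two-integer-column schema is arbitrary, just for illustration):

    ' Hypothetical direct-add test: bypass SqlDataAdapter.Fill entirely and see
    ' whether DataTable.Rows.Add reaches the documented 16,777,216-row cap or
    ' throws OutOfMemoryException first.
    Dim table As New DataTable("DirectAdd")
    table.Columns.Add("Id", GetType(Integer))
    table.Columns.Add("Value", GetType(Integer))

    Try
        For i As Integer = 0 To 16777215
            table.Rows.Add(i, i * 2)
        Next
        MsgBox("Reached the documented row cap: " & table.Rows.Count & " rows.")
    Catch ex As OutOfMemoryException
        MsgBox("Out of memory at " & table.Rows.Count & " rows.")
    End Try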

    Friday, May 21, 2010 8:03 AM
  • Hi Jordan,

     

    Exactly.   I think the 16,777,216-row limitation on a DataTable is just the maximum row count by design, i.e. the ideal case when there is enough memory to allocate for all the rows.   In practice, the OutOfMemory exception is thrown because there is no user-mode memory left to allocate (2 GB on 32-bit Windows).   Also, a .NET app never gets the full 2 GB, since other components such as the CLR occupy parts of that space and fragment it.  

    http://tonesnotes.com/blog/2005/10/avoiding_out_of_memory_excepti.html

    http://stackoverflow.com/questions/597499/why-am-i-getting-an-out-of-memory-exception-in-my-c-application

     

    So for 32-bit Windows, it is recommended to enable the /3GB boot option and the /LARGEADDRESSAWARE switch. 

     

    On 64-bit Windows, much more user-mode memory is available.  

     

    Hope it is helpful to you!

     

    Have a nice weekend!

     

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Friday, May 21, 2010 9:01 AM
    Moderator
  • Hi Jordan,

     

    I also want to mention that DataSet.Clear() and DataSet.Dispose() won't cause the rows to be collected by the GC.   The .Clear() method removes all the data rows in every DataTable inside the DataSet, but it does not remove the tables or relationships.   And the .Dispose() method is implemented by the DataSet's base class, MarshalByValueComponent, so it does not release the DataSet's managed resources.   

     

    To let the GC reclaim all the DataTables and DataRows, set ds to Nothing so that nothing refers to the original DataSet and the GC can finally collect it all.   Another option is DataSet.Reset, which returns the DataSet to its initialized state, without any tables or relationships.   A short sketch of these options is below.  
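    A minimal sketch contrasting the cleanup options, using the class-level ds from your sample:

    ' Option 1: removes the rows but keeps tables, columns, and relationships.
    ds.Clear()

    ' Option 2: returns the DataSet to its initialized state, with no tables
    ' or relationships at all.
    ds.Reset()

    ' Option 3: drop the last reference so the GC can reclaim the whole graph.
    ds = Nothing
    GC.Collect()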

     

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Friday, May 21, 2010 9:19 AM
    Moderator
  • Hi Lingzhi,

    Thanks for your reply.

    I think I understand more than I ever wanted to know about memory and the operating system!

    I repeated my tests with the "/3GB" option in the boot.ini (Windows XP).

    On Vista and Windows 7, I was able to use the "bcdedit.exe /set IncreaseUserVa 3072".

    Then I used the "editbin.exe /LARGEADDRESSAWARE myapplication.exe" from a Visual Studio command prompt to enable the application to use the additional 1GB of virtual address space (VAS).

    Then I used "DUMPBIN /HEADERS myapplication.exe" from a VS command prompt to verify that the application was indeed flagged as LARGEADDRESSAWARE.
    This is crucial to check, because at one point during my tests I thought I had enabled it when the setting had actually failed.

    So using the SimpleDataSetFill application (my code in the previous posts), I can now load DataSets into memory at almost twice my previously reported DataElement threshold (10 DataColumns x 3,750,000 DataRows = 37,500,000 DataElements).

    My best run was 41 DataColumns x 1,770,685 DataRows, for a total of 72,598,085 DataElements!!!

    This consumed about 2.5 GB of memory, all physical RAM, but the same could be achieved with page files on a box with only 2 GB of RAM (sounds weird, I know, but it works!).

    Granted, this set-up is a bit like overclocking your hardware: it's not a recommended configuration, but it certainly works in a test environment.
    I haven't experienced any direct instability, but the general consensus is that you are more prone to lock-ups and blue screens. 

    So with this set-up the new DataSet size limits would be approx:

    75 DataColumns x 933,333 DataRows
    50 DataColumns x 1,400,000 DataRows
    25 DataColumns x 2,800,000 DataRows
    20 DataColumns x 3,500,000 DataRows
    10 DataColumns x 7,000,000 DataRows

    OR about 70,000,000 data elements of varying data types before getting an out of memory exception.

    The 2.5 GB of memory consumption is expected: of the 3 GB of user-mode address space, roughly 0.5 GB is taken up by the CLR and other user-mode overhead, and the remaining 1 GB of the 4 GB address space is reserved for the operating system kernel.

    So now that I understand conceptually how 32-bit systems interact with 4 GB of memory, my understanding is that I could use PAE in combination with AWE to unlock more memory above 4 GB, up to 16 GB if I keep the /3GB option enabled (up to 64 GB if I turn off the /3GB option).

    So, how can I use AWE in .NET?

    I see an example of AWE in C++ http://msdn.microsoft.com/en-us/library/aa366531(VS.85).aspx 
    but it looks like it's not supported in VB.NET or C#.

    I guess I could call the AWE/VirtualAlloc functions in C++ from VB.NET in my sample DataSet application.
    All I really need to do is tell AWE that my process needs more memory and maybe that's enough to push this even further.

    So as you said, beyond this, it's 64-bit or bust?

    Thanks again for your help on this issue. I think my next step is to compile this sample application for 64-bit and then test it again. I think I have a box that's 64-bit with 16GB of RAM, but it may take a few weeks to get dedicated access. I'll post my 64-bit test results as soon as I can.

    Jordan

    • Edited by Jordan B Sunday, May 23, 2010 8:45 AM add info
    Sunday, May 23, 2010 8:42 AM
  • Hi Jordan,

     

    Thanks for your following up!   For the AWE question you raised, I don't think we can use AWE in .NET (managed code).   DataSet/DataTable are managed objects stored on the managed heap.   We can P/Invoke the AWE APIs from C# or VB.NET; however, I don't think we can control where the managed heap is allocated.   Managed memory is controlled entirely by the CLR and released by the GC.   

     

    I would recommend you use 64-bit Windows. :)   Have you tested the app on a 64-bit OS yet?  

     

    Have a nice day!

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    • Marked as answer by Jordan B Wednesday, August 10, 2011 3:51 AM
    Tuesday, May 25, 2010 6:20 AM
    Moderator
  • Hi Jordan,

     

    I am writing to check the status of the issue on your side.   Would you mind letting us know how the problem is going? 

     

    If you need further assistance, please feel free to let me know.   I will be more than happy to be of assistance.

     

    Have a nice day!

     

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Monday, May 31, 2010 1:17 AM
    Moderator
  • Belated reply, but I did migrate to a 64-bit OS, and it has resolved all of my memory issues, for the most part. I think the missing link for me in this discussion was the idea of contiguous memory addressing. I had assumed that even a 32-bit machine with the maximum amount of memory would be enough to store a single DataTable with over a million rows in a DataSet; little did I realize how big DataSets are compared to the binary streams coming from a SQL data adapter, and how easy it is on a 32-bit machine (even with kernel-mode and user-mode services and applications kept to a minimum) to run out of a contiguous block of address space of 1 GB or larger.

    Using 64-bit machines, where most boxes have 8 GB to 32 GB of memory, has made all the difference. On 32-bit machines, I am able to "chunkify" the data into smaller DataTables, since the likelihood of running out of contiguous address space there is high.
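    In case it helps anyone else, here is a minimal sketch of the chunking approach, using the Fill overload that takes a start record and a page size (the 500,000-row page size and the table naming are arbitrary choices for illustration):

    ' Hypothetical "chunkify" loader: pull the result set into several smaller
    ' DataTables so each one only needs a modest contiguous block of address space.
    Dim chunked As New DataSet()
    Const pageSize As Integer = 500000   ' arbitrary chunk size

    Using cnn As New SqlConnection(strDataConnection)
        Dim da As New SqlDataAdapter(strSQL, cnn)
        Dim start As Integer = 0
        Dim chunk As Integer = 0
        Do
            ' Fill(dataSet, startRecord, maxRecords, srcTable) keeps only the
            ' rows in [start, start + pageSize).
            Dim rows As Integer = da.Fill(chunked, start, pageSize, "Chunk" & chunk)
            If rows < pageSize Then Exit Do   ' last (partial) page reached
            start += rows
            chunk += 1
        Loop
    End Using

    Note that this overload still reads and discards the skipped rows on every call, so for very large result sets a keyed WHERE clause per chunk is more efficient.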
    Wednesday, August 10, 2011 4:00 AM