Data structure for holding a huge amount of data in .Net

  • Question

  • Hi Guys,

    Does Oracle have an equivalent of SQL Server's BULK INSERT? If not, how does the .Net 4.0 framework support inserting a huge amount of data into an Oracle database table all at once (not as multiple inserts)?

    Also, wouldn't holding this much data in any CLR structure (such as StringBuilder or XmlDocument) result in a memory overflow? How can we handle such a situation? I know StringBuilder has some limitations (as discussed here http://social.msdn.microsoft.com/Forums/en-US/Vsexpressvb/thread/968ce154-0cfc-4d57-98f7-61c9886cefd4/), but does XmlDocument have the same?

    Again, the requirement is that the 100 GB file can be read only once, as a whole, and all of the file data must reside in memory. I need to find a suitable .Net data structure to hold this huge amount of data in memory while managing virtual memory and page swaps efficiently. (A bulk-load sketch follows this post.)


    • Edited by Vinit Sankhe Wednesday, June 1, 2011 3:49 AM Topic of same name found!
    Wednesday, June 1, 2011 3:48 AM
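
The question above asks for an Oracle counterpart to SQL Server's BULK INSERT. The thread never shows one, but ODP.NET (Oracle Data Provider for .NET, 11g and later) offers an OracleBulkCopy class modeled on SqlBulkCopy, and it can load batches read from the file so the whole 100 GB never has to sit in memory. A minimal, hypothetical sketch assuming ODP.NET is installed; the connection string, file name, delimiter, table and column names are placeholders.

    // Hypothetical sketch: bulk-loading a large delimited file into Oracle with
    // ODP.NET's OracleBulkCopy, batching rows so memory use stays bounded.
    using System;
    using System.Data;
    using System.IO;
    using Oracle.DataAccess.Client;   // assumes ODP.NET is installed

    class FileToOracleLoader
    {
        static void Main()
        {
            const int batchSize = 100000;                 // rows per round trip (tune as needed)
            var batch = new DataTable();
            batch.Columns.Add("COL1", typeof(string));    // placeholder columns
            batch.Columns.Add("COL2", typeof(string));

            using (var connection = new OracleConnection("User Id=scott;Password=tiger;Data Source=orcl"))
            using (var bulkCopy = new OracleBulkCopy(connection))
            using (var reader = new StreamReader("huge_file.txt"))
            {
                connection.Open();
                bulkCopy.DestinationTableName = "TARGET_TABLE";

                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] fields = line.Split('|');    // assumed delimiter
                    batch.Rows.Add(fields[0], fields[1]);

                    if (batch.Rows.Count == batchSize)
                    {
                        bulkCopy.WriteToServer(batch);    // one bulk load per batch
                        batch.Clear();
                    }
                }
                if (batch.Rows.Count > 0)
                    bulkCopy.WriteToServer(batch);        // flush the final partial batch
            }
        }
    }

A plain OracleCommand with ArrayBindCount set and array-valued parameters is another ODP.NET option that sends one batch of rows per round trip instead of one statement per row.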


All replies

  • There is no database support in the .Net BCL. You probably want to visit the ADO.Net forums under the Data Platform Development category.

    The following is a signature, not part of the post
    Please mark the post that answered your question as the answer, and mark other helpful posts as helpful, so they will appear differently to other users visiting your thread for the same problem.
    Visual C++ MVP
    Wednesday, June 1, 2011 4:19 AM
  • Do you want to keep 100GB of data in memory?

    How much physical memory do you have?

    Can you shed some light on the nature of the data? Is it millions of small objects, all of the same type, or many different types?

    What do you intend to do with the data at runtime? Read it? Update it?

    Can you tell me the overall application domain? Games? Derivatives trading?

    Cap'n



    Wednesday, June 1, 2011 10:30 PM
  • Hi CaptainKernel,

    Thanks for your reply. My RAM is 4 GB and I am using 32-bit Windows XP. The objects are simple: all string attributes (because they are mapped straight from a file row), and they all belong to the same class. There are a few billion records, and the overall size of the text file is 100 GB. This problem is about load testing and performance improvement; there is no real-world application behind it. You could call it self-assigned homework. :-)

    The data has to be held in memory all at once, and then I want to insert the records into an Oracle table. Again, I don't want billions of individual inserts, as that would be impractical. Even if I split the data into groups, the best I could do is 32,000 inserts in 32,000 distinct SQL blocks. Not efficient at all.

    Questions:

    1. Which CLR data structure can hold such huge data in memory and handle page swapping efficiently?

    2. Oracle's SQL*Loader utility could be used for this task, but how can a .Net API work with it? (A sketch follows this post.)

    Thursday, June 2, 2011 2:44 AM
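
Question 2 in the reply above asks how a .Net application can work with Oracle's SQL*Loader. There is no managed SQL*Loader API in the BCL, but the sqlldr command-line utility can be launched from .Net through System.Diagnostics.Process. A hypothetical sketch; the credentials, control file, data file and log file names are placeholders.

    // Hypothetical sketch: driving Oracle's SQL*Loader (sqlldr) from .Net by
    // launching the command-line tool and reading its console output.
    using System;
    using System.Diagnostics;

    class SqlLoaderRunner
    {
        static void Main()
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = "sqlldr",                      // assumes sqlldr is on the PATH
                Arguments = "userid=scott/tiger@orcl control=load.ctl data=huge_file.txt " +
                            "log=load.log direct=true",   // direct=true uses the direct-path engine
                UseShellExecute = false,
                RedirectStandardOutput = true
            };

            using (Process process = Process.Start(startInfo))
            {
                Console.WriteLine(process.StandardOutput.ReadToEnd());
                process.WaitForExit();
                Console.WriteLine("sqlldr exit code: " + process.ExitCode);
            }
        }
    }

The control file (load.ctl here) is where SQL*Loader is told the target table, field delimiters and column mapping; the .Net side only starts the process and checks the exit code.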
  • I'm afraid you cannot keep 100 GB of data in memory, because the virtual address space of a 32-bit process is only 4 GB; you may want to move to 64-bit.

    Paging is controlled by the underlying OS, not the CLR, so you can choose whichever data structure is convenient. (A memory-mapped-file sketch follows this post.)


    Eric Yang [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    • Marked as answer by eryang Tuesday, June 14, 2011 3:09 AM
    Thursday, June 2, 2011 4:02 AM
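
Following on from the point that paging is the OS's job, not the CLR's: one option in .Net 4.0, not mentioned in the thread, is a memory-mapped file (System.IO.MemoryMappedFiles), which lets the OS page the file data in and out instead of copying it into CLR objects. A hypothetical sketch; the file path and window size are placeholders, and a 32-bit process would still have to work through small views rather than mapping 100 GB at once.

    // Hypothetical sketch: walking a huge file through a memory-mapped view
    // one window at a time, letting the OS handle the paging (.Net 4.0).
    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;

    class MappedFileScan
    {
        static void Main()
        {
            const long windowSize = 64 * 1024 * 1024;     // 64 MB view at a time
            long fileLength = new FileInfo("huge_file.txt").Length;

            using (var mmf = MemoryMappedFile.CreateFromFile("huge_file.txt", FileMode.Open))
            {
                for (long offset = 0; offset < fileLength; offset += windowSize)
                {
                    long viewLength = Math.Min(windowSize, fileLength - offset);
                    using (var view = mmf.CreateViewStream(offset, viewLength))
                    using (var reader = new StreamReader(view))
                    {
                        // Process this window; the OS pages it in and out on demand.
                        // (Records straddling a window boundary would need extra handling.)
                        Console.WriteLine(reader.ReadLine());
                    }
                }
            }
        }
    }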
  • Why not go for an SSIS package? It is easy to handle bulk data with SSIS. You can execute an SSIS package using the command-line utility (dtexec), which you can run with the System.Diagnostics.Process.Start() method (a sketch follows this post).

    http://msdn.microsoft.com/en-us/library/ms162810.aspx


    Thanks, Hitesh
    Thursday, June 2, 2011 8:19 AM
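
A minimal sketch of the suggestion above: launching an SSIS package with dtexec through System.Diagnostics.Process.Start(). The package path is a placeholder and dtexec is assumed to be on the PATH.

    // Hypothetical sketch: running an SSIS package with the dtexec utility from .Net.
    using System;
    using System.Diagnostics;

    class SsisPackageRunner
    {
        static void Main()
        {
            // /F (or /File) tells dtexec to run a package stored in the file system.
            using (Process process = Process.Start("dtexec", "/F \"C:\\Packages\\LoadHugeFile.dtsx\""))
            {
                process.WaitForExit();
                Console.WriteLine("dtexec exit code: " + process.ExitCode);   // 0 means success
            }
        }
    }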
  • Are you forced to use a relational database?

    Why have all that data accessible in memory, then go to the trouble of storing it in a relational DB?

    If you were using SQL Server, you might be able to create some SQL/CLR code (that runs inside SQL Server) to do the load, but I don't think it would really be faster.

    Cap'n

    • Marked as answer by eryang Tuesday, June 14, 2011 3:09 AM
    Thursday, June 2, 2011 7:31 PM