Load and manipulate large amounts of data

  • Question

  • hi all.
    I have to load a lot of data (roughly 100,000,000 rows) from one database (DB2), manipulate it, and write it back to SQL Server.

    How would you do it?

    Load it into a DataSet, manipulate it (within a foreach loop), and use SqlBulkCopy to copy it into a SQL Server table?
    Or load it with a DataReader, manipulate it (within the foreach loop) and use an INSERT INTO command?

    How would you load the data?
    How would you reduce the risk of an out-of-memory error when loading millions of rows into a DataSet? (A sketch of the reader-plus-bulk-copy route follows this post.)

    thanks
    Friday, January 1, 2010 7:11 PM
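    A minimal sketch of the reader-plus-bulk-copy route asked about above: stream rows from DB2 with a DbDataReader, apply the C# rule per row, and flush fixed-size batches to SQL Server with SqlBulkCopy so memory stays bounded. The connection strings, table/column names, and the Transform() rule are hypothetical, not from the thread.

    using System.Data;
    using System.Data.Common;
    using System.Data.Odbc;              // assumes an ODBC driver for DB2
    using System.Data.SqlClient;

    class Migrator
    {
        const int BatchSize = 50000;     // tune to available memory

        static void Run()
        {
            using (var src = new OdbcConnection("...DB2 connection string..."))
            using (var dst = new SqlConnection("...SQL Server connection string..."))
            {
                src.Open();
                dst.Open();

                DbCommand cmd = src.CreateCommand();
                cmd.CommandText = "SELECT ID, COUNTRY FROM SRC_TABLE";

                // small reusable buffer instead of one 100M-row DataSet
                var buffer = new DataTable();
                buffer.Columns.Add("ID", typeof(int));
                buffer.Columns.Add("COUNTRY", typeof(string));

                using (DbDataReader reader = cmd.ExecuteReader())
                using (var bulk = new SqlBulkCopy(dst) { DestinationTableName = "DST_TABLE" })
                {
                    while (reader.Read())
                    {
                        // apply the C# update rule row by row
                        buffer.Rows.Add(reader.GetInt32(0), Transform(reader.GetString(1)));

                        if (buffer.Rows.Count >= BatchSize)
                        {
                            bulk.WriteToServer(buffer);  // flush one batch
                            buffer.Clear();              // keep memory bounded
                        }
                    }
                    if (buffer.Rows.Count > 0)
                        bulk.WriteToServer(buffer);      // final partial batch
                }
            }
        }

        // placeholder for the "add some letters to a column" rule from the thread
        static string Transform(string country)
        {
            return country == "Germany" ? "GER " + country : country;
        }
    }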


All replies

  • Roughly 100,000,000 rows is a lot of data to manipulate, and loading it all at once will take many hours. You should filter with a query first and manipulate only the records you want to change, or use XML...
    Just Be Humble Malange!
    Friday, January 1, 2010 7:14 PM
  • it is part of a data migration tool.
    I have to change every row!

    like adding some letters to the value of a column...

    like "GER" for German customers,
    like "UK" for UK customers.

    like adding a value to a number... and so on...

    I cannot do it in an UPDATE statement on the SQL Server. I need to do it with the "string" I have in the DataSet / DataReader.

    That it will run very long is not the problem... that is normal when manipulating this amount of data.
    The question is how to do it best...

    Load it into a memory-limited DataSet (say 500 MB), manipulate it, and store it, in a loop until the source data is finished?
    Friday, January 1, 2010 7:19 PM
  • Do you want to manipulate all of them?
    If yes, I think you want to do a special update on all of them, so do it in an UPDATE statement with a rollback applied if the operation fails.
    If no, load only the required information, not all of it.
    Please tell us more about your manipulation.
    With best regards, Yasser Zamani
    Friday, January 1, 2010 7:22 PM
  • I need to update all of them,

    and I can't do it in an UPDATE statement.
    I have some complicated update rules, so I have to update the DataSet value and AFTER that I can save it to the DB.

    2 reasons why I have to do it in code:

    1. SQL Server is not always the source system, so there are differences in SQL dialects.
    2. The update rules cannot be handled in SQL; it must be C#.
    Friday, January 1, 2010 7:26 PM
  • OK, you can do it by paging the data on a specific unique column, like the following:

    for (int code = 0; code < MAX_POSSIBLE_VALUE; code += STEP)
    {
        // fetch only one page of rows at a time
        yourTableAdapter.SelectCommand.CommandText =
            "SELECT * FROM x WHERE code >= " + code + " AND code < " + (code + STEP);
        yourDataSet.Clear();                  // drop the previous page so memory stays bounded
        yourTableAdapter.Fill(yourDataSet);
        //
        // TODO: update the dataset as you wish here
        //
        yourTableAdapter.Update(yourDataSet); // write the changed rows back
    }

    With best regards, Yasser Zamani
    • Proposed as answer by Levi Domingos Friday, January 1, 2010 7:45 PM
    • Marked as answer by GMS0012 Friday, January 1, 2010 9:02 PM
    Friday, January 1, 2010 7:36 PM
  • thank you
    • Marked as answer by GMS0012 Friday, January 1, 2010 9:00 PM
    Friday, January 1, 2010 7:41 PM
  • You're welcome ;-)
    With best regards, Yasser Zamani
    Friday, January 1, 2010 7:43 PM
  • I'd probably import the DB2 data into a temporary database in SQL Server and use stored procedures to do the data conversion you want, then insert the rows into their destination tables. SQL Server will process large chunks of data significantly faster than an application fetching a record, doing some processing, and writing it back (a rough sketch of this follows below).
    Friday, January 1, 2010 8:49 PM
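    For what it's worth, that staging-table route might look roughly like this: bulk-load the raw rows unchanged into a staging table, then let the server transform them in one set-based statement. The table and column names here are invented for illustration.

    using System.Data;
    using System.Data.SqlClient;

    static void StageAndTransform(SqlConnection dst, IDataReader source)
    {
        // land the raw rows first, with no per-row C# work
        using (var bulk = new SqlBulkCopy(dst) { DestinationTableName = "dbo.Staging" })
            bulk.WriteToServer(source);

        // one set-based pass on the server instead of 100M client round trips
        using (var cmd = dst.CreateCommand())
        {
            cmd.CommandText =
                "INSERT INTO dbo.Customers (Id, Country) " +
                "SELECT Id, 'GER ' + Country FROM dbo.Staging WHERE Country = 'Germany'";
            cmd.CommandTimeout = 0;   // no timeout for the big batch
            cmd.ExecuteNonQuery();
        }
    }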
  • hi Ray,

    you are right,

    but my solution must not depend on one DB system.

    It must also be possible to import a MySQL table or an Access table and store it in DB2 or Oracle.

    So the update logic cannot live in a SQL Server stored procedure, although that would perhaps be faster.
    I also thought about this solution, but the source and target DBs are not fixed, so the logic part cannot be in the DB (see the provider-factory sketch below).
    I hope you understand this.
    Friday, January 1, 2010 8:57 PM
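    One common way to keep the C# side independent of the database on either end is ADO.NET's provider factory model, sketched below; the provider invariant names and connection strings are examples only.

    using System.Data.Common;

    static DbConnection OpenDb(string providerName, string connectionString)
    {
        // resolves the right ADO.NET provider at runtime, so the same code
        // can talk to DB2, MySQL, Oracle, Access (OleDb), or SQL Server
        DbProviderFactory factory = DbProviderFactories.GetFactory(providerName);
        DbConnection conn = factory.CreateConnection();
        conn.ConnectionString = connectionString;
        conn.Open();
        return conn;
    }

    // e.g. OpenDb("IBM.Data.DB2", "...") or OpenDb("System.Data.SqlClient", "...");
    // all further work then goes through DbCommand / DbDataReader only.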
  • If the source and destination are different, you can also use a data reader: do one big SELECT, read the rows one by one, and update them on the fly with individual UPDATE statements (sketched below). This might cause problems if you want to write them back to the same db/table, though.
    Friday, January 1, 2010 9:52 PM
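    A rough shape of that reader-plus-individual-updates approach, using a second connection for the writes so the open reader is undisturbed; the table, column, and parameter names are invented, and the @-style parameter prefix assumes a SQL Server destination.

    using System.Data.Common;

    static void CopyRowByRow(DbConnection source, DbConnection destination)
    {
        DbCommand select = source.CreateCommand();
        select.CommandText = "SELECT Id, Country FROM Customers";

        DbCommand update = destination.CreateCommand();
        update.CommandText = "UPDATE Customers SET Country = @c WHERE Id = @id";
        DbParameter pC = update.CreateParameter();
        pC.ParameterName = "@c";
        update.Parameters.Add(pC);
        DbParameter pId = update.CreateParameter();
        pId.ParameterName = "@id";
        update.Parameters.Add(pId);

        using (DbDataReader reader = select.ExecuteReader())
        {
            while (reader.Read())
            {
                pId.Value = reader.GetInt32(0);
                pC.Value = "GER " + reader.GetString(1);  // the C# update rule
                update.ExecuteNonQuery();                 // one round trip per row
            }
        }
    }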