none
Best way to import large amount of data to database RRS feed

  • Question

  • What is the best way or method to add and update large amount of data in a database? I'm importing large amount of data (about 100000 records) from a textfile which should end up in a database. Performance is an issue, it should take max about a few minutes. I have made several attempts with no luck, either I end up with horrible performance or I get errors when updating the database (duplicate primary key counters or foreign key constraint errors).

     

    A simplified version of my database:

     

    Locations:

    integer: iLocationId

    string: strLocationName

     

    Orders:

    integer: iOrderId

    integer: iLocationId

    string: strLocationName

     

    Imported data contains new and existing locations and orders. Existing data should be updated and new data should be added. I've tried something like this using tableadapters and datasets:

     

    Code Snippet

    MyDataSet ds = new MyDataSet();

    OrdersTableAdapter adapterOrders = new OrdersTableAdapter();

    adapterOrders.Fill(ds.Orders);

     

    /* add or update orders here */

     

    int iCount = adapterOrders.Update(ds.Orders);

     

    Is this a good way to do it? Of course I import Locations first and then Orders. Please let me know if there is another approach!

     

    Thanx.

    /jesper

    Thursday, September 20, 2007 2:35 PM

All replies

  • Rather than working with datasets I might work more directly with the database itself using stored procedures if possible.

     

    This p-code assumes you are more likely to update an existing record then insert a new one.  If this assumption is wrong you could adjust the code accordingly.

     

    Stored Proc Pseduo Code:

    Location_Upload(iLocationId, strLocationName)

    {

    // attempt to update

    Update Location set Name = @Name where Id = @Id;

     

    if records affected = 0

    {

    // not found so insert

    Insert Into Location(id, name) values(@Id, @name)

    }

    }

     

    Thursday, September 20, 2007 3:19 PM
  • Hi,

     

    I'd propose to use a stored procedure and a temporary table:

     

    1. Create a table called LocationTemp with the same columns as Location

    2. Load all data into that table

    3. Update all the rows in Location that have their matches in LocationTemp:

       UPDATE L

       SET L.Name = LT.Name

       FROM Location L INNER JOIN LocationTemp ON L.Id = LT.Id

    4. Insert all the remaining rows:

       INSERT INTO Location

           SELECT * FROM LocationTemp WHERE Id NOT IN (SELECT Id FROM Location)

     

    The statements above have not been tested and may not work right away, but I think you get the idea...

     

    Hope this helps,

    ab

    http://godevelop.blogspot.com

     

     

     

    Thursday, September 20, 2007 3:39 PM
  • I would use the BCP utility (http://msdn2.microsoft.com/en-us/library/ms162802.aspx).  So what you do is go from BCP to a staging table with the exact same schema as the source data (column names, etc).  Before updating the staging table, truncate it.  After you have the data in the staging table, then merge into your projection table (create a procedure to do this so you can easily reuse it).  This is the FASTEST way of doing this.  You won't be using transactions, so there is no rolling back (that's why you use the staging table).

     

    If this is a one time deal and you don't have data in your production table, you may be able to bypass the staging table.

     

    Also, I said merge intentionally.  That is a very neat way of doing things because it makes your SQL very simple.  It first tries an update if there is a match, but if there is no match it does an insert.  You can put this in a transaction if you must - and you probably should if going into production data.

     

    http://jdixon.dotnetdevelopersjournal.com/i_want_my_sql_2005_merge_statement.htm

     

    MERGE INTO MyTable
         USING MyTempTable
              ON MyTempTable.MatchingField1 = MyTable.MatchingField1
    WHEN MATCHED THEN
         UPDATE UpdateField1 = MyTempTable.UpdateField1
    WHEN NOT MATCHED THEN
         INSERT VALUES(MyTempTable.MatchingField1, MyTempTable.UpdateField1)

     

     

    The only other advice is to make sure you have your indexes tweaked correctly given that performance is an issue.  All this may seem difficult, but once you do it once it becomes quite simple.  And easily repeated for batch jobs and such.

    Saturday, September 22, 2007 12:07 AM
  • What type of database are you working with?

     

    Monday, September 24, 2007 1:07 PM
  • If you are using SQL Server then you could use built-in tools called DTS in SQL 2000 or Integration Services in 2005. They allow to import/export data between different data sources, including text files and work really fast since they use low level API and work with the bulk of data. In some cases you could cut hundreds times.

     

    Tuesday, September 25, 2007 10:54 AM
    Moderator