OLE DB Destination - Fast Load with Maximum Insert Commit Size

  • Question

  • I'm seeing some strange behavior from the OLE DB Destination when using the "fast load" access mode and setting the "Maximum insert commit size".

    When I do not set the "Rows per batch" or the "Maximum insert commit size", the package I'm working with inserts 123,070 rows using a single "insert bulk" statement.  The data seems to flow through the pipeline until it gets to the OLE DB Destination and then I see a short pause.  I'm assuming the pause is from the "insert bulk" statement handling all of the rows at once.

    When I set the "Rows per batch" option but leave the "Maximum insert commit size" alone, I generally see the same behavior -- a single "insert bulk" statement that handles all 123,070 rows.  In this case, however, the "insert bulk" statement has a "ROWS_PER_BATCH" option appended to the statement that matches the "Rows per batch" setting.  This makes sense.  I'm assuming the "insert bulk" then "batches" the rows into multiple insert statements (although I'm unsure of how to confirm this).  This version of the "insert bulk" statement appears to run in about the same time as the case above.

    When I set the "Maximum insert commit size" option and leave the "Rows per batch" setting alone, I see multiple "insert bulk" statements being executed, each handling the lower of either the value I specify for the "Maximum insert commit size" or the number of rows in a single buffer flowing through the pipeline.  In my testing, the number of rows in a buffer was 9,681.  So, if I set the "Maximum insert commit size" to 5,000, I see two "insert bulk" statements for each buffer that flows into the OLE DB Destination (one handling 5,000 rows and one handling 4,681 rows).  If I set the "Maximum insert commit size" to 10,000, I see a single "insert bulk" statement for each buffer that flows into the OLE DB Destination (handling 9,681 rows).

    Now the problem.  When I set the "Maximum insert commit size" as described in the last case above, I see LONG pauses between buffers being handled by the OLE DB Destination.  For example, I might see one buffer of data flow through (and be handled by one or more "insert bulk" statements based on the "Maximum insert commit size" setting), then see a 2-3 minute pause before the next buffer of data is handled (with its one or more "insert bulk" statements being executed).  Then I might see a 4-5 minute pause before the next buffer of data is handled.  The pause between the buffers being passed through the OLE DB Destination (and handled via the "insert bulk" statements) is sometimes shorter, sometimes longer.

    Using Profiler, I don't see any other activity going on within the database or within SQL Server itself that would explain the pauses between the buffers being handled by the OLE DB Destination and the resulting "insert bulk" statements...

    Can anyone explain what is going on here?  Is setting the "Maximum insert commit size" a bad idea?  What are the differences between it and the "Rows per batch" setting and what are the recommended uses of these two options to try to improve the performance of the insert (particularly when handling millions of rows)?

    TIA for any thoughts or information...

    Dave Fackler

    Friday, September 8, 2006 2:54 PM

All replies

  • Have you received any feedback on this? I would also appreciate the clarification.

    Friday, October 27, 2006 4:19 PM
  •  LarryETL wrote:

    Have you received any feedback on this? I would also appreciate the clarification.

    Me too. This is really intricate stuff and I'd love to see a whitepaper that explores these settings in more depth. The existing performance tuning whitepaper, whilst being an excellent paper, doesn't explore these issues fully.

    -Jamie

    Friday, October 27, 2006 7:12 PM
  • Well, the Maximum insert commit size (MICS) controls how much data is committed in each batch.

    So, if you have MICS set to 5000, you have 9000 rows, and you encounter an error in the first batch of 5000, the entire batch of 5000 will be rolled back.  MICS equates to the BATCHSIZE argument of the BULK INSERT Transact-SQL command.

    Rows Per Batch (RPB) is merely a hint to the query optimizer.  It should be set to the actual expected number of rows.  RPB equates to the ROWS_PER_BATCH argument of the BULK INSERT Transact-SQL command.


    http://msdn2.microsoft.com/en-us/library/ms188365.aspx
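
    To make the mapping concrete, here is a rough T-SQL sketch of where each argument sits on BULK INSERT (the table name and file path below are made up, purely for illustration):

    -- Hypothetical table and flat file, just to show where each argument goes.

    -- MICS behaves like BATCHSIZE: commit every N rows.
    BULK INSERT dbo.MyTargetTable
    FROM 'C:\Data\MyExtract.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n',
          BATCHSIZE = 5000);          -- commit every 5,000 rows

    -- RPB behaves like ROWS_PER_BATCH: only a hint about the total row count.
    BULK INSERT dbo.MyTargetTable
    FROM 'C:\Data\MyExtract.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n',
          ROWS_PER_BATCH = 123070);   -- optimizer hint, no extra commits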

    Specifying a value for MICS will have a few effects.  Each batch is copied to the transaction log, which will cause it to grow quickly, but it offers the ability to back up that transaction log after each batch.  Also, having a large batch will negatively affect memory if you have indexes on the target table, and if you are not using table locking, you might have more blocking going on.
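
    If you want to see whether those long pauses are blocking (or a log/IO wait) rather than the inserts themselves, a quick diagnostic you could run while the package is loading is something like the following -- just a sketch against the standard sysprocesses view, nothing specific to this package:

    -- Look for blocked sessions and their wait types while the load runs.
    SELECT spid,
           blocked,          -- spid of the blocking session, 0 if not blocked
           lastwaittype,
           waitresource,
           cmd,
           program_name
    FROM   master.dbo.sysprocesses
    WHERE  blocked <> 0
        OR lastwaittype LIKE 'PAGEIOLATCH%'
        OR lastwaittype LIKE 'WRITELOG%';
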
    Friday, October 27, 2006 7:20 PM
  • Does anyone know whether setting Maximum Insert Commit Size (MICS) to a non-zero value on a Fast Load OLE DB Destination will still completely roll back, even after a commit has been taken, when using package Transactions (TransactionOption=Required)?  I have a scenario where I need to fast load multiple tables in one transaction so that if any failure occurs, all tables are rolled back.  I want to know if setting the MICS to a value other than zero defeats the purpose of Transactions.
    Wednesday, February 28, 2007 7:13 PM
  • Hi,

    How do I get the number of rows in a buffer? Thanks.

    Wednesday, February 13, 2008 8:50 PM
  •  Jerry9988 wrote:

    Hi,

    How do I get the number of rows in a buffer? Thanks.

    What are you going to do with that information?  By default, SSIS processes buffers in 10,000-row chunks (though the actual number of rows per buffer is usually less).
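
    The actual figure comes from the data flow's buffer settings -- roughly the smaller of DefaultBufferMaxRows (10,000 by default) and DefaultBufferSize (10 MB by default) divided by the estimated row width.  As a back-of-the-envelope sketch (the row width here is just an assumed number, picked to show how a figure like the 9,681 mentioned above can fall out):

    -- Rough estimate of rows per buffer using the SSIS 2005 data flow defaults.
    DECLARE @DefaultBufferSize    int
    DECLARE @DefaultBufferMaxRows int
    DECLARE @EstimatedRowWidth    int

    SET @DefaultBufferSize    = 10485760   -- 10 MB, the default
    SET @DefaultBufferMaxRows = 10000      -- the default
    SET @EstimatedRowWidth    = 1083       -- assumed bytes per row, illustration only

    SELECT CASE
             WHEN @DefaultBufferSize / @EstimatedRowWidth < @DefaultBufferMaxRows
             THEN @DefaultBufferSize / @EstimatedRowWidth
             ELSE @DefaultBufferMaxRows
           END AS EstimatedRowsPerBuffer   -- roughly 9,682 with these numbers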

    Wednesday, February 13, 2008 10:53 PM
  •  Jerry9988 wrote:

    Hi,

    How do I get the number of rows in a buffer? Thanks.

    You need to turn on the BufferSizeTuning event in the Package Logging options. You have to actually select the data flow in the Logging dialog, then go to Details, and select that event. Then the number of rows per buffer will be reported in the log.
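
    If you happen to be logging to the SQL Server log provider (which in SSIS 2005 writes to the dbo.sysdtslog90 table), a query along these lines should pull the messages back out -- the table and event name here assume that provider:

    -- Retrieve the BufferSizeTuning messages written by the SQL Server log provider.
    SELECT starttime,
           source,        -- the data flow task that produced the event
           message        -- contains the rows-per-buffer figure
    FROM   dbo.sysdtslog90
    WHERE  event LIKE '%BufferSizeTuning%'
    ORDER BY starttime;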

    Thursday, February 14, 2008 12:10 AM
  • I am attempting to load 17 million records into a table, and if I don't set the Rows per Batch or the Maximum Insert Commit Size, I end up causing a problem for an application that uses this table.  It is unable to access records while the load is committing.  If I understand what you're saying, then I should set the Rows per Batch to 17000000 to match the rows in the input and also set some Maximum Insert Commit Size.  Is that correct, and roughly what value for MICS would one set for a data load like this?  What order of magnitude?  10, 100, 1,000, etc.?
    Thursday, December 18, 2008 12:04 AM