How to ensure a Bulk Insert is performed?

  • Question

  • I have modified production code in a stored procedure that was doing a truncate and insert into a heap table, aiming to get a bulk operation by adding a WITH (TABLOCK) hint. But it seems it is not doing a bulk load: I see WRITELOG waits and the log file ballooning.

    Is there a way to see and ensure that it does a bulk insert / minimally logged operation?


    background info:

Version: 2008 R2; heap table; DB is in the Simple recovery model

    Syntax used:

    INSERT INTO dbo.Tb_Retails WITH (TABLOCK)
    SELECT [MBR_PID]
          ,[MBR_....
    FROM mydb.dbo.Tb_Retail_stg2
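
    To verify whether a given run was minimally logged, one unofficial approach is to inspect the log on a test copy of the database with the undocumented sys.fn_dblog function. The table names below are from this post; the technique itself is a sketch of my own, not something from the article:

    ```sql
    -- Run on a TEST system only: sys.fn_dblog is undocumented and unsupported.
    CHECKPOINT;  -- simple recovery: clear the inactive portion of the log first

    INSERT INTO dbo.Tb_Retails WITH (TABLOCK)
    SELECT *
    FROM mydb.dbo.Tb_Retail_stg2;

    -- Count the log records generated, grouped by operation type.
    SELECT [Operation], COUNT(*) AS log_records
    FROM sys.fn_dblog(NULL, NULL)
    GROUP BY [Operation]
    ORDER BY COUNT(*) DESC;

    -- Fully logged:     LOP_INSERT_ROWS roughly equals the number of rows loaded.
    -- Minimally logged: few LOP_INSERT_ROWS; mostly allocation records
    --                   such as LOP_SET_BITS and LOP_MODIFY_ROW.
    ```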


    Mahesh

    Wednesday, September 21, 2016 12:34 AM

Answers

  • >why the heck the DB backup operation changed the performance from bulk load to fully logged operation???

    Because SQL Server has online backups. You don't have to take the database offline or quiesce it during the backup.

    The way SQL Server makes this possible is that the backup file contains copies of the database pages taken at different points in time during the backup, and it also contains all the log records written while the backup was running.  So if a page was changed after it was written to the backup device, it can be updated on restore to the point-in-time at which the backup completed.

    Now if a minimally logged operation were allowed to write database pages that the backup has already read, without writing log records for each row, that data wouldn't be anywhere in the backup file. So SQL Server disables minimal logging while a backup is running.

    David


    David http://blogs.msdn.com/b/dbrowne/

    • Marked as answer by Mahesh Dasari Tuesday, September 27, 2016 6:27 PM
    Thursday, September 22, 2016 9:21 PM

All replies

  • Mahesh

    That pattern is not a BULK INSERT. You may get some gains from the locking hint, but the command still has to write the rows to the transaction log.

    Try a SELECT ... INTO ... FROM ... pattern (which assumes that the destination table does NOT already exist) and compare the performance with that.
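
    A sketch of that pattern with the table names from the question (SELECT ... INTO creates the destination, so it must not already exist):

    ```sql
    -- SELECT ... INTO creates dbo.Tb_Retails as a new heap, so drop any old copy.
    IF OBJECT_ID('dbo.Tb_Retails', 'U') IS NOT NULL
        DROP TABLE dbo.Tb_Retails;

    SELECT [MBR_PID]   -- remaining columns as in the original statement
    INTO dbo.Tb_Retails
    FROM mydb.dbo.Tb_Retail_stg2;
    ```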


    Martin Cairney SQL Server MVP

    Wednesday, September 21, 2016 1:37 AM
  • Martin,

    I was following the tip from this article:

    https://technet.microsoft.com/en-us/library/dd425070(v=sql.100).aspx


    Mahesh

    Wednesday, September 21, 2016 1:18 PM
  • What you posted is not "BULK INSERT".  Bulk insert is specifically from a file into a table. 

    The code you posted is going to do the entire select statement to tempdb, and then insert every row from tempdb to the destination table as a single transaction, which will be logged.  This is not optimal for large tables.

    Please describe exactly what you are trying to do. 

    It appears you are trying to copy data from a table in one database to another.  Are you copying all rows and columns?  Does the source table have a Primary Key?  How often does the destination table need to be refreshed?

    Wednesday, September 21, 2016 1:46 PM
    Moderator
  • Yes, I am trying to copy data from a table in one database to another; yes, all rows, though not all columns (there are 100+). It needs to be refreshed on a daily basis; the source table does not have a primary key. The objective is to load it the fastest way possible.

    I can convert the code to do SELECT COL.. INTO MyTable FROM OtherTable - if that is the way?

    Thanks in adv..


    Mahesh

    Wednesday, September 21, 2016 3:09 PM
  • What you posted is not "BULK INSERT".  Bulk insert is specifically from a file into a table. 

     

    But in the article it is listed as one of the Bulk Load Methods!?

    Mahesh


    • Edited by Mahesh Dasari Wednesday, September 21, 2016 3:19 PM fix font size
    Wednesday, September 21, 2016 3:11 PM
  • You need to read the entire article to understand what it is talking about:

    In this paper, the term “bulk load” refers to the use of any of the methods described in this section. The term “BULK INSERT” (in upper case) refers to the specific Transact-SQL based bulk load method described under “BULK INSERT”.

    The section you are referencing specifically is talking about how to do "minimally logged insert operations".  That is a method to move lots of data from one table to another, but not the optimal method.

    Since your source does not have a primary key, your options are limited. If it did, you could simply use Transactional Replication to move the data as it changed.

    The fastest method to move your data is likely to use SSIS and the OLEDB "fast load" and table lock option with a "insert maximum commit size" and "rows per batch" of a reasonable number like 50000.

    https://msdn.microsoft.com/en-us/library/ms139821.aspx

    This will reduce the amount of log which is used and will likely be the fastest.
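
    If SSIS is not an option, a rough T-SQL stand-in for the "commit size" idea is to move the rows in batches. This is my own sketch, not from the linked article: it assumes both tables are in the same database, it empties the staging table as it goes, and each batch is still fully logged - but in simple recovery the CHECKPOINT after each batch lets the same log space be reused, so the log stops ballooning:

    ```sql
    -- Move rows in 50,000-row batches; note this is destructive to staging.
    -- The OUTPUT INTO target must have no triggers or foreign keys.
    WHILE 1 = 1
    BEGIN
        DELETE TOP (50000)
        FROM dbo.Tb_Retail_stg2
        OUTPUT deleted.* INTO dbo.Tb_Retails;

        IF @@ROWCOUNT = 0 BREAK;  -- staging table is empty

        CHECKPOINT;  -- simple recovery: allow the used log space to be reused
    END;
    ```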

    Wednesday, September 21, 2016 7:45 PM
    Moderator
  • Sorry, I muddled the question with the "BULK INSERT" term.

    My objective was to perform a bulk load using the T-SQL method with the syntax below, getting the same performance as the BULK INSERT/BCP or SELECT INTO methods described in the article. My doubt is: what am I missing here, and why is it using the transaction log?

    INSERT INTO mydb.dbo.Tb_Retails WITH (TABLOCK)
    SELECT [MBR_ID]
          ,[MBR_....
    FROM otherDb.dbo.Tb_Retail_stg2

    essentially I was using this tip:

    Heaps: "An INSERT statement that takes its rows from a SELECT operation and inserts them into a heap is minimally logged when a WITH (TABLOCK) hint is used on the destination table."

    Thank you.


    Mahesh

    Wednesday, September 21, 2016 8:22 PM
  • Please post the results of SELECT @@VERSION.

    Wednesday, September 21, 2016 8:55 PM
    Moderator
  • Make sure the database is set to the BULK_LOGGED recovery model.

    Microsoft Certified Trainer & MVP on SQL Server
    Please "Propose as Answer" if you got an answer to your question, and vote for it as helpful to help other users find a solution to a similar question quicker.

    Wednesday, September 21, 2016 9:08 PM
  • First a little terminology.

    1) A "Bulk Insert" is any insert that doesn't insert the rows one at a time. BCP, BULK INSERT, OleDB Fastload, and so on use special client/server APIs to transfer the data from the client to the server in bulk, instead of one row at a time. But under this definition INSERT ... SELECT is always a "Bulk Insert", as is SELECT ... INTO.

    2) A Bulk Insert may be "Fully Logged" or "Minimally Logged". A Fully Logged bulk insert has each inserted row written to the log file.  A Minimally Logged bulk insert writes directly to the data files, and only writes log records that record which extents have been allocated to the table and contain the new rows.

    One thing to note is that Minimal Logging is not always faster.  Many systems support fast sequential writes to the log file, but aren't built to support fast sequential writes to the data files.  In the full logging case the data files will be updated in the background by Lazy Writer/Checkpoint and so for small-to-medium loads on a general-purpose SQL box, minimal logging is often slower.

    For a heap table without any indexes or indexed views, in a database in SIMPLE recovery, TABLOCK should be sufficient to enable minimal logging. See this series of blog posts for details:

    https://blogs.msdn.microsoft.com/sqlserverstorageengine/2008/03/06/minimal-logging-changes-in-sql-server-2008/
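
    The preconditions listed above can be checked directly; a quick sketch using the names from this thread:

    ```sql
    -- Recovery model must be SIMPLE or BULK_LOGGED for minimal logging.
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = DB_NAME();

    -- The destination must be a heap with no other indexes:
    -- expect exactly one row, with index_id = 0 and type_desc = 'HEAP'.
    SELECT index_id, type_desc, name
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('dbo.Tb_Retails');
    ```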

    David


    David http://blogs.msdn.com/b/dbrowne/




    Wednesday, September 21, 2016 9:34 PM
  • Make sure the database is set to the BULK_LOGGED recovery model.

    Or simple - which Mahesh says he has.

    Overall, there is nothing in Mahesh's post which explains why the INSERT would be fully logged or why the log file is exploding.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Wednesday, September 21, 2016 10:14 PM
  • Please post the results of SELECT @@VERSION.

    Microsoft SQL Server 2008 R2 (SP3) - 10.50.6529.0 (X64) 
    Mar 19 2015 13:19:13 
    Copyright (c) Microsoft Corporation
    Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)


    Mahesh

    Thursday, September 22, 2016 12:05 AM
  • Make sure the database is set to the BULK_LOGGED recovery model.

    Microsoft Certified Trainer & MVP on SQL Server
    Please "Propose as Answer" if you got an answer to your question, and vote for it as helpful to help other users find a solution to a similar question quicker.

    The DB is in the Simple recovery model - that should work, right?


    Mahesh

    Thursday, September 22, 2016 12:07 AM
  • I converted the code to use SELECT INTO; to my surprise, even this gave the same kind of performance - lots of WRITELOG waits, and LOGSPACE USED PCT ballooned. What could be the issue that makes it a fully logged operation?

    SELECT [MBR_ID]
          ,[MBR_....
    INTO mydb.dbo.Tb_Retails
    FROM otherDb.dbo.Tb_Retail_stg2

    By the way, the database backup job runs at around the same time - would that cause the bulk load to be fully logged?

    Thanks



    Mahesh

    Thursday, September 22, 2016 12:28 AM
  • Although data insertions are not logged in the transaction log during a minimally logged bulk-import operation, the Database Engine still logs extent allocations each time a new extent is allocated to the table.

    Don't expect to see NO transaction log usage at all during an INSERT operation - that is one of the myths of the simple/bulk-logged recovery models.


    Martin Cairney SQL Server MVP

    Thursday, September 22, 2016 2:32 AM
  • Martin,

    The LOGSPACE USED exploded up to 30 GB for the table size below - would it balloon that much?

    Below is the source table we are ETLing to the target - all rows but not all columns. The target will be a little smaller; we had to stop the job yesterday, hence we do not have target table stats.

    sp_spaceused 'Tb_Retail_xx'

    name              rows       reserved     data         index_size  unused
    SRC_Tb_Retail_xx  23947643   41768128 KB  39082472 KB  2684040 KB  1616 KB

    Thanks


    Mahesh

    Thursday, September 22, 2016 1:17 PM
  • I think I have narrowed down the issue a bit: today, during the daytime, I tested/ran both T-SQL methods (the TABLOCK hint and SELECT INTO) - both worked very well; the load completed in 6 minutes vs. 3 hours, and I saw little use of the transaction log.

    So the only thing I see is that during the earlier nightly runs the DB backup/maintenance job was running. Now the big question is: why the heck did the DB backup operation change the performance from a bulk load to a fully logged operation???

    Thanks

     

    Mahesh

    Thursday, September 22, 2016 7:31 PM
  • >why the heck the DB backup operation changed the performance from bulk load to fully logged operation???

    Because SQL Server has online backups. You don't have to take the database offline or quiesce it during the backup.

    The way SQL Server makes this possible is that the backup file contains copies of the database pages taken at different points in time during the backup, and it also contains all the log records written while the backup was running.  So if a page was changed after it was written to the backup device, it can be updated on restore to the point-in-time at which the backup completed.

    Now if a minimally logged operation were allowed to write database pages that the backup has already read, without writing log records for each row, that data wouldn't be anywhere in the backup file. So SQL Server disables minimal logging while a backup is running.
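
    If you want to confirm the overlap, a small check (my own sketch), run while the load executes, will show any active backup:

    ```sql
    -- Any BACKUP DATABASE / BACKUP LOG request currently running?
    SELECT session_id, command, percent_complete, start_time
    FROM sys.dm_exec_requests
    WHERE command LIKE 'BACKUP%';
    ```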

    David


    David http://blogs.msdn.com/b/dbrowne/

    • Marked as answer by Mahesh Dasari Tuesday, September 27, 2016 6:27 PM
    Thursday, September 22, 2016 9:21 PM
  • So only thing I see is during earlier nightly runs the DB backup/maintenance job was running. Now the Big question is why the heck the DB backup operation changed the performance from bulk load to fully logged operation???

    Green man in the machine altering the recovery model at night? Stupidities have been seen before in a data centre.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Thursday, September 22, 2016 10:16 PM

  • Green man in the machine altering the recovery model at night? Stupidities have been seen before in a data centre.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Nope, the recovery model is not altered. When the ETL jobs overlap with the DB backup job, we notice very poor performance of both.

    Mahesh

    Tuesday, September 27, 2016 6:32 PM
  • Do you have any indexes on the dbo.Tb_Retails table? Did you enable trace flag 610?
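
    For reference, trace flag 610 (which extends minimal logging to loads into indexed tables on SQL Server 2008 and later) can be checked and enabled like this - evaluate it on a test system first:

    ```sql
    DBCC TRACESTATUS (610);     -- show whether the flag is currently active
    -- DBCC TRACEON (610, -1);  -- enable globally; test before using in production
    ```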

    Tibor Karaszi, SQL Server MVP (Web Blog)

    Tuesday, September 27, 2016 7:53 PM
    Moderator
  • "Make sure the database are set in BULK_LOGGED Recovery Model."

    The OP already stated that the database is in simple recovery. That allows minimally logged operations in the same way as bulk_logged does.


    Tibor Karaszi, SQL Server MVP (Web Blog)

    Tuesday, September 27, 2016 7:55 PM
    Moderator