locked
Insert in parallel. Is it possible? RRS feed

  • Question

  • Hello!

    Is it possible to have parallel plan for INSERT statement? In the following statement I have SELECT part running in parallel and INSERT part running in single thread.

     

    insert txt
    select * from txt
    
    

     


    I have high-performance disk array that is loaded up to 10% with statement above.

    I'd like to use all 8 cores at INSERT phase to produce as much data as it possible to write on disks.

    Is it possible?


    Sunday, January 8, 2012 5:04 PM

Answers

  • Hi Anton,

    When you modify data the following is happening

    1. Sql Server synchronously writes the log record to the log file

    2. Data page is modified in memory and would be saved to the disk later during checkpoint asynchronously.

    You don't have much control over when data would be saved to disk in your scenario. If you want to speed up the insert process make sure that the disk array with the log file is optimized for sequential access.

    As the side note - if your server is hardware NUMA, make sure that MAXDOP does not exceed # of cores per NUMA node. Otherwise you can have major performance issues with parallel plans.


    Thank you!

    My blog: http://aboutsqlserver.com

    • Proposed as answer by Naomi N Monday, January 9, 2012 1:21 AM
    • Marked as answer by Anton Burtsev Monday, January 9, 2012 3:49 AM
    Sunday, January 8, 2012 11:37 PM

All replies

  • Have you checked the MAXDOP option? 

     

    You can use it as system settings 

    http://msdn.microsoft.com/en-us/library/ms181007.aspx

    Or as Query hint 

    http://msdn.microsoft.com/en-us/library/ms178065.aspx

    HTH,

    Noam 

    Sunday, January 8, 2012 8:07 PM
  • Hi,

    MAXDOP should be the way to go. Here you can find a little introduction.


    Best Regards. Please Mark Answered, If my solution solves your problem.
    Sunday, January 8, 2012 8:16 PM
  • Hi Anton,

    When you modify data the following is happening

    1. Sql Server synchronously writes the log record to the log file

    2. Data page is modified in memory and would be saved to the disk later during checkpoint asynchronously.

    You don't have much control over when data would be saved to disk in your scenario. If you want to speed up the insert process make sure that the disk array with the log file is optimized for sequential access.

    As the side note - if your server is hardware NUMA, make sure that MAXDOP does not exceed # of cores per NUMA node. Otherwise you can have major performance issues with parallel plans.


    Thank you!

    My blog: http://aboutsqlserver.com

    • Proposed as answer by Naomi N Monday, January 9, 2012 1:21 AM
    • Marked as answer by Anton Burtsev Monday, January 9, 2012 3:49 AM
    Sunday, January 8, 2012 11:37 PM
  • Thanks Anton, 

     

    Just want to post the same thread for paralleling the Insert operation. 

    As Dmitri mentioned, the write ahead log will sequencing all the insert/update(changes) operation. so the storage device should be optimized to sequential write. It can be understood. 

    Can the log be separated into different files? or there can be only 1 universal write ahead log among the sql server or database or schema or table? 

    Can I have a log per each table? 

    Or I can boost the append to write ahead log with Partitioned Table? and get benefit of parallelism? 

     

    Thanks 

     

    Tony :)

    Monday, January 9, 2012 3:32 AM
  • Q: Can the log be separated into different files?

    A: Yes, but only 1 file wil be used, SQL Server does not use multiple log files at the same time. All that adding another log file will do is consume more space and make maintenance slightly harder.

    Q: Can I have a log per each table?

    A: If you put each table in a different database, sure :) Otherwise, no. Log files are specific to databases, not any other objects inside of them.

    Q: Can I boost the append to write ahead log with Partitioned Table and get benefit of parallelism?

    A: Partitioned tables could make inserts faster if they had multiple partitions that each had their own set of disks with aligned indexes, but that's just a guess, I've never tried it on the same set of disks (I would think it would make the read/write heads jump too much on the same set of disks).

     

    Dmitri,

    I thought the IO to the log file was Asynchronous but that it did wait until all of the data was comitted to the log before returning the success or failure of the operation?

    -Sean

    Monday, January 9, 2012 4:05 AM
    Answerer
  • There is some rules for minimum logging. If you use partitioning (off course all files must be distributed different disks)and you are able to apply minimum logging rules , you can write as parallel on a table. Maxdop hint is not applicable for insert operations And I suggest this white paper . It's the best about data loading http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

     

    Regards

    Kadir Evciler

    DBA at Eurobank Tekfen

    Monday, January 9, 2012 10:19 AM
  • Dmitri,

    I thought the IO to the log file was Asynchronous but that it did wait until all of the data was comitted to the log before returning the success or failure of the operation?

    -Sean

    Sean, we need to define the terminology :)

    It's synchronous operation - when you modify the page SQL Server needs to be sure that action has been logged and saved to the disk before it continues to operate. It's not happening on transaction commit - SQL Server needs to use log records in case if transaction is rolled back.  Also this is required for the crash recovery - SQL Server needs to be able to undo or redo the action in case of failure.


    Thank you!

    My blog: http://aboutsqlserver.com

    Monday, January 9, 2012 4:57 PM
  • I agree with everything you said, I meant that when the operation to WRITE to the log is given to the OS, it's an asynchronous IO operation. That way items can still be processed without blocking the thread on the write, but the entire logging must be completely written (as in the process of making a change is synchronous as you have already said) before the results (good, failed, etc) are given back.

    I understand the need and use of the log, I was trying to understand how SQL Server internally hands the IO operations to the OS.

    -Sean

    Monday, January 9, 2012 5:19 PM
    Answerer