none
Bulk Insert within a stored procedure

    Question

  • Hello ...

    I'm using sql server developer 2008 sp1.

    I have a data file that is approximately 10M rows, and about 900MB in size.  This imports in about a minute using 'Bulk insert <table> from <file> with (tablock)'.  In the grand scheme of things, that isn't much time, but I have also have to create a clustered index and a 3-4 non-clustered indexes and swap the data into a partitioned table.  I also need to do this for 3 years of weekly data (10M rows each)

    I'm running a core2 duo as my development server, and thought that splitting the file in 2 might get me a performance boost, but I'm not noticing any differences.

    The data files and the database are on the same machine, on the same physical disk ... I can't change that at the current time.

    I split the file in 2 and created 2 'bulk insert' statements and ran them ... do I need to do something special to get them to execute them in parallel, or does SQL Server handle this on its own?

    Thanks in advance for the help.

    Walter
    Wednesday, January 27, 2010 4:55 PM

Answers

All replies

  • Walter,

    The weakest link in the process is the disk read/write head movements. Even if you throw in 10 more processors to run parallel, mostly they would be idle waiting for the disk.

    It appears that the HW platform is undersized for the intended load you have.

    Consider when appropriate the following disk configuration:

    1. OS & SQL Server software (C:)
    2. Transaction logs (D:)
    3. tempdb data (E:)
    4. Application DB data (F:)
    5. Staging, data files (G:)

    4 can be RAID 5, the rest RAID 1 or equivalent.

    Also consider using SSIS which has parallel capabilty for IO processing (it would not help with your current platform).

    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012






    Wednesday, January 27, 2010 5:13 PM
    Moderator
  • Kalman ...

    Thanks for all the information.  I will definitely pass on your suggestions to management for the QA and Production environments.

    Regarding the stored procedure itself, do I need to do anything differently than just executing 2 'bulk inserts' sequentially in the SP?

    Walter
    Wednesday, January 27, 2010 5:38 PM
  • Walter,

    I don't think parallel execution would help in your case.

    What is the reason you are using stored procedure for bulk load?

    You can just set up the load as a SQL Server Agent job with multiple job steps.  You have some workflow control capacity, also you can send warning/failure/success/completion emails.

    SQL Server Agent job also has history saved by job steps.

    A bigger step, the best solution: SSIS (irregard to the speed issue).


    Kalman Toth SQL SERVER 2012 & BI TRAINING
    New Book: Beginner Database Design & SQL Programming Using Microsoft SQL Server 2012






    Wednesday, January 27, 2010 11:38 PM
    Moderator
  • Walter, like SQLUSA said, looks like you already have 'good enough' speed given the hardware specs.

    Here is a link which should answer your question about parallel insert:
    http://msdn.microsoft.com/en-us/library/ms186341.aspx

    At the same time check these links too:
    http://blogs.msdn.com/sqlcat/archive/2006/05/19/602142.aspx
    http://sqlcat.com/top10lists/archive/2008/02/06/top-10-best-practices-for-building-a-large-scale-relational-data-warehouse.aspx

    In addition to that,
    - you can also 'play' with BATCHSIZE parameter.
    - another way to improve the speed is using the ORDER hint IF the raw data is ALREADY ordered and it matches your clustered index. That way you can save the time required for creating the clustered index or slowness caused by clustered index if not dropped first. Check if it's an option for you.


    Thanks,
    Varsham Papikian, New England SQL Server User Group Executive Board, USA
    New England SQL Server User Group; My LinkedIn page
    Please remember to click the "Mark as Answer" button if a post helps you!
    Thursday, January 28, 2010 12:45 AM
  • Kalman ...

    I must admit that I just didn't consider using SSIS (oops).  It took about 5 minutes to put these bulk insert statements into a package, and now I think I'm set for when I'm not I/O bound (QA and Prod).

    The reason I chose an SP is my level of familiarity.  This is the first step in a multi-step process, much of which needs to be executed dynamically (I'm going to be switching old data out of partitions).  Now that you mention it, I should be able to re-write the SP in one to many SSIS pacakages that use package variables.

    Now that I have it written in an SSIS package, would there be any advantage to exploring using SQL Agent (I have no experience using this).

    Thanks again for the advice.

    Walter
    Thursday, January 28, 2010 5:57 PM
  • Varsham ...

    I'm heap loading into 2 empty staging tables which will be UNIONed and switched into a partitioned table.  Thanks for the link about the 1 TB import ... it was an interesting read.
    Thursday, January 28, 2010 6:11 PM
  • Thanks for the suggestion to schedule with Agent, but I'm required to use a standard corporate scheduler (which will call DTEXEC)

    WP
    Thursday, January 28, 2010 8:31 PM