massive Inserts from CTE are very slow

  • Question

  • Hello.
    I'm trying to load a test case into a SQL Azure database.
    The scenario I want to test contains lots of data, up to 100,000,000 rows in a single table.
    On my quad-core i7 laptop with 8 GB of RAM, I am able to load the data with the following timings:
    - 13 secs for 500,000 rows
    - 4 min 11 secs for 10,000,000 rows
    - 50 min 46 secs for 100,000,000 rows
    When I run the same scripts on SQL Azure (see below for a sample), I get the following timings:
    - 1 min 15 secs for 500,000 rows
    - 22 min 28 secs for 10,000,000 rows
    - 3 hours 53 min 57 secs for 100,000,000 rows
    I had to repeat the third run 3 times before it completed; the failed attempts ended with error code 40174 (see http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/f85e062e-557d-44cc-895d-008710834af1 ).
    I think these timings are unacceptably high. Where does this slowness come from? Am I doing something wrong?
    My guesses:
    1) I'm probably being throttled because of the high CPU demand of my script, which is actually "producing" the data to be inserted. The shared nature of the service slows me down and sometimes kicks me out with error 40174.
    2) Given the huge number of inserts I'm performing, lots of database file growths are taking place, and I don't know what size they are. I'm not sure this can explain such slowness, though, also because on my laptop I was able to pre-allocate the expected amount of space (about 8 GB); see the sketch after this list. I can't do the same on SQL Azure because of the limitations on the ALTER DATABASE ... MODIFY FILE options.
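    For reference, the pre-allocation on the laptop was done along these lines (a minimal sketch; the database and logical file names are placeholders, not the real ones from my project):

    -- Pre-grow the data file to about 8 GB so the load doesn't pay for repeated autogrowth.
    -- This works on an on-premises SQL Server; SQL Azure rejects these MODIFY FILE options.
    ALTER DATABASE SocialTest
    MODIFY FILE (NAME = N'SocialTest_Data', SIZE = 8GB);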
    Please find below one of the scripts I've run. It's a CTE with some cross joins that produces lots of rows "on the fly". This one loads 500,000 rows; a similar one repeats the same process 200 times to produce 100,000,000 rows.
    I have already used this method to build huge test cases with acceptable timings, which allowed me to test the best index combinations and different app implementation approaches.
    I plan to run some perf tests on Azure, leveraging many of the available features in different combinations. The next one is table storage, but first I would really like to complete the SQL Azure based tests by loading the test case and deploying/running the web app.
    Thank you,
    Mario
    declare @numUsers int = 500000;
    WITH 
    digits AS (
        SELECT 0 as Number
        UNION SELECT 1
        UNION SELECT 2
        UNION SELECT 3
        UNION SELECT 4
        UNION SELECT 5
        UNION SELECT 6
        UNION SELECT 7
        UNION SELECT 8
        UNION SELECT 9
    )
    , dig AS (
        SELECT 
              (tenofmillions.Number * 10000000)
            + (millions.Number * 1000000)
            + (hThousands.Number * 100000) 
            + (tThousands.Number * 10000) 
            + (thousands.Number * 1000) 
            + (hundreds.Number * 100) 
            + (tens.Number * 10) 
            + ones.Number AS Number
        FROM digits AS ones 
        CROSS JOIN digits AS tens
        CROSS JOIN digits AS hundreds
        CROSS JOIN digits AS thousands
        CROSS JOIN digits AS tThousands
        CROSS JOIN digits AS hThousands
        CROSS JOIN digits AS millions
        CROSS JOIN digits AS tenofmillions
    )
    , prod AS (
    SELECT number, 'FullName'+ convert(varchar(12), 1+abs(CHECKSUM(NEWID()) % @numUsers )) AS username
    FROM dig 
    WHERE number < @numUsers )
    insert SocialUsers(idUser, UserName)
        SELECT number, username
        FROM prod
    go


    Friday, August 5, 2011 1:17 PM

Answers

  • Thank you Arunraj for pointing me to that post.

     

    Considering that list of causes of connection termination, I decided to modify the CTE-based algorithm to achieve higher speed and let the script complete reliably.

     

    About the higher execution speed: after a quick search, I found this really interesting post by Jeff Moden ( http://www.sqlservercentral.com/articles/T-SQL/74118/ ) about the performance of different CTE-based counting implementations.

     

    The best performing one, with zero reads and the lowest CPU usage, is the cross-joined cascaded CTE, which uses fewer intermediate steps than the method I was previously using.

     

    The script of my initial post became:

     

    declare @numUsers int = 500000;

    WITH
          E1(N) AS (
                    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                   ),                          -- 1*10^1 or 10 rows
          E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
          E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^4 or 10,000 rows
          E8(N) AS (SELECT 1 FROM E4 a, E4 b), -- 1*10^8 or 100,000,000 rows
    cteTally(N) AS (SELECT TOP (@numUsers) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8)
    insert SocialUsers(idUser, UserName)
         SELECT N, 'FullName' + convert(varchar(12), N) AS username
           FROM cteTally

     

    The old version ran in 1 min 15 sec, while this one runs in just 15 sec. A huge improvement.

     

    My worst case was the 100,000,000-row version, which took 3 h 53 min 57 sec to complete with the old approach (I was able to complete it just once in 6 trials).

    The new version for 100,000,000 rows took 1 h 19 min 44 sec: again a big improvement, but, most importantly, it completed without the connection being dropped.

     

    About the connection drops: they were linked to the transaction log usage exceeding the 1 GB limit.

    I was already loading the rows in 200 "batches" of 500,000 rows, but this was apparently not enough to avoid log-related issues.

    I tried with 400 batches of 250,000 rows but was still getting error 40552.

     

    Since the SQL Azure recovery model is FULL and only a limited set of ALTER DATABASE options is available, I opted for the faster solution: drop and recreate the database. Starting with an empty log and loading my 100M rows in 400 batches gave the correct result; the batching was driven along the lines of the sketch below.
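    For completeness, this is roughly how the 400 batches can be driven from a single script (a minimal sketch assuming the SocialUsers table from above; the @batchSize, @totalRows, @batch and @offset names are just illustrative, not my exact script):

    declare @batchSize int = 250000;
    declare @totalRows int = 100000000;
    declare @batch int = 0;
    declare @offset int;

    while @batch < @totalRows / @batchSize
    begin
        -- Offset each batch so idUser stays unique across the whole load.
        set @offset = @batch * @batchSize;

        WITH
              E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 10 rows
              E2(N) AS (SELECT 1 FROM E1 a, E1 b),       -- 100 rows
              E4(N) AS (SELECT 1 FROM E2 a, E2 b),       -- 10,000 rows
              E8(N) AS (SELECT 1 FROM E4 a, E4 b),       -- 100,000,000 rows
        cteTally(N) AS (SELECT TOP (@batchSize) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8)
        insert SocialUsers(idUser, UserName)
             SELECT N + @offset, 'FullName' + convert(varchar(12), N + @offset)
               FROM cteTally;

        -- Each INSERT commits on its own, which keeps the log usage per batch bounded.
        set @batch = @batch + 1;
    end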

     

    Overall it has been an interesting lesson. I will now move on to the next step of my perf tests.

     

    Cheers,

    Mario

    Monday, August 8, 2011 4:16 PM

All replies

  • Hi,

    Can you please clarify whether the INSERT is slow, or whether the SELECT that displays the data after it is generated is slow?

    If the SELECT is slow, the query itself might be executing fast, but returning the data over the Internet could introduce lag.

    I would also suggest enabling the actual execution plan display in Management Studio and checking the query plan to find out the details.
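    For example, a quick way to see where the time goes on the server side is to wrap the script with the statistics switches (a minimal sketch; the output shows up in the Messages tab of Management Studio):

    -- Report per-statement CPU/elapsed time and logical reads.
    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    -- ... run the CTE-based insert from the question here ...

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;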


    Arunraj Chandrasekaran, MCTS, Author: SQLXpertise.com
    If you found this post useful, Please "Mark as Answer" or "Vote as Helpful"
    Friday, August 5, 2011 2:10 PM
  • The whole script is slow. It's an insert that takes data produced on the fly by the inner portion of the script itself (the CTE from which data is then selected).

    The whole thing is executed in the cloud by the SQL Azure node involved.

    There's no data transfer over the Internet.

    Friday, August 5, 2011 2:27 PM
    Try running the query in parts, check the execution plan, and identify the issue.

    FYI, SQL Azure doesn't allow long-running queries or transactions; please read this article for more information:

    http://social.technet.microsoft.com/wiki/contents/articles/sql-azure-connection-management.aspx

     


    Arunraj Chandrasekaran, MCTS, Author: SQLXpertise.com
    If you found this post useful, Please "Mark as Answer" or "Vote as Helpful"
    Friday, August 5, 2011 11:40 PM