locked
Forcing parallel plan for CPU limited queries RRS feed

  • Question

  • I have a simple query:

    INSERT tableB
    SELECT
      SomethingSlowFunction(value)
    FROM tableA
    

    SomethingSlowFunction is a bunch of trigonometric function calls on many columns of the table, all pure T-SQL, no CLR involved.

    The query is CPU limited (100% on one of the 8 CPUs).

    If I change the query a little bit (i.e. add a SORT BY) that will cause parallel execution but the sort will take a lot of resources (tempdb munching).

    Any tips?

    Wednesday, September 1, 2010 11:08 PM

Answers

  • Try using the APPLY operator. The idea is to use something that increase the cost of the serial plan, and see if it get equal or above the cost threshold for parallelism, without affecting performance.

    insert into dbo.T1 (c1)
    select T3.val
    from dbo.T2 CROSS APPLY (select SomethingSlowFunction(T2.value) as val) as T3;

     


    AMB

    Some guidelines for posting questions...

    • Proposed as answer by Kalman Toth Thursday, September 2, 2010 5:54 AM
    • Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:22 AM
    Thursday, September 2, 2010 12:05 AM
  • If you can break it up into 8 SSIS Execute SQL tasks, you can achieve parallel execution.

    Related links:

    http://blogs.msdn.com/b/sqlperf/archive/2007/05/11/implement-parallel-execution-in-ssis.aspx

    http://stackoverflow.com/questions/2363222/ssis-parallel-execution-of-tasks-how-efficient-is-it


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    • Proposed as answer by Naomi N Thursday, September 2, 2010 1:40 PM
    • Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:22 AM
    Thursday, September 2, 2010 5:58 AM
  • Laszlo,

    I fixed up the sample and ran it. It was using all 4 CPU-s,  2 (0,1,2,3) more than the others, but not near 100%.  What is @@version?

    CREATE TABLE TheBigTable 
    ( ID bigint NOT NULL IDENTITY (1, 1),
    Ra real, Dec real )
    
    ALTER TABLE TheBigTable ADD CONSTRAINT [IX_ID] PRIMARY KEY ( [ID] )
    GO
    -- select count(*) from TheBigTable
    -- select * from TheBigTable
    SET NOCOUNT ON
    DECLARE @count int = 100000000
    WHILE (@count > 0)
    BEGIN INSERT TheBigTable ( Ra, Dec)
    SELECT
     convert(bigint,360) * CONVERT(bigint,CONVERT(binary(4),NEWID()) )/ CONVERT(float,9223372036854775808), -- 2^63
     convert(bigint,180)* CONVERT(bigint,CONVERT(binary(4),NEWID()) )/ CONVERT(float,9223372036854775808) - 90 -- 2^63
     SET @count = @count - 1
    END
    GO
    
    CREATE TABLE TheZoneTable 
    ( ZoneID bigint NOT NULL,
    Ra real, Dec real, Cx real, Cy real, Cz real, a bigint, l bigint, q bigint, logbf real,
    ID bigint )
    
    INSERT TheZoneTable
    (ZoneID, ra, dec, cx, cy, cz, ID, a, l, q, logbf)
    SELECT TOP 10000000
     CONVERT(INT,FLOOR(( [dec] + 90.0) / 0.001)) as [ZoneID],
     ra, dec,
     COS(RADIANS( [dec]))*COS(RADIANS([ra])) AS [Cx],
     COS(RADIANS( [dec]))*RADIANS(RADIANS([ra])) AS [Cy],
     RADIANS(RADIANS( [dec])) AS [Cz],
     1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) AS [a],
     LOG( 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) ) AS [l],
     0 AS [q],
     (2 - 1) * LOG(2) AS [logBF],
     ID
    FROM TheBigTable
    GO
    
    

    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    • Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:23 AM
    Friday, September 3, 2010 8:52 PM

All replies

  • > that will cause parallel execution

    It is unclear whether you want to force parallel execution to occur or not occur.  AFAIK, you cannot force it to occur.  However, you can prevent it with the OPTION (MAXDOP 1) at the end of your statement.

    INSERT tableB
    SELECT
     SomethingSlowFunction(value)
    FROM tableA
    OPTION (MAXDOP 1)
    

    You can also configure this server-wide: http://msdn.microsoft.com/en-us/library/ms181007.aspx

     

    Wednesday, September 1, 2010 11:45 PM
  • Try using the APPLY operator. The idea is to use something that increase the cost of the serial plan, and see if it get equal or above the cost threshold for parallelism, without affecting performance.

    insert into dbo.T1 (c1)
    select T3.val
    from dbo.T2 CROSS APPLY (select SomethingSlowFunction(T2.value) as val) as T3;

     


    AMB

    Some guidelines for posting questions...

    • Proposed as answer by Kalman Toth Thursday, September 2, 2010 5:54 AM
    • Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:22 AM
    Thursday, September 2, 2010 12:05 AM
  • If you can break it up into 8 SSIS Execute SQL tasks, you can achieve parallel execution.

    Related links:

    http://blogs.msdn.com/b/sqlperf/archive/2007/05/11/implement-parallel-execution-in-ssis.aspx

    http://stackoverflow.com/questions/2363222/ssis-parallel-execution-of-tasks-how-efficient-is-it


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    • Proposed as answer by Naomi N Thursday, September 2, 2010 1:40 PM
    • Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:22 AM
    Thursday, September 2, 2010 5:58 AM
  • Try using the APPLY operator.

    That's a nice idea, however it runs on two threads only, if T3 has only one row. Still CPU limited.
    Thursday, September 2, 2010 3:53 PM
  • Actually, running multiple queries concurrently is not gonna work for me because I have to use TABLOCKX to prevent logging.
    Thursday, September 2, 2010 3:54 PM
  • Have you considered using a TVF function instead of a scalar UDF?  You should be able to optimize performance and reduce the CPU load by switching from scalar to TVF (inline if possible).
    http://jahaines.blogspot.com/
    Thursday, September 2, 2010 4:06 PM
  • Queries with TVFs behind cross apply aren't parallelized, at least that's my experience. ATM I use only pure SQL functions (sin & cos only, but a lot of them)
    Thursday, September 2, 2010 5:24 PM
  • Queries with TVFs behind cross apply aren't parallelized, at least that's my experience. ATM I use only pure SQL functions (sin & cos only, but a lot of them)

    Right.  I guess what i am getting at is.... Why would force the query to run in parallel and perhaps a bit faster and less intensive than make the code perform a lot better and a lot less intensive by using a TVF?
    http://jahaines.blogspot.com/
    Thursday, September 2, 2010 7:53 PM
  • Because it is CPU limited if runs single threaded. Parallelizing would use more than one processor cores and utilize more IO so the query would end many times faster. There's not much room for optimization in the CPU limited part.
    Thursday, September 2, 2010 9:19 PM
  •  Parallelizing would use more than one processor cores and utilize more IO so the query would end many times faster.

    Laszlo,

    Disk I/O is the weakest link in this chain. 8 CPU-s waiting on the same Read/Write head may not be much faster than 1 CPU waiting for the same.

    For quicker assistance, can you post your script? CREATE TABLE(s) and 10-20 INSERT INTO-s for population.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Friday, September 3, 2010 8:07 AM
  • Believe me, it's CPU limited. But here's the script if you're interested:

    CREATE TABLE user_10090217400198_1_Match_0
    (
    	  [MatchID] [bigint] NOT NULL IDENTITY (1, 1),
    	  [RA] [float],
    	  [Dec] [float],
    	  [Cx] [float],
    	  [Cy] [float],
    	  [Cz] [float],
    	  [a] [float],
    	  [l] [float],
    	  [q] [float],
    	  [logBF] [float],
    	  [ZoneID] [int],
    	  [_GALEX_dbo_PhotoObjAll_g_objid] bigint NOT NULL, [_GALEX_dbo_PhotoObjAll_g_ra] float NOT NULL, [_GALEX_dbo_PhotoObjAll_g_dec] float NOT NULL, [_GALEX_dbo_PhotoObjAll_g_cx] float NOT NULL, [_GALEX_dbo_PhotoObjAll_g_cy] float NOT NULL, [_GALEX_dbo_PhotoObjAll_g_cz] float NOT NULL
    
    )
    
    ALTER TABLE user_10090217400198_1_Match_0 ADD CONSTRAINT [IXC_dbo_user_10090217400198_1_Match_0] PRIMARY KEY ( [MatchID] )
    GO
    -- 9/2/2010 5:40:21 PM
    DECLARE @H Float = 0.00111111111111111
    -- *** BayesFactorXMatchResources/PopulateInitialMatchTable.sql *** ---
    
    INSERT [dbo].[user_10090217400198_1_Match_0] WITH (TABLOCKX)
        ([RA], [Dec], [Cx], [Cy], [Cz], [a], [l], [q], [logBF], [ZoneID], [_GALEX_dbo_PhotoObjAll_g_objid], [_GALEX_dbo_PhotoObjAll_g_ra], [_GALEX_dbo_PhotoObjAll_g_dec], [_GALEX_dbo_PhotoObjAll_g_cx], [_GALEX_dbo_PhotoObjAll_g_cy], [_GALEX_dbo_PhotoObjAll_g_cz])
    SELECT [g].[ra] AS [RA],
        [g].[dec] AS [Dec],
        COS(RADIANS( [g].[dec]))*COS(RADIANS([g].[ra])) AS [Cx],
        COS(RADIANS( [g].[dec]))*SIN(RADIANS([g].[ra])) AS [Cy],
        SIN(RADIANS( [g].[dec])) AS [Cz],
        1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) AS [a],
        LOG( 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) ) AS [l],
        0 AS [q],
        (2 - 1) * LOG(2) AS [logBF],
        CONVERT(INT,FLOOR(( [g].[dec] + 90.0) / @H)) as [ZoneID],
        [g].[objid] AS [_GALEX_dbo_PhotoObjAll_g_objid], [g].[ra] AS [_GALEX_dbo_PhotoObjAll_g_ra], [g].[dec] AS [_GALEX_dbo_PhotoObjAll_g_dec], [g].[cx] AS [_GALEX_dbo_PhotoObjAll_g_cx], [g].[cy] AS [_GALEX_dbo_PhotoObjAll_g_cy], [g].[cz] AS [_GALEX_dbo_PhotoObjAll_g_cz]
    FROM [Skynode_Galex].[dbo].[PhotoObjAll] AS g WITH (NOLOCK)
    WHERE [g].[ra] BETWEEN 205 AND 210 AND [g].[dec] BETWEEN -5 AND 5 AND ([g].[ra] >= 207.7 AND [g].[ra] < 210.1) 
    GO

    And it cannot be precomputed. This is a step of a computation, can occur any time with different parameters.

    Actually, pushing the calculations into a CLR function might help but not much, I would still bounce from the CPU limit. The raid system under this can do about 650MB/sec which I can easily reach with other queries. Here clearly the CPU limits the performance if it executes single threadedly.  And there's eight of those so I wanna use them somehow.

    Friday, September 3, 2010 2:43 PM
  • Laszlo,

    I got:

    Msg 208, Level 16, State 1, Line 4

    Invalid object name 'Skynode_Galex.dbo.PhotoObjAll'.

    Can you provide TABLE CREATE and INSERT INTO-s for population? Thanks.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Friday, September 3, 2010 3:06 PM
  • Thats gonna be pretty complicated. Just create a _huge_ table with the columns referenced: ra, dec, cx, cy, cz etc. all real and the id columns are bigint. Insert about 200M rows and see it running on only one thread. The trignometric functions will cause the processor core to saturate.
    Friday, September 3, 2010 4:10 PM
  • Can you give us something we can test with? Pretty exciting stuff...

     BTW: I studied physics also at Eötvös University.

     


     

    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM

    Friday, September 3, 2010 4:21 PM
  • The thing is that the actual dataset is pretty huge but it's easy to generate some mock data.

    CREATE TABLE TheBigTable
    ( ID bigint NOT NULL IDENTITY (1, 1),
    Ra real, Dec real )

    ALTER TABLE TheBigTable ADD CONSTRAINT [IX_ID] PRIMARY KEY ( [ID] )

    DECLARE @count int = 100000000

    WHILE (@count > 0)

    BEGIN

    INSERT TheBigTable
    ( Ra, Dec)
    VALUES
    (

    360 * select CONVERT(binary(8),NEWID()) * CONVERT(bigint,1) / CONVERT(float,9223372036854775808) -- 2^63
    180 * select CONVERT(binary(8),NEWID()) * CONVERT(bigint,1) / CONVERT(float,9223372036854775808) - 90 -- 2^63

    )

    @count = @count - 1

    END

    And then the big insert

    CREATE TABLE TheZoneTable
    ( ZoneID bigint NOT NULL
    Ra real, Dec real, Cx real, Cy real, Cz real, a, l, q, logbf
    ID bigint )

    INSERT TheZoneTable
    (ZoneID, ra, dec, cx, cy, cz, ID, a, l, q, logbf)
    SELECT
        CONVERT(INT,FLOOR(( [dec] + 90.0) / @H)) as [ZoneID],
        ra, dec
        COS(RADIANS( [dec]))*COS(RADIANS([ra])) AS [Cx],
        COS(RADIANS( [dec]))*SIN(RADIANS([ra])) AS [Cy],
        SIN(RADIANS( [dec])) AS [Cz],
        1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) AS [a],
        LOG( 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) ) AS [l],
        0 AS [q],
        (2 - 1) * LOG(2) AS [logBF],
        ID
    FROM TheBigTable

    Haven't tried running it but should work once typos corrected ;-) @H is the parameter, usually small (0.001 or less)

    Friday, September 3, 2010 4:48 PM
  • Laszlo,

    I fixed up the sample and ran it. It was using all 4 CPU-s,  2 (0,1,2,3) more than the others, but not near 100%.  What is @@version?

    CREATE TABLE TheBigTable 
    ( ID bigint NOT NULL IDENTITY (1, 1),
    Ra real, Dec real )
    
    ALTER TABLE TheBigTable ADD CONSTRAINT [IX_ID] PRIMARY KEY ( [ID] )
    GO
    -- select count(*) from TheBigTable
    -- select * from TheBigTable
    SET NOCOUNT ON
    DECLARE @count int = 100000000
    WHILE (@count > 0)
    BEGIN INSERT TheBigTable ( Ra, Dec)
    SELECT
     convert(bigint,360) * CONVERT(bigint,CONVERT(binary(4),NEWID()) )/ CONVERT(float,9223372036854775808), -- 2^63
     convert(bigint,180)* CONVERT(bigint,CONVERT(binary(4),NEWID()) )/ CONVERT(float,9223372036854775808) - 90 -- 2^63
     SET @count = @count - 1
    END
    GO
    
    CREATE TABLE TheZoneTable 
    ( ZoneID bigint NOT NULL,
    Ra real, Dec real, Cx real, Cy real, Cz real, a bigint, l bigint, q bigint, logbf real,
    ID bigint )
    
    INSERT TheZoneTable
    (ZoneID, ra, dec, cx, cy, cz, ID, a, l, q, logbf)
    SELECT TOP 10000000
     CONVERT(INT,FLOOR(( [dec] + 90.0) / 0.001)) as [ZoneID],
     ra, dec,
     COS(RADIANS( [dec]))*COS(RADIANS([ra])) AS [Cx],
     COS(RADIANS( [dec]))*RADIANS(RADIANS([ra])) AS [Cy],
     RADIANS(RADIANS( [dec])) AS [Cz],
     1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) AS [a],
     LOG( 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) ) AS [l],
     0 AS [q],
     (2 - 1) * LOG(2) AS [logBF],
     ID
    FROM TheBigTable
    GO
    
    

    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    • Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:23 AM
    Friday, September 3, 2010 8:52 PM
  • Laszlo,

    I fixed up the sample and ran it. It was using all 4 CPU-s,  2 (0,1,2,3) more than the others, but not near 100%.  What is @@version?


    Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (X64)   Jul  9 2008 14:17:44   Copyright (c) 1988-2008 Microsoft Corporation  Enterprise Edition (64-bit) on Windows NT 6.0 <X64> (Build 6001: Service Pack 1)

    Might not be the freshest, the sysadmins around are usually very busy.

    What if you put TABLOCKX on the insert. It is important to avoid logging. Isn't it the TOP that causes it to run parallel? That would be strange but a good solution...

    Friday, September 3, 2010 9:04 PM
  • RTM means original. You need to upgrade: http://support.microsoft.com/kb/968382/en-us

     


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    • Proposed as answer by Naomi N Friday, September 3, 2010 9:58 PM
    Friday, September 3, 2010 9:56 PM
  • What if you put TABLOCKX on the insert. It is important to avoid logging. Isn't it the TOP that causes it to run parallel? That would be strange but a good solution...


    Same. Keeps all CPU-s busy 10-20%.  4 minutes for 100 million rows insert.

    How's your Processor page setup in SQL Server Properties?  You can use www.skydrive.com to upload the image, then copy and paste here.


    Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
    Friday, September 3, 2010 10:29 PM
  • All default, all processors used, all auto affinity.

    Yeah, I'm pretty sure these servers were installed well before the release of SQL Server 2008, maybe they haven't been even updated to the final release version, I'll have the admins upgrade them to R2 and let you know about the results.

    Köszi, Kálmán!

    Saturday, September 4, 2010 12:34 AM