Answered by:
Forcing parallel plan for CPU limited queries

Question
-
I have a simple query:
INSERT tableB SELECT SomethingSlowFunction(value) FROM tableA
SomethingSlowFunction is a bunch of trigonometric function calls on many columns of the table, all pure T-SQL, no CLR involved.
The query is CPU limited (100% on one of the 8 CPUs).
If I change the query a little bit (i.e. add a SORT BY) that will cause parallel execution but the sort will take a lot of resources (tempdb munching).
Any tips?
Wednesday, September 1, 2010 11:08 PM
Answers
-
Try using the APPLY operator. The idea is to use something that increase the cost of the serial plan, and see if it get equal or above the cost threshold for parallelism, without affecting performance.
insert into dbo.T1 (c1)
select T3.val
from dbo.T2 CROSS APPLY (select SomethingSlowFunction(T2.value) as val) as T3;
AMB
- Proposed as answer by Kalman Toth Thursday, September 2, 2010 5:54 AM
- Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:22 AM
Thursday, September 2, 2010 12:05 AM -
If you can break it up into 8 SSIS Execute SQL tasks, you can achieve parallel execution.
Related links:
http://blogs.msdn.com/b/sqlperf/archive/2007/05/11/implement-parallel-execution-in-ssis.aspx
http://stackoverflow.com/questions/2363222/ssis-parallel-execution-of-tasks-how-efficient-is-it
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM- Proposed as answer by Naomi N Thursday, September 2, 2010 1:40 PM
- Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:22 AM
Thursday, September 2, 2010 5:58 AM -
Laszlo,
I fixed up the sample and ran it. It was using all 4 CPU-s, 2 (0,1,2,3) more than the others, but not near 100%. What is @@version?
CREATE TABLE TheBigTable ( ID bigint NOT NULL IDENTITY (1, 1), Ra real, Dec real ) ALTER TABLE TheBigTable ADD CONSTRAINT [IX_ID] PRIMARY KEY ( [ID] ) GO -- select count(*) from TheBigTable -- select * from TheBigTable SET NOCOUNT ON DECLARE @count int = 100000000 WHILE (@count > 0) BEGIN INSERT TheBigTable ( Ra, Dec) SELECT convert(bigint,360) * CONVERT(bigint,CONVERT(binary(4),NEWID()) )/ CONVERT(float,9223372036854775808), -- 2^63 convert(bigint,180)* CONVERT(bigint,CONVERT(binary(4),NEWID()) )/ CONVERT(float,9223372036854775808) - 90 -- 2^63 SET @count = @count - 1 END GO CREATE TABLE TheZoneTable ( ZoneID bigint NOT NULL, Ra real, Dec real, Cx real, Cy real, Cz real, a bigint, l bigint, q bigint, logbf real, ID bigint ) INSERT TheZoneTable (ZoneID, ra, dec, cx, cy, cz, ID, a, l, q, logbf) SELECT TOP 10000000 CONVERT(INT,FLOOR(( [dec] + 90.0) / 0.001)) as [ZoneID], ra, dec, COS(RADIANS( [dec]))*COS(RADIANS([ra])) AS [Cx], COS(RADIANS( [dec]))*RADIANS(RADIANS([ra])) AS [Cy], RADIANS(RADIANS( [dec])) AS [Cz], 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) AS [a], LOG( 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) ) AS [l], 0 AS [q], (2 - 1) * LOG(2) AS [logBF], ID FROM TheBigTable GO
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM- Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:23 AM
Friday, September 3, 2010 8:52 PM
All replies
-
> that will cause parallel execution
It is unclear whether you want to force parallel execution to occur or not occur. AFAIK, you cannot force it to occur. However, you can prevent it with the OPTION (MAXDOP 1) at the end of your statement.
INSERT tableB SELECT SomethingSlowFunction(value) FROM tableA OPTION (MAXDOP 1)
You can also configure this server-wide: http://msdn.microsoft.com/en-us/library/ms181007.aspx
Wednesday, September 1, 2010 11:45 PM -
Try using the APPLY operator. The idea is to use something that increase the cost of the serial plan, and see if it get equal or above the cost threshold for parallelism, without affecting performance.
insert into dbo.T1 (c1)
select T3.val
from dbo.T2 CROSS APPLY (select SomethingSlowFunction(T2.value) as val) as T3;
AMB
- Proposed as answer by Kalman Toth Thursday, September 2, 2010 5:54 AM
- Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:22 AM
Thursday, September 2, 2010 12:05 AM -
If you can break it up into 8 SSIS Execute SQL tasks, you can achieve parallel execution.
Related links:
http://blogs.msdn.com/b/sqlperf/archive/2007/05/11/implement-parallel-execution-in-ssis.aspx
http://stackoverflow.com/questions/2363222/ssis-parallel-execution-of-tasks-how-efficient-is-it
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM- Proposed as answer by Naomi N Thursday, September 2, 2010 1:40 PM
- Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:22 AM
Thursday, September 2, 2010 5:58 AM -
Try using the APPLY operator.
Thursday, September 2, 2010 3:53 PM -
Actually, running multiple queries concurrently is not gonna work for me because I have to use TABLOCKX to prevent logging.Thursday, September 2, 2010 3:54 PM
-
Have you considered using a TVF function instead of a scalar UDF? You should be able to optimize performance and reduce the CPU load by switching from scalar to TVF (inline if possible).
http://jahaines.blogspot.com/Thursday, September 2, 2010 4:06 PM -
Queries with TVFs behind cross apply aren't parallelized, at least that's my experience. ATM I use only pure SQL functions (sin & cos only, but a lot of them)Thursday, September 2, 2010 5:24 PM
-
Queries with TVFs behind cross apply aren't parallelized, at least that's my experience. ATM I use only pure SQL functions (sin & cos only, but a lot of them)
Right. I guess what i am getting at is.... Why would force the query to run in parallel and perhaps a bit faster and less intensive than make the code perform a lot better and a lot less intensive by using a TVF?
http://jahaines.blogspot.com/Thursday, September 2, 2010 7:53 PM -
Because it is CPU limited if runs single threaded. Parallelizing would use more than one processor cores and utilize more IO so the query would end many times faster. There's not much room for optimization in the CPU limited part.Thursday, September 2, 2010 9:19 PM
-
Parallelizing would use more than one processor cores and utilize more IO so the query would end many times faster.
Laszlo,
Disk I/O is the weakest link in this chain. 8 CPU-s waiting on the same Read/Write head may not be much faster than 1 CPU waiting for the same.
For quicker assistance, can you post your script? CREATE TABLE(s) and 10-20 INSERT INTO-s for population.
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAMFriday, September 3, 2010 8:07 AM -
Believe me, it's CPU limited. But here's the script if you're interested:
CREATE TABLE user_10090217400198_1_Match_0 ( [MatchID] [bigint] NOT NULL IDENTITY (1, 1), [RA] [float], [Dec] [float], [Cx] [float], [Cy] [float], [Cz] [float], [a] [float], [l] [float], [q] [float], [logBF] [float], [ZoneID] [int], [_GALEX_dbo_PhotoObjAll_g_objid] bigint NOT NULL, [_GALEX_dbo_PhotoObjAll_g_ra] float NOT NULL, [_GALEX_dbo_PhotoObjAll_g_dec] float NOT NULL, [_GALEX_dbo_PhotoObjAll_g_cx] float NOT NULL, [_GALEX_dbo_PhotoObjAll_g_cy] float NOT NULL, [_GALEX_dbo_PhotoObjAll_g_cz] float NOT NULL ) ALTER TABLE user_10090217400198_1_Match_0 ADD CONSTRAINT [IXC_dbo_user_10090217400198_1_Match_0] PRIMARY KEY ( [MatchID] ) GO -- 9/2/2010 5:40:21 PM DECLARE @H Float = 0.00111111111111111 -- *** BayesFactorXMatchResources/PopulateInitialMatchTable.sql *** --- INSERT [dbo].[user_10090217400198_1_Match_0] WITH (TABLOCKX) ([RA], [Dec], [Cx], [Cy], [Cz], [a], [l], [q], [logBF], [ZoneID], [_GALEX_dbo_PhotoObjAll_g_objid], [_GALEX_dbo_PhotoObjAll_g_ra], [_GALEX_dbo_PhotoObjAll_g_dec], [_GALEX_dbo_PhotoObjAll_g_cx], [_GALEX_dbo_PhotoObjAll_g_cy], [_GALEX_dbo_PhotoObjAll_g_cz]) SELECT [g].[ra] AS [RA], [g].[dec] AS [Dec], COS(RADIANS( [g].[dec]))*COS(RADIANS([g].[ra])) AS [Cx], COS(RADIANS( [g].[dec]))*SIN(RADIANS([g].[ra])) AS [Cy], SIN(RADIANS( [g].[dec])) AS [Cz], 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) AS [a], LOG( 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) ) AS [l], 0 AS [q], (2 - 1) * LOG(2) AS [logBF], CONVERT(INT,FLOOR(( [g].[dec] + 90.0) / @H)) as [ZoneID], [g].[objid] AS [_GALEX_dbo_PhotoObjAll_g_objid], [g].[ra] AS [_GALEX_dbo_PhotoObjAll_g_ra], [g].[dec] AS [_GALEX_dbo_PhotoObjAll_g_dec], [g].[cx] AS [_GALEX_dbo_PhotoObjAll_g_cx], [g].[cy] AS [_GALEX_dbo_PhotoObjAll_g_cy], [g].[cz] AS [_GALEX_dbo_PhotoObjAll_g_cz] FROM [Skynode_Galex].[dbo].[PhotoObjAll] AS g WITH (NOLOCK) WHERE [g].[ra] BETWEEN 205 AND 210 AND [g].[dec] BETWEEN -5 AND 5 AND ([g].[ra] >= 207.7 AND [g].[ra] < 210.1) GO
And it cannot be precomputed. This is a step of a computation, can occur any time with different parameters.
Actually, pushing the calculations into a CLR function might help but not much, I would still bounce from the CPU limit. The raid system under this can do about 650MB/sec which I can easily reach with other queries. Here clearly the CPU limits the performance if it executes single threadedly. And there's eight of those so I wanna use them somehow.
Friday, September 3, 2010 2:43 PM -
Laszlo,
I got:
Msg 208, Level 16, State 1, Line 4
Invalid object name 'Skynode_Galex.dbo.PhotoObjAll'.
Can you provide TABLE CREATE and INSERT INTO-s for population? Thanks.
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAMFriday, September 3, 2010 3:06 PM -
Thats gonna be pretty complicated. Just create a _huge_ table with the columns referenced: ra, dec, cx, cy, cz etc. all real and the id columns are bigint. Insert about 200M rows and see it running on only one thread. The trignometric functions will cause the processor core to saturate.Friday, September 3, 2010 4:10 PM
-
Can you give us something we can test with? Pretty exciting stuff...
BTW: I studied physics also at Eötvös University.
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM
Friday, September 3, 2010 4:21 PM -
The thing is that the actual dataset is pretty huge but it's easy to generate some mock data.
CREATE TABLE TheBigTable
( ID bigint NOT NULL IDENTITY (1, 1),
Ra real, Dec real )ALTER TABLE TheBigTable ADD CONSTRAINT [IX_ID] PRIMARY KEY ( [ID] )
DECLARE @count int = 100000000
WHILE (@count > 0)
BEGIN
INSERT TheBigTable
( Ra, Dec)
VALUES
(360 * select CONVERT(binary(8),NEWID()) * CONVERT(bigint,1) / CONVERT(float,9223372036854775808) -- 2^63
180 * select CONVERT(binary(8),NEWID()) * CONVERT(bigint,1) / CONVERT(float,9223372036854775808) - 90 -- 2^63)
@count = @count - 1
END
And then the big insert
CREATE TABLE TheZoneTable
( ZoneID bigint NOT NULL
Ra real, Dec real, Cx real, Cy real, Cz real, a, l, q, logbf
ID bigint )INSERT TheZoneTable
(ZoneID, ra, dec, cx, cy, cz, ID, a, l, q, logbf)
SELECT
CONVERT(INT,FLOOR(( [dec] + 90.0) / @H)) as [ZoneID],
ra, dec
COS(RADIANS( [dec]))*COS(RADIANS([ra])) AS [Cx],
COS(RADIANS( [dec]))*SIN(RADIANS([ra])) AS [Cy],
SIN(RADIANS( [dec])) AS [Cz],
1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) AS [a],
LOG( 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) ) AS [l],
0 AS [q],
(2 - 1) * LOG(2) AS [logBF],
ID
FROM TheBigTableHaven't tried running it but should work once typos corrected ;-) @H is the parameter, usually small (0.001 or less)
Friday, September 3, 2010 4:48 PM -
Laszlo,
I fixed up the sample and ran it. It was using all 4 CPU-s, 2 (0,1,2,3) more than the others, but not near 100%. What is @@version?
CREATE TABLE TheBigTable ( ID bigint NOT NULL IDENTITY (1, 1), Ra real, Dec real ) ALTER TABLE TheBigTable ADD CONSTRAINT [IX_ID] PRIMARY KEY ( [ID] ) GO -- select count(*) from TheBigTable -- select * from TheBigTable SET NOCOUNT ON DECLARE @count int = 100000000 WHILE (@count > 0) BEGIN INSERT TheBigTable ( Ra, Dec) SELECT convert(bigint,360) * CONVERT(bigint,CONVERT(binary(4),NEWID()) )/ CONVERT(float,9223372036854775808), -- 2^63 convert(bigint,180)* CONVERT(bigint,CONVERT(binary(4),NEWID()) )/ CONVERT(float,9223372036854775808) - 90 -- 2^63 SET @count = @count - 1 END GO CREATE TABLE TheZoneTable ( ZoneID bigint NOT NULL, Ra real, Dec real, Cx real, Cy real, Cz real, a bigint, l bigint, q bigint, logbf real, ID bigint ) INSERT TheZoneTable (ZoneID, ra, dec, cx, cy, cz, ID, a, l, q, logbf) SELECT TOP 10000000 CONVERT(INT,FLOOR(( [dec] + 90.0) / 0.001)) as [ZoneID], ra, dec, COS(RADIANS( [dec]))*COS(RADIANS([ra])) AS [Cx], COS(RADIANS( [dec]))*RADIANS(RADIANS([ra])) AS [Cy], RADIANS(RADIANS( [dec])) AS [Cz], 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) AS [a], LOG( 1 / POWER( CONVERT(float, 0.5) / 3600 / 180*PI(), 2) ) AS [l], 0 AS [q], (2 - 1) * LOG(2) AS [logBF], ID FROM TheBigTable GO
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM- Marked as answer by Ai-hua Qiu Thursday, September 9, 2010 6:23 AM
Friday, September 3, 2010 8:52 PM -
Laszlo,
I fixed up the sample and ran it. It was using all 4 CPU-s, 2 (0,1,2,3) more than the others, but not near 100%. What is @@version?
Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (X64) Jul 9 2008 14:17:44 Copyright (c) 1988-2008 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 6.0 <X64> (Build 6001: Service Pack 1)Might not be the freshest, the sysadmins around are usually very busy.
What if you put TABLOCKX on the insert. It is important to avoid logging. Isn't it the TOP that causes it to run parallel? That would be strange but a good solution...
Friday, September 3, 2010 9:04 PM -
RTM means original. You need to upgrade: http://support.microsoft.com/kb/968382/en-us
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAM- Proposed as answer by Naomi N Friday, September 3, 2010 9:58 PM
Friday, September 3, 2010 9:56 PM -
What if you put TABLOCKX on the insert. It is important to avoid logging. Isn't it the TOP that causes it to run parallel? That would be strange but a good solution...
Same. Keeps all CPU-s busy 10-20%. 4 minutes for 100 million rows insert.How's your Processor page setup in SQL Server Properties? You can use www.skydrive.com to upload the image, then copy and paste here.
Kalman Toth, SQL Server & Business Intelligence Training; SQL 2008 GRAND SLAMFriday, September 3, 2010 10:29 PM -
All default, all processors used, all auto affinity.
Yeah, I'm pretty sure these servers were installed well before the release of SQL Server 2008, maybe they haven't been even updated to the final release version, I'll have the admins upgrade them to R2 and let you know about the results.
Köszi, Kálmán!
Saturday, September 4, 2010 12:34 AM