SQL Server Developer Center > SQL Server Forums > SQL Server XML > SQL Server 2008 xquery parallelism
Ask a questionAsk a question
 

AnswerSQL Server 2008 xquery parallelism

  • Wednesday, October 21, 2009 3:23 AMMichael Y. Users MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     
    I have a simple xquery statement that is puzzling me as to why SQL server insists on running it on only 1 CPU (while running on a machine that has 8 CPU's + 8 GB RAM).  The query looks something like this:

    SELECT ID, Data, Data.value('(//Project/@Value)[1]', 'float') AS Value
    FROM ObjectData
    WHERE (Data.exist('(//Project[@Value gt 456])')= 1)
    AND (Data.exist('(//Project[@Value lt 556])')= 1)
    ORDER BY Value DESC

    The 'ObjectData' table has ~ 250,000 rows of XML; the query completes in about 10 sec's, which isn't bad, but it's hardly using the hardware it has at hand.

    I'm ok with the idea that I'm missing something larger here as to why a read operation (which really should have no side effects) can't be broken into a parallel plan.  In this day and age it is something we're told to do as often as possible, and is something I've grown quite used to doing in C/C# land.  Yet here is SQL server 2008, refusing to take advantage of all the available hardware....at least it seems that way.

    Help?!?!  TIA...

Answers

  • Wednesday, October 21, 2009 9:57 AMwBobAnswererUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     AnswerHas Code
    Good question.  For me, the root of the question is "When will SQL Server use parallelism?" and the answer as so often with SQL Server is "it depends"!

    First of all, I would check basics, like the server settings:

    EXEC sp_configure 'max degree of parallelism'
    
    Ensure the run value is not set to something other than 0.  Also make sure query doesn't have a MAXDOP hint.  Otherwise, the cost based optimizer will parallelize when it thinks it's worth it.  Quote from "Inside Microsoft SQL Server 2005: Query Tuning and Optimization", p60

    "The query optimizer decides whether to execute a query in parallel. For the optimizer even to

    consider a parallel plan, the following criteria must be met:

    · SQL Server must be running on a multiprocessor, multicore, or hyperthreaded machine.

    · The affinity mask (and affinity mask64 for 64 processor servers) advanced configuration

    option must allow SQL Server to use at least two processors. The default setting of zero for

    affinity mask allows SQL Server to use all available processors.

    · The max degree of parallelism advanced configuration option must be set to zero (the

    default) or to more than one. We discuss this option in more detail below.

    Like most other decisions, the choice of whether to choose a serial or a parallel plan is costbased. A

    complex and expensive query that processes many rows is more likely to result in a parallel plan

    than a simple query that processes very few rows. Although it is rarely necessary, you can also

    adjust the cost threshold for parallelism advanced configuration setting to raise or lower the

    threshold above which the optimizer considers parallel plans."

    I realise this quote is from a SQL 2005 book, but I think it still applies.

    Interestingly, in my repro script below, the query does parallelize in both SQL 2005 and SQL 2008 on my dual core laptop:

    USE tempdb
    GO
    
    SET NOCOUNT ON
    GO
    
    DROP TABLE ObjectData 
    GO
    CREATE TABLE ObjectData ( ID INT IDENTITY PRIMARY KEY, Data XML )
    GO
    
    INSERT INTO ObjectData ( Data )
    SELECT '<Projects>
    	<Project Value="' + LTRIM( STR( RAND() * 1000 ) ) + '"/>
    	<Project Value="' + LTRIM( STR( RAND() * 1000 ) ) + '"/>
    	</Projects>'
    GO 250000
    
    -- Make sure at least one record meets the criteria
    --UPDATE ObjectData
    --SET Data.modify('replace value of (//Project/@value)[1] with 457')
    --WHERE ID = 150000
    
    --UPDATE ObjectData
    --SET Data.modify('replace value of (//Project/@value)[2] with 555')
    --WHERE ID = 150000
    
    SELECT ID, Data, Data.value('(//Project/@Value)[1]', 'float') AS Value
    FROM ObjectData
    WHERE (Data.exist('(//Project[@Value gt 456])')= 1)
      AND (Data.exist('(//Project[@Value lt 556])')= 1)
    ORDER BY Value DESC
    GO
    
    SELECT ID, Data, Data.value('(//Project/@Value)[1]', 'float') AS Value
    FROM ObjectData
    WHERE (Data.exist('(//Project[@Value gt 456])')= 1)
      AND (Data.exist('(//Project[@Value lt 556])')= 1)
    ORDER BY Value DESC
    OPTION ( MAXDOP 1 )
    

    Try it and compare the execution plans on the last two queries.  On my laptop, they both come back with 50% costs, basically meaning they're about the same in cost.  Other factors on your machine may influence this, eg the number of other queries running, indexing etc.  Compare the two plans on your machine and let us know how you get on.

    Finally, I wouldl worry less about parallelism and more about performance tuning your XML and XQuery.  Start here with these two great articles:

    Performance Optimizations for the XML Data Type in SQL Server 2005
    http://msdn.microsoft.com/en-us/library/ms345118.aspx

    XML Indexes in SQL Server 2005
    http://msdn.microsoft.com/en-us/library/ms345121(SQL.90).aspx

    I have seen performance improvements from making the XML typed.

All Replies

  • Wednesday, October 21, 2009 9:57 AMwBobAnswererUsers MedalsUsers MedalsUsers MedalsUsers MedalsUsers Medals
     AnswerHas Code
    Good question.  For me, the root of the question is "When will SQL Server use parallelism?" and the answer as so often with SQL Server is "it depends"!

    First of all, I would check basics, like the server settings:

    EXEC sp_configure 'max degree of parallelism'
    
    Ensure the run value is not set to something other than 0.  Also make sure query doesn't have a MAXDOP hint.  Otherwise, the cost based optimizer will parallelize when it thinks it's worth it.  Quote from "Inside Microsoft SQL Server 2005: Query Tuning and Optimization", p60

    "The query optimizer decides whether to execute a query in parallel. For the optimizer even to

    consider a parallel plan, the following criteria must be met:

    · SQL Server must be running on a multiprocessor, multicore, or hyperthreaded machine.

    · The affinity mask (and affinity mask64 for 64 processor servers) advanced configuration

    option must allow SQL Server to use at least two processors. The default setting of zero for

    affinity mask allows SQL Server to use all available processors.

    · The max degree of parallelism advanced configuration option must be set to zero (the

    default) or to more than one. We discuss this option in more detail below.

    Like most other decisions, the choice of whether to choose a serial or a parallel plan is costbased. A

    complex and expensive query that processes many rows is more likely to result in a parallel plan

    than a simple query that processes very few rows. Although it is rarely necessary, you can also

    adjust the cost threshold for parallelism advanced configuration setting to raise or lower the

    threshold above which the optimizer considers parallel plans."

    I realise this quote is from a SQL 2005 book, but I think it still applies.

    Interestingly, in my repro script below, the query does parallelize in both SQL 2005 and SQL 2008 on my dual core laptop:

    USE tempdb
    GO
    
    SET NOCOUNT ON
    GO
    
    DROP TABLE ObjectData 
    GO
    CREATE TABLE ObjectData ( ID INT IDENTITY PRIMARY KEY, Data XML )
    GO
    
    INSERT INTO ObjectData ( Data )
    SELECT '<Projects>
    	<Project Value="' + LTRIM( STR( RAND() * 1000 ) ) + '"/>
    	<Project Value="' + LTRIM( STR( RAND() * 1000 ) ) + '"/>
    	</Projects>'
    GO 250000
    
    -- Make sure at least one record meets the criteria
    --UPDATE ObjectData
    --SET Data.modify('replace value of (//Project/@value)[1] with 457')
    --WHERE ID = 150000
    
    --UPDATE ObjectData
    --SET Data.modify('replace value of (//Project/@value)[2] with 555')
    --WHERE ID = 150000
    
    SELECT ID, Data, Data.value('(//Project/@Value)[1]', 'float') AS Value
    FROM ObjectData
    WHERE (Data.exist('(//Project[@Value gt 456])')= 1)
      AND (Data.exist('(//Project[@Value lt 556])')= 1)
    ORDER BY Value DESC
    GO
    
    SELECT ID, Data, Data.value('(//Project/@Value)[1]', 'float') AS Value
    FROM ObjectData
    WHERE (Data.exist('(//Project[@Value gt 456])')= 1)
      AND (Data.exist('(//Project[@Value lt 556])')= 1)
    ORDER BY Value DESC
    OPTION ( MAXDOP 1 )
    

    Try it and compare the execution plans on the last two queries.  On my laptop, they both come back with 50% costs, basically meaning they're about the same in cost.  Other factors on your machine may influence this, eg the number of other queries running, indexing etc.  Compare the two plans on your machine and let us know how you get on.

    Finally, I wouldl worry less about parallelism and more about performance tuning your XML and XQuery.  Start here with these two great articles:

    Performance Optimizations for the XML Data Type in SQL Server 2005
    http://msdn.microsoft.com/en-us/library/ms345118.aspx

    XML Indexes in SQL Server 2005
    http://msdn.microsoft.com/en-us/library/ms345121(SQL.90).aspx

    I have seen performance improvements from making the XML typed.