locked
Slow Perfomance and Max Degree of Parallelism Configuration RRS feed

  • Question

  • Hi,

     

    We are using client server architecture for our application connection. SQL Server Installed in one server and all clients systems connect to this server through application. (Approximately daily 150 to 200 users).

     

    The SQL Server details:

    Version: Microsoft SQL Server 2008 R2 Standard Edition (64-bit).

    Application Database Size:  50 GB.

     

    The server details as below:

    OS Name             Microsoft Windows Server 2008 R2 Enterprise

    Processor            Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00 GHz, 2000 MHz, 4 Core(s), 4 Logical Processor(s)

    Processor            Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00 GHz, 2000 MHz, 4 Core(s), 4 Logical Processor(s)

    Installed Physical Memory (RAM): 64.0 GB.

     

     

    Now we are facing the performance issues from all users and application is running very slow and freezing.

     

    We verified all indexes fragmentation and rebuild, reorganized indexes and updated statistics. But still the performance remains same.

     

    We want to configure max degree of parallelism option to Sever.  Is it necessary to configure this option for our server(Only having 8 Logical Processors) ?

     If yes, based on our server configuration please suggest the max degree parallelism value?

     

    Also please suggest any other options to improve performance.

    When we are verifying log file and Temp DB, everything seems fine. But we can’t find root cause.

     

     

    Regards,

    Srinivas Ch.

    Tuesday, November 1, 2016 11:12 AM

Answers

  • It is possible that you are running into problems with parallelism. Before you configure parallelism server wide you need to look at your wait statistics to see what your server bottleneck is and address the bottleneck or determine if there is an individual slow query.

    Post your wait stats back here.
    WITH Waits AS
     
    (
     
    SELECT 
       
    wait_type
       
    wait_time_ms 1000. AS wait_time_s,
       
    100. wait_time_ms SUM(wait_time_msOVER() AS pct,
       
    ROW_NUMBER() OVER(ORDER BY wait_time_ms DESCAS rn
     
    FROM sys.dm_os_wait_stats
     
    WHERE wait_type 
       
    NOT IN
         
    ('CLR_SEMAPHORE''LAZYWRITER_SLEEP''RESOURCE_QUEUE',
       
    'SLEEP_TASK''SLEEP_SYSTEMTASK''SQLTRACE_BUFFER_FLUSH''WAITFOR',
       
    'CLR_AUTO_EVENT''CLR_MANUAL_EVENT')
       ) 
    -- filter out additional irrelevant waits
       
    SELECT W1.wait_type,
     
    CAST(W1.wait_time_s AS DECIMAL(122)) AS wait_time_s,
     
    CAST(W1.pct AS DECIMAL(122)) AS pct,
     
    CAST(SUM(W2.pctAS DECIMAL(122)) AS running_pct
    FROM Waits AS W1
     
    INNER JOIN Waits AS W2 ON W2.rn <= W1.rn
    GROUP BY W1.rn
     
    W1.wait_type
     
    W1.wait_time_s
     
    W1.pct
    HAVING SUM(W2.pct) - W1.pct 95-- percentage threshold

    Run this one time.

    Then wait 5 minutes and run it again.

    post both results back here.

    Also check for blocking.

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 11:16 AM
  • Thank you very much for your quick response.

    I ran the above query, here is the result set

    First time Result
    wait_type wait_time_s pct running_pct
    CXPACKET 135241.46 17.03 17.03
    REQUEST_FOR_DEADLOCK_SEARCH 86154.33 10.85 27.88
    SQLTRACE_INCREMENTAL_FLUSH_SLEEP 86149.32 10.85 38.73
    XE_TIMER_EVENT 86142.31 10.85 49.58
    LOGMGR_QUEUE 86084.39 10.84 60.42
    FT_IFTS_SCHEDULER_IDLE_WAIT 85396.52 10.75 71.18
    XE_DISPATCHER_WAIT 78158.52 9.84 81.02
    CHECKPOINT_QUEUE 75933.76 9.56 90.58
    BROKER_TO_FLUSH 43078.89 5.43 96.01

    Second time Result set :

    wait_type wait_time_s pct running_pct
    CXPACKET 135418.26 17.02 17.02
    REQUEST_FOR_DEADLOCK_SEARCH 86474.80 10.87 27.88
    SQLTRACE_INCREMENTAL_FLUSH_SLEEP 86474.07 10.87 38.75
    XE_TIMER_EVENT 86472.47 10.87 49.61
    LOGMGR_QUEUE 86226.10 10.83 60.45
    FT_IFTS_SCHEDULER_IDLE_WAIT 85756.60 10.78 71.22
    XE_DISPATCHER_WAIT 78158.52 9.82 81.04
    CHECKPOINT_QUEUE 75933.76 9.54 90.58
    BROKER_TO_FLUSH 43240.52 5.43 96.02

    Note : this is not peak time. Only very few users are using the application.

    We didn't find any blocking.

    Thanks & Regards,

    Srinivas Ch.

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 11:35 AM
  • well there are some cxpacket waits. Nothing huge yet. Perhaps run this during the peak day. I run this query and aggreate results every 5 minutes. This way I can tell trending and have a baseline.

    you may also find it helpful to issue this query which will tell you what procs/queries are using parallelism.

    SELECT TOP 10
    p.*,
    q.*,
    qs.*,
    cp.plan_handle
    FROM
    sys.dm_exec_cached_plans cp
    CROSS apply sys.dm_exec_query_plan(cp.plan_handle) p
    CROSS apply sys.dm_exec_sql_text(cp.plan_handle) AS q
    JOIN sys.dm_exec_query_stats qs
    ON qs.plan_handle = cp.plan_handle
    WHERE
    cp.cacheobjtype = 'Compiled Plan' AND
    p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
    max(//p:RelOp/@Parallel)', 'float') > 0
    OPTION (MAXDOP 1)

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 11:52 AM
  • Srinivas,

    Setting server wide MAXDOP is probably not the answer.  Most likely it is query design, missing indexes or out of date stats (almost everything I see falls into these three...)

    If you have not used Adam Machanic's sp_whoisactive, please go here to download it:

    http://whoisactive.com/

    I create this as a temporary stored proc and use it to get a quick view of my longest running sessions at any given time to start my investigation.

    Also, please make sure your tempdb is configured to best practices (4 or 8 data files depending on who you listen to), one .ldf file, autogrow set to reasonable size, not 10% or 1mb, etc.

    Kevin3NF

    The OnPurposeDBA

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 12:52 PM
  • You can see whether the index op is running in parallel by doing:

    EXEC sp_WhoIsActive @get_task_info = 2

    ... then look in the [tasks] column to see how many tasks are spun up.




    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; 

    WITH XMLNAMESPACES   
       (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')  
    SELECT  
            query_plan AS CompleteQueryPlan, 
            n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText, 
            n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel, 
            n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost, 
            n.query('.') AS ParallelSubTreeXML,  
            ecp.usecounts, 
            ecp.size_in_bytes 
    FROM sys.dm_exec_cached_plans AS ecp 
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp 
    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n) 
    WHERE  n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1 

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 12:58 PM
  • You need to determine the actual problem.  MaxDOP likely has nothing to do with your problem.

    What you are describing is generally related to a "blocking" problem, where other processes are waiting on something to finish.  You would need to look at the system when the problem is occurring and determine what is going on.

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 1:10 PM
    Answerer

All replies

  • It is possible that you are running into problems with parallelism. Before you configure parallelism server wide you need to look at your wait statistics to see what your server bottleneck is and address the bottleneck or determine if there is an individual slow query.

    Post your wait stats back here.
    WITH Waits AS
     
    (
     
    SELECT 
       
    wait_type
       
    wait_time_ms 1000. AS wait_time_s,
       
    100. wait_time_ms SUM(wait_time_msOVER() AS pct,
       
    ROW_NUMBER() OVER(ORDER BY wait_time_ms DESCAS rn
     
    FROM sys.dm_os_wait_stats
     
    WHERE wait_type 
       
    NOT IN
         
    ('CLR_SEMAPHORE''LAZYWRITER_SLEEP''RESOURCE_QUEUE',
       
    'SLEEP_TASK''SLEEP_SYSTEMTASK''SQLTRACE_BUFFER_FLUSH''WAITFOR',
       
    'CLR_AUTO_EVENT''CLR_MANUAL_EVENT')
       ) 
    -- filter out additional irrelevant waits
       
    SELECT W1.wait_type,
     
    CAST(W1.wait_time_s AS DECIMAL(122)) AS wait_time_s,
     
    CAST(W1.pct AS DECIMAL(122)) AS pct,
     
    CAST(SUM(W2.pctAS DECIMAL(122)) AS running_pct
    FROM Waits AS W1
     
    INNER JOIN Waits AS W2 ON W2.rn <= W1.rn
    GROUP BY W1.rn
     
    W1.wait_type
     
    W1.wait_time_s
     
    W1.pct
    HAVING SUM(W2.pct) - W1.pct 95-- percentage threshold

    Run this one time.

    Then wait 5 minutes and run it again.

    post both results back here.

    Also check for blocking.

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 11:16 AM
  • Thank you very much for your quick response.

    I ran the above query, here is the result set

    First time Result
    wait_type wait_time_s pct running_pct
    CXPACKET 135241.46 17.03 17.03
    REQUEST_FOR_DEADLOCK_SEARCH 86154.33 10.85 27.88
    SQLTRACE_INCREMENTAL_FLUSH_SLEEP 86149.32 10.85 38.73
    XE_TIMER_EVENT 86142.31 10.85 49.58
    LOGMGR_QUEUE 86084.39 10.84 60.42
    FT_IFTS_SCHEDULER_IDLE_WAIT 85396.52 10.75 71.18
    XE_DISPATCHER_WAIT 78158.52 9.84 81.02
    CHECKPOINT_QUEUE 75933.76 9.56 90.58
    BROKER_TO_FLUSH 43078.89 5.43 96.01

    Second time Result set :

    wait_type wait_time_s pct running_pct
    CXPACKET 135418.26 17.02 17.02
    REQUEST_FOR_DEADLOCK_SEARCH 86474.80 10.87 27.88
    SQLTRACE_INCREMENTAL_FLUSH_SLEEP 86474.07 10.87 38.75
    XE_TIMER_EVENT 86472.47 10.87 49.61
    LOGMGR_QUEUE 86226.10 10.83 60.45
    FT_IFTS_SCHEDULER_IDLE_WAIT 85756.60 10.78 71.22
    XE_DISPATCHER_WAIT 78158.52 9.82 81.04
    CHECKPOINT_QUEUE 75933.76 9.54 90.58
    BROKER_TO_FLUSH 43240.52 5.43 96.02

    Note : this is not peak time. Only very few users are using the application.

    We didn't find any blocking.

    Thanks & Regards,

    Srinivas Ch.

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 11:35 AM
  • well there are some cxpacket waits. Nothing huge yet. Perhaps run this during the peak day. I run this query and aggreate results every 5 minutes. This way I can tell trending and have a baseline.

    you may also find it helpful to issue this query which will tell you what procs/queries are using parallelism.

    SELECT TOP 10
    p.*,
    q.*,
    qs.*,
    cp.plan_handle
    FROM
    sys.dm_exec_cached_plans cp
    CROSS apply sys.dm_exec_query_plan(cp.plan_handle) p
    CROSS apply sys.dm_exec_sql_text(cp.plan_handle) AS q
    JOIN sys.dm_exec_query_stats qs
    ON qs.plan_handle = cp.plan_handle
    WHERE
    cp.cacheobjtype = 'Compiled Plan' AND
    p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
    max(//p:RelOp/@Parallel)', 'float') > 0
    OPTION (MAXDOP 1)

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 11:52 AM
  • Do not hurry to change the MAX DOP to 1. I would capture long running queries over a day and try optimize them....CXPACKET is on the top but does not have high impact percentage

    BTW what  max memory have you set up?


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Tuesday, November 1, 2016 11:54 AM
  • Ok.. I will run the same query during the peak hours and will share my findings if any with you.

    Thank you.

    Tuesday, November 1, 2016 12:27 PM
  • We have allocated 54 GB out of 64 GB for SQL Server.

    Tuesday, November 1, 2016 12:30 PM
  • Ok.. I will run the same query during the peak hours and will share my findings if any with you.

    Thank you.

    Tuesday, November 1, 2016 12:35 PM
  • Srinivas,

    Setting server wide MAXDOP is probably not the answer.  Most likely it is query design, missing indexes or out of date stats (almost everything I see falls into these three...)

    If you have not used Adam Machanic's sp_whoisactive, please go here to download it:

    http://whoisactive.com/

    I create this as a temporary stored proc and use it to get a quick view of my longest running sessions at any given time to start my investigation.

    Also, please make sure your tempdb is configured to best practices (4 or 8 data files depending on who you listen to), one .ldf file, autogrow set to reasonable size, not 10% or 1mb, etc.

    Kevin3NF

    The OnPurposeDBA

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 12:52 PM
  • You can see whether the index op is running in parallel by doing:

    EXEC sp_WhoIsActive @get_task_info = 2

    ... then look in the [tasks] column to see how many tasks are spun up.




    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; 

    WITH XMLNAMESPACES   
       (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')  
    SELECT  
            query_plan AS CompleteQueryPlan, 
            n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText, 
            n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel, 
            n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost, 
            n.query('.') AS ParallelSubTreeXML,  
            ecp.usecounts, 
            ecp.size_in_bytes 
    FROM sys.dm_exec_cached_plans AS ecp 
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp 
    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n) 
    WHERE  n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1 

    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 12:58 PM
  • CXPACKET is expected wait time. If a query run over several cores, then when one core is done with its work, it has to wait until the last one is done with its work. So towards the end of that operator processing, all but one core are waiting for the final one to finish its part of the work to be done. This is why it adds up so much.

    Setting to a lower maxdop will reduce this wait, but it will take longer to process the query!

    So, I'd like to echo Kevin's statement to look for the problem elsewhere. Waitstats can give you an idea, but the query you ran gave us only those kind of waits that we should ignore. I.e., you need a bigger ignore list (the NOT IN). I suggest you grab the query from Glenn's diag script and then post the result: http://sqlserverperformance.wordpress.com/tag/dmv-queries/ . This will give us a starting point.

    Then try to adress the real problem. Again, Kevin gave the usual suspects (query design, indexes, bad stats and I'd like to add potential blocking). 


    Tibor Karaszi, SQL Server MVP (Web Blog)

    Tuesday, November 1, 2016 1:09 PM
  • You need to determine the actual problem.  MaxDOP likely has nothing to do with your problem.

    What you are describing is generally related to a "blocking" problem, where other processes are waiting on something to finish.  You would need to look at the system when the problem is occurring and determine what is going on.

    • Marked as answer by srinivas_ch Thursday, November 24, 2016 9:46 AM
    Tuesday, November 1, 2016 1:10 PM
    Answerer