已答覆 Server really slow - Is it SQL?

  • Friday, August 03, 2012 10:17 AM
     
     

    Hi, I was wondering if anyone can help me.

    Weve got a server that has 1 instance of SQL 2008 r2 rdbms and a reporting services instance running on it. Periodically it is incredibally slow and grinds to a halt.

    When this has been reported to IT they blame SQL server. I dont think it is as during these slow times the CPU isnt maxed out and theres not a great deal of activity going on.

    Im under enourmous (job threatening!) pressure to prove that its not SQL that is causing it.

    Just wondering how you might go about trying to eliminate SQL from the equation, I dont know much about networks or SAN disks.

    Any advice would be enormously appreciated!

    • Moved by Tom PhillipsModerator Monday, August 06, 2012 9:45 PM Probably better answer from the DB Engine forum (From:Transact-SQL)
    •  

All Replies

  • Friday, August 03, 2012 10:32 AM
     
     

    we found that if we have single CPU on sql server (with ssrs, sharepoint installed) then no matter what amount of hard disk or ram you have, once you start a big query(either or sql or ssrs) the cpu consumed goes always upto 100%. everything slow. Tried to optimize the sql server by allocating Maximum Server memory or Tuner or looking at the Profiler, no help. Once we added extra CPU(virtual or physical), it is now normal. Check with network team whether they can add virtual cu to that server or not, if yes, let them add a few for testing purpose and see the result.

    regards

    joon

  • Friday, August 03, 2012 10:33 AM
     
     Proposed Answer

    From your description, the cause can be anything. And it seems that your IT department does not have the interest or the competence to investigate.

    Can you post more information about the machine?

    o  Exact version of SQL Server (SELECT @@version)
    o  Machine configuration, memory and so on.
    o  If machine is virtual, please also give spec for physical machine,
       including brand and version of the virtuallisation software.
    o  Are disks local or on SAN?
    o  Has the service account for SQL Server been granted "Lock pages in
       memory"?

    Overall, I would suggest that your approach should not be to eliminate anything from the equation, but finding the cause. It is not unlikely that it is a combination of things. And while I can think of scenarios where SQL Server is entirely without guilt, my initial assumption would be that SQL Server is involved, since it is after all the dominating application on the machine.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
  • Friday, August 03, 2012 10:37 AM
     
     

    Hi,

    You need to measure what is happening in this server. you may start using perfmon and SQL Server DMV's to identify bottlenecks such as CPU, Memory, Disk or even db implementation failures. Also make sure that SQL Server has max server memory configured. It may cause your server slowness as a first idea. If it is not set and eats all memory then you can blame your IT as this is their job ;) 

    you need to review all other settings as well, please check SQL Server best practices about SQL server performance troubleshooting as well as implementations.

    Janos


    There are 10 type of people. Those who understand binary and those who do not.

    My Blog | Hire Me

  • Friday, August 03, 2012 10:37 AM
     
      Has Code

    Hi, I was wondering if anyone can help me.

    Weve got a server that has 1 instance of SQL 2008 r2 rdbms and a reporting services instance running on it. Periodically it is incredibally slow and grinds to a halt.

    When this has been reported to IT they blame SQL server.

    Usually happens!!!

    Ok, As long as you do not have a transaction as a problem statement, you need to evaluate the system as whole. Its not a straight forward way or easy way to achieve.

    Many people will have different ways to analyse these kind of issues. I am just giving my points.

    My options:

    1. Try to check the wait types occured in the server.(This would give you the problem in very high level).Please post the results.

    --Split up on the wait times for the wait types WITH Waits AS (SELECT wait_type, wait_time_ms / 1000.0 AS WaitS, (wait_time_ms - signal_wait_time_ms) / 1000.0 AS ResourceS, signal_wait_time_ms / 1000.0 AS SignalS, waiting_tasks_count AS WaitCount, 100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage, ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS RowNum FROM sys.dm_os_wait_stats WHERE wait_type NOT IN ( 'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE', 'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH', 'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER', 'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', 'BROKER_RECEIVE_WAITFOR', 'ONDEMAND_TASK_QUEUE', 'DBMIRROR_EVENTS_QUEUE', 'DBMIRRORING_CMD', 'BROKER_TRANSMITTER', 'SQLTRACE_WAIT_ENTRIES', 'SLEEP_BPOOL_FLUSH', 'SQLTRACE_LOCK') ) SELECT W1.wait_type AS WaitType, CAST (W1.WaitS AS DECIMAL(14, 2)) AS Wait_S, CAST (W1.ResourceS AS DECIMAL(14, 2)) AS Resource_S, CAST (W1.SignalS AS DECIMAL(14, 2)) AS Signal_S, W1.WaitCount AS WaitCount, CAST (W1.Percentage AS DECIMAL(4, 2)) AS Percentage, CAST ((W1.WaitS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgWait_S, CAST ((W1.ResourceS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgRes_S, CAST ((W1.SignalS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgSig_S FROM Waits AS W1 INNER JOIN Waits AS W2 ON W2.RowNum <= W1.RowNum GROUP BY W1.RowNum, W1.wait_type, W1.WaitS, W1.ResourceS, W1.SignalS, W1.WaitCount, W1.Percentage HAVING SUM (W2.Percentage) - W1.Percentage < 95; -- percentage threshold


    2. Turn on Perfmon counters for the server (including SQL server) and analyse the values.

    http://www.quest.com/backstage/images/promotions/SQLServer-Perfmonance-Poster.pdf

    3. Please let us know sp_configure results.


    Please click the Mark as Answer or Vote As Helpful button if a post solves your problem or is helpful!

  • Friday, August 03, 2012 11:03 AM
     
     

    In my experience, usually, but certainly not always, disk IO is the main bottleneck for systems like MS SQL Server and Sybase SQL Server. Disk IO is expensive. If disk IO is the bottleneck, and Lateesh's query will tell you if it is, the problem, however, is trying to figure out why there is so much disk IO.

    It can be heavy use of TempDB. Having multiple files for TempDB each on a disk with its own IO channel can help. Or rewriting the queries can help.

    It can be a query causing a table scan in a loop on a table to large to fit into memory. Maybe that is caused by a missing index. Or maybe a badly written piece of SQL code causes this problem.

    Anyway, you need to measure what is going on before you can tell.

  • Friday, August 03, 2012 11:33 AM
     
     

    Thanks everyone for this really helpful advice.

    Unfortunately the entire network is down at the minute, but as soon as its back up Ill post the information.

    What I can tell you is that there is 16GB of memory - its a virtual machine running Windows 2008 r2 with 4 processors.

    There are 3 disks a 300GB C drive and a 100GB D and 100GB E (data and logs are stored on C and E)

    When it runs really slowly I normally look at the CPU utilization on the task manager - its only about 2 or 3 %

    The memory is sitting at about 12 GB but thats normal (isnt it?) I have 12 GB maximum memory in SQL server settings

    I look in activity monitor and theres nothing really happening there either. When I run sp_who2 I dont see much IO. This is why I think its not a sql query. Ive done a lot of work on the indexes and stats, but I just cant see a big query running.

    I have 4 tempdb datafiles, but theyre all on the same LUN - do you get better performance if you separate them onto different LUNS?

    Im not sure about the "Lock pages memory" setting. Should it be turned on?

    Ill run that query as soon as I can get back on the box

    Thats soo much for your help everyone, I really do Appreciate it!

    Zoe

  • Friday, August 03, 2012 11:43 AM
     
     

    I guess I have to add one important thing....

    Did you see any unusal logs in your errorlog?(check all logs...SQL errorlog, application evetn log etc...)


    Please click the Mark as Answer or Vote As Helpful button if a post solves your problem or is helpful!

  • Friday, August 03, 2012 11:47 AM
     
     

    nope - nothing. I even turned off the sql agent to make sure there werent any jobs running.

    Nothing in the SQL log, app log or sys log.

  • Friday, August 03, 2012 11:49 AM
     
     

    Hi,

    As you said it is a VM, do you see any external memory pressure?

    Janos


    There are 10 type of people. Those who understand binary and those who do not.

    My Blog | Hire Me

  • Friday, August 03, 2012 12:11 PM
     
     

    Not really sure what you mean?  How could I tell if there was external memory pressure?

    I know they use VMWare

  • Friday, August 03, 2012 12:18 PM
     
     

    you need to ask you VMWare admin to monitor the memory pressure. Alos please take a look on this article> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003470

    Janos


    There are 10 type of people. Those who understand binary and those who do not.

    My Blog | Hire Me

  • Friday, August 03, 2012 1:22 PM
     
     

    A couple of more questions.

    First, please post SELECT @@version when you get access again.

    Next, you have said the server is slow. But what exactly is slow? Certain queries? Only updates? Connections? Report rendering? Does a query like "sp_who" take ages?

    You are running VMware, which means that overprovisioning of memory is possible. That is VMware server can tell the guest operating system to shrink the memory, which of course could have some effects.

    The permission "Lock pages in memory" prevents SQL Server's memory from being swapped to disk, which can happen if there are evil processes on the machine - like for instance VMware's baloon driver.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
  • Friday, August 03, 2012 2:06 PM
     
     
    In my experience, database servers and virtualisation don't go well together. You need dedicated disk storage and a dedicated IO channel for you virtual SQL Server, otherwise the caching mechanism will have an extra layer in which data caching has to compete for caching storage with for instance a file server. I have seen customers with disasterous performance problems because of that.
  • Friday, August 03, 2012 9:35 PM
     
     

    In my experience, database servers and virtualisation don't go well together. You need dedicated disk storage and a dedicated IO channel for you virtual SQL Server, otherwise the caching mechanism will have an extra layer in which data caching has to compete for caching storage with for instance a file server. I have seen customers with disasterous performance problems because of that.


    I would say that virtualiisation per se is not so much of a problem. There is of course some overhead in the VM software, but you would have to have really high performance requirements for it to matter.

    Where you may get problems with virtualisation is the VM has to share resources with over VMs in the same machine, that is either CPU or memory. But there is so much other things to gain from a manageability perspective, that most IT department would only agree to run your SQL Server physical server if it's really a Tier-1 system with a capital T.

    SANs are a different matter. Again, SANs makes life so much easier for IT departments, so it may be impossible to evade them. And a dedicated SAN is what you need, if you really want to make your DB application scream. But the reason your IT department likes SAN is that they can have disk for umpteen servers in one place. And this can really hurt SQL Server, if there is some other server thrashing the SAN. But this applies no matter SQL Server runs on a physical or virtual machine.

    In Zoe's case, we only know that her server is very slow sometime, but we don't know in what way it's slow.

    Oh, there is one more thing I would like Zoe to report. What does this query report 1) under normal circumstances 2) when the server is slow?

    SELECT COUNT(*) FROM sys.syscacheobjects


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
  • Saturday, August 04, 2012 9:33 AM
     
     

    As usual, you are right Erland. Still, a SAN adds another layer of caching and I have seen things happening like below:

    - Loading a backup went fine until 40% of the database was loaded and the throughput went down to 8% of the original speed.

    We could trace this back to the extra layer of caching and the competition for this cache by other servers.

    But then again, you are right. I now automatically assume that past experiences apply to the current situation without knowing the parameters of the current problem. There could be so much else causing this.


  • Saturday, August 04, 2012 7:54 PM
     
     

    The point I wanted to drive is virtualisations <> SAN.

    Shared SANs often cause a headache for the DBA. Not the least because communication with SAN administrators is not always simple.

    It's not so much the caching that is the problem, but the competition wityh other systems. I heard a story by a fellow MVP who told that he was able to save his customer from buying separate hardware for an Oracle application. No, he is not an Oracle expert, but he fixed the bad query on the SQL Server side that choked the SAN.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
  • Monday, August 06, 2012 4:24 PM
     
     

    @@version gives:

    Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64)   Jun 17 2011 00:54:03   Copyright (c) Microsoft Corporation  Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)

    That big query gives me:

    Divide by zero error encountered

  • Monday, August 06, 2012 4:53 PM
     
     

    Change in that query:

    CAST ((W1.WaitS / NULLIF(W1.WaitCount,0)) AS DECIMAL (14, 4)) AS AvgWait_S, CAST ((W1.ResourceS / NULLIF(W1.WaitCount,0)) AS DECIMAL (14, 4)) AS AvgRes_S, CAST ((W1.SignalS / NULLIF(W1.WaitCount,0)) AS DECIMAL (14, 4)) AS AvgSig_S


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

  • Monday, August 06, 2012 9:59 PM
    Moderator
     
     

    There are special conciderations when installing SQL Server in a VM.  Please see thehttp://www.vmware.com/files/pdf/sql_server_virt_bp.pdf

    http://www.vmware.com/files/pdf/sql_server_best_practices_guide.pdf

    Is the VM HOST "slow" or the client? 

    What you are describing is almost certainly due to memory page swapping or VM ballooning.

  • Tuesday, August 07, 2012 12:28 AM
     
      Has Code

    To identify Sql Server cpu consumption vs other processes on the server (which I think is what you want to prove that it is not Sql Server hogging the cpu) then run this:

    DECLARE @ts_now bigint 
      SET @ts_now = (SELECT cpu_ticks/(cpu_ticks/ms_ticks)FROM sys.dm_os_sys_info)
      IF LEFT(CONVERT(VARCHAR(50),SERVERPROPERTY('productversion')),2) = '9.' AND LEFT(CONVERT(VARCHAR(50), SERVERPROPERTY ('edition')), 7) = 'Express'
                    BEGIN	                    
                        SELECT 
    		                GETDATE() AS [EventTime]
                            , '' AS [FormattedEventTime]
                            , 0 AS [SqlCpu]
                            , 0 AS [OtherCpu]
                            , 0 AS [Idle]      
                    END
                    ELSE
                    BEGIN  
    	                SELECT TOP(240) DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) as [EventTime]
    					                , CONVERT(CHAR(2),DATEPART(HOUR,DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE())))
    					                + ':' + RIGHT('0' + CONVERT(VARCHAR(2),DATEPART(MINUTE,DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()))),2) AS [FormattedEventTime]
    					                , SQLProcessUtilization AS [SqlCpu]
    					                , 100 - SystemIdle - SQLProcessUtilization AS [OtherCpu]
    					                , SystemIdle AS [Idle]                               
                       
    	                FROM (
    		                  SELECT record.value('(./Record/@id)[1]', 'int') AS record_id,
    				                record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
    				                AS [SystemIdle],
    				                record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]',
    				                'int')
    				                AS [SQLProcessUtilization], [timestamp]
    		                  FROM (
    				                SELECT [timestamp], CONVERT(xml, record) AS [record]
    				                FROM sys.dm_os_ring_buffers (nolock)
    				                WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
    				                AND record LIKE '%<SystemHealth>%') AS x
    		                  ) AS y
    	                ORDER BY record_id DESC      
                    END;

    It is taken from the free tool linked in my sig :-)


    www.minidba.com | Sql Server Performance Tuning & Monitoring Software

  • Tuesday, August 07, 2012 6:34 AM
     
     

    OK, so you have 64-bit SQL Server, which means that we can rule out some of the possibilities.

    I would agree with Tom that the baloon driver is a suspect. The baloon driver is something VMware uses to lure the guest machine to reduces it's physical memory consumption, and it hurts SQL Server badly. You can counteract, if you grant the service account "Lock pages in memory".

    But we still know very little about your situation. For instance you still have not explained in what sense the server is slow.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

  • Tuesday, August 07, 2012 7:51 AM
     
     

    Zeo, just to summarize for my all fellow mates here to make it easier to troubleshoot further:

    - At the time of the performance problem, CPU is 1-2 percent.

    - At the time of the performance problem, Memory is at 12 GB.

    - At the time of the performance problem, there is no blocking.

    - At the time of the performance problem, there are no errors/warning on SQL Error logs, App log or System logs.

    Zeo, can you please confirm above?

    Also, as Erland said that we still don't have an exact description on the slowness you are facing. Would you please define it?

  • Tuesday, August 07, 2012 8:01 AM
     
      Has Code

    Could you please try the below and let us know the result:

    WITH Waits AS
        (SELECT
            wait_type,
            wait_time_ms / 1000.0 AS WaitS,
            (wait_time_ms - signal_wait_time_ms) / 1000.0 AS ResourceS,
            signal_wait_time_ms / 1000.0 AS SignalS,
            waiting_tasks_count AS WaitCount,
            100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage,
            ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS RowNum
        FROM sys.dm_os_wait_stats
        WHERE wait_type NOT IN (
            'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',
            'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',
            'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',
            'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
            'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER',
            'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
            'BROKER_RECEIVE_WAITFOR', 'ONDEMAND_TASK_QUEUE', 'DBMIRROR_EVENTS_QUEUE',
            'DBMIRRORING_CMD', 'BROKER_TRANSMITTER', 'SQLTRACE_WAIT_ENTRIES',
            'SLEEP_BPOOL_FLUSH', 'SQLTRACE_LOCK')
        )
    SELECT
        W1.wait_type AS WaitType, 
        CAST (W1.WaitS AS DECIMAL(14, 2)) AS Wait_S,
        CAST (W1.ResourceS AS DECIMAL(14, 2)) AS Resource_S,
        CAST (W1.SignalS AS DECIMAL(14, 2)) AS Signal_S,
        W1.WaitCount AS WaitCount,
        CAST (W1.Percentage AS DECIMAL(4, 2)) AS Percentage,
        CAST((W1.WaitS/ NULLIF(W1.WaitCount,0)) AS DECIMAL (14,4)) AS AvgWait_S, 
        CAST ((W1.ResourceS/ NULLIF(W1.WaitCount,0)) AS DECIMAL (14,4)) AS AvgRes_S, 
        CAST ((W1.SignalS/ NULLIF(W1.WaitCount,0)) AS DECIMAL (14,4)) AS AvgSig_S
    FROM Waits AS W1
        INNER JOIN Waits AS W2 ON W2.RowNum <= W1.RowNum
    GROUP BY W1.RowNum, W1.wait_type, W1.WaitS, W1.ResourceS, W1.SignalS, W1.WaitCount, W1.Percentage
    HAVING SUM (W2.Percentage) - W1.Percentage < 95; -- percentage threshold


    Please click the Mark as Answer or Vote As Helpful button if a post solves your problem or is helpful!

  • Tuesday, August 07, 2012 11:25 AM
     
     Answered

    Hi everyone!

    Thanks so much for your help, that was really useful information.

    It turned out that the slowness was caused by something called 'commvault' which was taking backups of the disks. when this was turned off, it massively sped up.

    Really good ideas there though to troubleshoot problems in future - ill definately use this again!

    All the best and Thanks again

    Zoe

    • Marked As Answer by Zoe.Ohara Tuesday, August 07, 2012 11:25 AM
    •