locked
SQL 2005 cluster gets big performance hit and website fails, resolved with failover - I dont know why? RRS feed

  • Question

  • Hello,

    I have a Windows 2003 (SP2) (8GB RAM) cluster running SQL 2005 (SP3) . Often, but not regularly the is a peak in the CPU that causes the site to become unresponsive, if we perform a failover the issue is resolved until the next unexplained spike.

    I have performance counters in place and during the issue there is a peak in CPU, memory and processes blocked, as expectd the traces level out after fail over.

    in the default trace log displayed in profiler during the issue there is a large number of 'audit login failed' from the various websevers using the application '.net sqlclient data provider'

    just prior to the spike there are a lrge number of hash and sort warnings also from aplication '.net sqlclient data provider'

    previously i found some event classes 'missing join predicate' which were detailed as fixed in cumulative update 7 which was applied over the weekend, theses are still present.

    If you have any advice I would be grateful

    Thanks, Colm.

    Friday, May 21, 2010 7:49 AM

All replies

  • Hi,

    During the spike process look out for parallelism in your activity monitor and look out for CXPACKETS. U can temporarily hault the CPU spike by reducing the MAXDOP but in the long run you have to look out for optimizing the queries involved in the process. The audit login failed messages usually come in when the CPUs are not able to attend to requests comming in from the client hence its not able to cater to the new connections requests due to high CPU utilization and spikes.

    Regards

     


    Rojit
    Saturday, May 22, 2010 10:44 AM
  • ---This first thing to check if CPU is at 100% is to look for parallel queries:

    -- Tasks running in parallel (filtering out MARS requests below):

    select

     

    * from sys.dm_os_tasks as t

     

    where t.session_id in (

     

    select t1.session_id

     

    from sys.dm_os_tasks as t1

     

    group by t1.session_id

     

    having count(*) > 1

     

    and min(t1.request_id) = max(t1.request_id));

    -- Requests running in parallel:

     

    select *

     

    from sys.dm_exec_requests as r

     

    join (

     

    select t1.session_id, min(t1.request_id)

     

    from sys.dm_os_tasks as t1

     

    group by t1.session_id

     

    having count(*) > 1

     

    and min(t1.request_id) = max(t1.request_id)

     

    ) as t(session_id, request_id)

     

    on r.session_id = t.session_id

     

    and r.request_id = t.request_id;

    Sunday, May 30, 2010 9:35 AM