none
SQL 2005 event log

    Question

  • Hi,

    My SQL 2005 server event log appear to have one issue with IO, " SQL Server has encountered 7 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file ... " .

    It happen only to one drive at certain time, this drive contains my sharepoint database and also tempdb file. The average disc read percentage hit max value each time .

    I am confuse to identify the root cause for it ( tempdb ? IO bottleneck ? memory bottleneck? ) . I need someone that can guide me the step to troubleshoot and solve this problem. Thanks

    Regards,

    Chungb 

    Thursday, July 04, 2013 10:03 AM

Answers

  • Either your disk is flooded with Large IO requests (high number of DML, Large scan due to improper/missing index, poor table design) or there could be hardware problem.

    As you say its occurring a certain time, find out those large IO requester and if possible fine tune them to use less IO. To identify offensive queries, you can use sql trace or extended events.

    http://www.solidq.com/sqj/Pages/Relational/Tracing-Query-Performance-with-Extended-Events.aspx

    http://blogs.msdn.com/b/sql_pfe_blog/archive/2013/03/14/dissecting-sql-server-physical-reads-with-extended-events-and-process-monitor.aspx

    But its recommended a verification over IO subsystem to ensure there is no issue disk drives.


    Srinivasan

    • Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
    Thursday, July 04, 2013 10:24 AM
  • Hi,

    I/O delays points to fact of poorely written query generating lot of reads and writed which you I/O subsystem is not able to handle.It can also be due to faulty hardware.Can you post result of below query it will give top waits you occured on system>this will let us reach logical conclusion...

    WITH [Waits] AS
        (SELECT
            [wait_type],
            [wait_time_ms] / 1000.0 AS [WaitS],
            ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
            [signal_wait_time_ms] / 1000.0 AS [SignalS],
            [waiting_tasks_count] AS [WaitCount],
            100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
            ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
        FROM sys.dm_os_wait_stats
        WHERE [wait_type] NOT IN (
            N'CLR_SEMAPHORE',    N'LAZYWRITER_SLEEP',
            N'RESOURCE_QUEUE',   N'SQLTRACE_BUFFER_FLUSH',
            N'SLEEP_TASK',       N'SLEEP_SYSTEMTASK',
            N'WAITFOR',          N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
            N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH',
            N'XE_TIMER_EVENT',   N'XE_DISPATCHER_JOIN',
            N'LOGMGR_QUEUE',     N'FT_IFTS_SCHEDULER_IDLE_WAIT',
            N'BROKER_TASK_STOP', N'CLR_MANUAL_EVENT',
            N'CLR_AUTO_EVENT',   N'DISPATCHER_QUEUE_SEMAPHORE',
            N'TRACEWRITE',       N'XE_DISPATCHER_WAIT',
            N'BROKER_TO_FLUSH',  N'BROKER_EVENTHANDLER',
            N'FT_IFTSHC_MUTEX',  N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
            N'DIRTY_PAGE_POLL',  N'SP_SERVER_DIAGNOSTICS_SLEEP')
        )
    SELECT
        [W1].[wait_type] AS [WaitType],
        CAST ([W1].[WaitS] AS DECIMAL(14, 2)) AS [Wait_S],
        CAST ([W1].[ResourceS] AS DECIMAL(14, 2)) AS [Resource_S],
        CAST ([W1].[SignalS] AS DECIMAL(14, 2)) AS [Signal_S],
        [W1].[WaitCount] AS [WaitCount],
        CAST ([W1].[Percentage] AS DECIMAL(4, 2)) AS [Percentage],
        CAST (([W1].[WaitS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgWait_S],
        CAST (([W1].[ResourceS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgRes_S],
        CAST (([W1].[SignalS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgSig_S]
    FROM [Waits] AS [W1]
    INNER JOIN [Waits] AS [W2]
        ON [W2].[RowNum] <= [W1].[RowNum]
    GROUP BY [W1].[RowNum], [W1].[wait_type], [W1].[WaitS],
        [W1].[ResourceS], [W1].[SignalS], [W1].[WaitCount], [W1].[Percentage]
    HAVING SUM ([W2].[Percentage]) - [W1].[Percentage] < 95; -- percentage threshold
    GO


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    • Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
    Thursday, July 04, 2013 10:33 AM
  • Hi Shanky,

    Can you explain more in details about your query, and why you think it is related with backup operation?

    Regards,

    Hi ,

    May be some third party backup tool is taking ur system backup you can go to error log and check for VIRTUAL_DEVICE if it is present in backup message means backup is occuring thru Third party tool on network share

    Also for wait type WRITELOG...it shows there is lot of delay in writing logs on log file drive also ur read write time is high..with PAGEIOLATCH wait i would also recommend to get ur disk checked,disk on which ur trn files are residing...it may be disk/SAN issue .SAN must be updated to latest firmwares to perform better ..

    thanks


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    • Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
    Monday, July 08, 2013 2:40 PM

All replies

  • Either your disk is flooded with Large IO requests (high number of DML, Large scan due to improper/missing index, poor table design) or there could be hardware problem.

    As you say its occurring a certain time, find out those large IO requester and if possible fine tune them to use less IO. To identify offensive queries, you can use sql trace or extended events.

    http://www.solidq.com/sqj/Pages/Relational/Tracing-Query-Performance-with-Extended-Events.aspx

    http://blogs.msdn.com/b/sql_pfe_blog/archive/2013/03/14/dissecting-sql-server-physical-reads-with-extended-events-and-process-monitor.aspx

    But its recommended a verification over IO subsystem to ensure there is no issue disk drives.


    Srinivasan

    • Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
    Thursday, July 04, 2013 10:24 AM
  • Hi,

    I/O delays points to fact of poorely written query generating lot of reads and writed which you I/O subsystem is not able to handle.It can also be due to faulty hardware.Can you post result of below query it will give top waits you occured on system>this will let us reach logical conclusion...

    WITH [Waits] AS
        (SELECT
            [wait_type],
            [wait_time_ms] / 1000.0 AS [WaitS],
            ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
            [signal_wait_time_ms] / 1000.0 AS [SignalS],
            [waiting_tasks_count] AS [WaitCount],
            100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
            ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
        FROM sys.dm_os_wait_stats
        WHERE [wait_type] NOT IN (
            N'CLR_SEMAPHORE',    N'LAZYWRITER_SLEEP',
            N'RESOURCE_QUEUE',   N'SQLTRACE_BUFFER_FLUSH',
            N'SLEEP_TASK',       N'SLEEP_SYSTEMTASK',
            N'WAITFOR',          N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
            N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH',
            N'XE_TIMER_EVENT',   N'XE_DISPATCHER_JOIN',
            N'LOGMGR_QUEUE',     N'FT_IFTS_SCHEDULER_IDLE_WAIT',
            N'BROKER_TASK_STOP', N'CLR_MANUAL_EVENT',
            N'CLR_AUTO_EVENT',   N'DISPATCHER_QUEUE_SEMAPHORE',
            N'TRACEWRITE',       N'XE_DISPATCHER_WAIT',
            N'BROKER_TO_FLUSH',  N'BROKER_EVENTHANDLER',
            N'FT_IFTSHC_MUTEX',  N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
            N'DIRTY_PAGE_POLL',  N'SP_SERVER_DIAGNOSTICS_SLEEP')
        )
    SELECT
        [W1].[wait_type] AS [WaitType],
        CAST ([W1].[WaitS] AS DECIMAL(14, 2)) AS [Wait_S],
        CAST ([W1].[ResourceS] AS DECIMAL(14, 2)) AS [Resource_S],
        CAST ([W1].[SignalS] AS DECIMAL(14, 2)) AS [Signal_S],
        [W1].[WaitCount] AS [WaitCount],
        CAST ([W1].[Percentage] AS DECIMAL(4, 2)) AS [Percentage],
        CAST (([W1].[WaitS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgWait_S],
        CAST (([W1].[ResourceS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgRes_S],
        CAST (([W1].[SignalS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgSig_S]
    FROM [Waits] AS [W1]
    INNER JOIN [Waits] AS [W2]
        ON [W2].[RowNum] <= [W1].[RowNum]
    GROUP BY [W1].[RowNum], [W1].[wait_type], [W1].[WaitS],
        [W1].[ResourceS], [W1].[SignalS], [W1].[WaitCount], [W1].[Percentage]
    HAVING SUM ([W2].[Percentage]) - [W1].[Percentage] < 95; -- percentage threshold
    GO


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    • Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
    Thursday, July 04, 2013 10:33 AM
  • Ok,

    I will go thru the link and also post the query result later on, because this only happen at certain time. Thanks guys.

    Friday, July 05, 2013 1:52 AM
  • Hi Shanky

    This is the result for the query

    BACKUPIO 32015.38 31858.67 156.7 282278 30.79 0.1134 0.1129 0.0006
    WRITELOG 14441.94 13777.3 664.64 3069283 13.89 0.0047 0.0045 0.0002
    PAGEIOLATCH_SH 13183.02 12951.2 231.81 2970904 12.68 0.0044 0.0044 0.0001
    ASYNC_NETWORK_IO 9147.28 8011.34 1135.9 12748595 8.8 0.0007 0.0006 0.0001
    BACKUPBUFFER 6382.91 5583.78 799.13 1354555 6.14 0.0047 0.0041 0.0006
    SOS_SCHEDULER_YIELD 5798.81 0.66 5798.2 2639757 5.58 0.0022 0 0.0022
    ASYNC_IO_COMPLETION 4827.13 4826.31 0.81 243 4.64 19.8647 19.8614 0.0033
    BACKUPTHREAD 4638.52 4633.55 4.97 1949 4.46 2.3799 2.3774 0.0025
    RESOURCE_SEMAPHORE_QUERY_COMPILE 2843 2825.56 17.44 547 2.73 5.1974 5.1656 0.0319
    IO_COMPLETION 2538.45 2434.39 104.06 898777 2.44 0.0028 0.0027 0.0001
    OLEDB 2424.78 2424.78 0 1111496 2.33 0.0022 0.0022 0
    LCK_M_X 1697.83 1689.11 8.72 5667 1.63 0.2996 0.2981 0.0015
    Friday, July 05, 2013 7:18 AM
  • With above I assume you are taking backup on a Network drive.Backup operation does not uses the data pages from the buffer pool it will got o disk to backup pages this is  IO intensive operation and it sometimes go to suspended status while waiting for IO. To remove this Please take backup on Local drive and then use tape backup to backup this file on tape.

    Write log wait is accompanied with BACUPIO wait i have seen this al lot this happensd when process is waiting to write log to disk(log file)..I advise u to but your log files on separate drive.

    At this point of time with  data u provided  its wont be correct to say that u have DISK I/O problem


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    Friday, July 05, 2013 7:44 AM
  • Hi Shanky,

    Can you explain more in details about your query, and why you think it is related with backup operation?

    Regards,

    Monday, July 08, 2013 1:45 AM
  • During that time, i checked no others backup operation running unless I didnt aware of it. My SQL server avg disc sec /read rise up till 100ms every day at same time.
    Monday, July 08, 2013 7:29 AM
  • Hi Shanky,

    Can you explain more in details about your query, and why you think it is related with backup operation?

    Regards,

    Hi ,

    May be some third party backup tool is taking ur system backup you can go to error log and check for VIRTUAL_DEVICE if it is present in backup message means backup is occuring thru Third party tool on network share

    Also for wait type WRITELOG...it shows there is lot of delay in writing logs on log file drive also ur read write time is high..with PAGEIOLATCH wait i would also recommend to get ur disk checked,disk on which ur trn files are residing...it may be disk/SAN issue .SAN must be updated to latest firmwares to perform better ..

    thanks


    Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers

    • Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
    Monday, July 08, 2013 2:40 PM