Answered by:
SQL 2005 event log

-
Hi,
My SQL 2005 server event log appear to have one issue with IO, " SQL Server has encountered 7 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file ... " .
It happen only to one drive at certain time, this drive contains my sharepoint database and also tempdb file. The average disc read percentage hit max value each time .
I am confuse to identify the root cause for it ( tempdb ? IO bottleneck ? memory bottleneck? ) . I need someone that can guide me the step to troubleshoot and solve this problem. Thanks
Regards,
Chungb
Question
Answers
-
Either your disk is flooded with Large IO requests (high number of DML, Large scan due to improper/missing index, poor table design) or there could be hardware problem.
As you say its occurring a certain time, find out those large IO requester and if possible fine tune them to use less IO. To identify offensive queries, you can use sql trace or extended events.
http://www.solidq.com/sqj/Pages/Relational/Tracing-Query-Performance-with-Extended-Events.aspx
http://blogs.msdn.com/b/sql_pfe_blog/archive/2013/03/14/dissecting-sql-server-physical-reads-with-extended-events-and-process-monitor.aspx
But its recommended a verification over IO subsystem to ensure there is no issue disk drives.
Srinivasan
- Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
-
Hi,
I/O delays points to fact of poorely written query generating lot of reads and writed which you I/O subsystem is not able to handle.It can also be due to faulty hardware.Can you post result of below query it will give top waits you occured on system>this will let us reach logical conclusion...
WITH [Waits] AS (SELECT [wait_type], [wait_time_ms] / 1000.0 AS [WaitS], ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS], [signal_wait_time_ms] / 1000.0 AS [SignalS], [waiting_tasks_count] AS [WaitCount], 100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage], ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum] FROM sys.dm_os_wait_stats WHERE [wait_type] NOT IN ( N'CLR_SEMAPHORE', N'LAZYWRITER_SLEEP', N'RESOURCE_QUEUE', N'SQLTRACE_BUFFER_FLUSH', N'SLEEP_TASK', N'SLEEP_SYSTEMTASK', N'WAITFOR', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT', N'XE_DISPATCHER_JOIN', N'LOGMGR_QUEUE', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'BROKER_TASK_STOP', N'CLR_MANUAL_EVENT', N'CLR_AUTO_EVENT', N'DISPATCHER_QUEUE_SEMAPHORE', N'TRACEWRITE', N'XE_DISPATCHER_WAIT', N'BROKER_TO_FLUSH', N'BROKER_EVENTHANDLER', N'FT_IFTSHC_MUTEX', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'DIRTY_PAGE_POLL', N'SP_SERVER_DIAGNOSTICS_SLEEP') ) SELECT [W1].[wait_type] AS [WaitType], CAST ([W1].[WaitS] AS DECIMAL(14, 2)) AS [Wait_S], CAST ([W1].[ResourceS] AS DECIMAL(14, 2)) AS [Resource_S], CAST ([W1].[SignalS] AS DECIMAL(14, 2)) AS [Signal_S], [W1].[WaitCount] AS [WaitCount], CAST ([W1].[Percentage] AS DECIMAL(4, 2)) AS [Percentage], CAST (([W1].[WaitS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgWait_S], CAST (([W1].[ResourceS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgRes_S], CAST (([W1].[SignalS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgSig_S] FROM [Waits] AS [W1] INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum] GROUP BY [W1].[RowNum], [W1].[wait_type], [W1].[WaitS], [W1].[ResourceS], [W1].[SignalS], [W1].[WaitCount], [W1].[Percentage] HAVING SUM ([W2].[Percentage]) - [W1].[Percentage] < 95; -- percentage threshold GO
Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers
- Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
-
Hi Shanky,
Can you explain more in details about your query, and why you think it is related with backup operation?
Regards,
Hi ,
May be some third party backup tool is taking ur system backup you can go to error log and check for VIRTUAL_DEVICE if it is present in backup message means backup is occuring thru Third party tool on network share
Also for wait type WRITELOG...it shows there is lot of delay in writing logs on log file drive also ur read write time is high..with PAGEIOLATCH wait i would also recommend to get ur disk checked,disk on which ur trn files are residing...it may be disk/SAN issue .SAN must be updated to latest firmwares to perform better ..
thanks
Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers
- Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
All replies
-
Either your disk is flooded with Large IO requests (high number of DML, Large scan due to improper/missing index, poor table design) or there could be hardware problem.
As you say its occurring a certain time, find out those large IO requester and if possible fine tune them to use less IO. To identify offensive queries, you can use sql trace or extended events.
http://www.solidq.com/sqj/Pages/Relational/Tracing-Query-Performance-with-Extended-Events.aspx
http://blogs.msdn.com/b/sql_pfe_blog/archive/2013/03/14/dissecting-sql-server-physical-reads-with-extended-events-and-process-monitor.aspx
But its recommended a verification over IO subsystem to ensure there is no issue disk drives.
Srinivasan
- Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
-
Hi,
I/O delays points to fact of poorely written query generating lot of reads and writed which you I/O subsystem is not able to handle.It can also be due to faulty hardware.Can you post result of below query it will give top waits you occured on system>this will let us reach logical conclusion...
WITH [Waits] AS (SELECT [wait_type], [wait_time_ms] / 1000.0 AS [WaitS], ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS], [signal_wait_time_ms] / 1000.0 AS [SignalS], [waiting_tasks_count] AS [WaitCount], 100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage], ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum] FROM sys.dm_os_wait_stats WHERE [wait_type] NOT IN ( N'CLR_SEMAPHORE', N'LAZYWRITER_SLEEP', N'RESOURCE_QUEUE', N'SQLTRACE_BUFFER_FLUSH', N'SLEEP_TASK', N'SLEEP_SYSTEMTASK', N'WAITFOR', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION', N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT', N'XE_DISPATCHER_JOIN', N'LOGMGR_QUEUE', N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'BROKER_TASK_STOP', N'CLR_MANUAL_EVENT', N'CLR_AUTO_EVENT', N'DISPATCHER_QUEUE_SEMAPHORE', N'TRACEWRITE', N'XE_DISPATCHER_WAIT', N'BROKER_TO_FLUSH', N'BROKER_EVENTHANDLER', N'FT_IFTSHC_MUTEX', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'DIRTY_PAGE_POLL', N'SP_SERVER_DIAGNOSTICS_SLEEP') ) SELECT [W1].[wait_type] AS [WaitType], CAST ([W1].[WaitS] AS DECIMAL(14, 2)) AS [Wait_S], CAST ([W1].[ResourceS] AS DECIMAL(14, 2)) AS [Resource_S], CAST ([W1].[SignalS] AS DECIMAL(14, 2)) AS [Signal_S], [W1].[WaitCount] AS [WaitCount], CAST ([W1].[Percentage] AS DECIMAL(4, 2)) AS [Percentage], CAST (([W1].[WaitS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgWait_S], CAST (([W1].[ResourceS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgRes_S], CAST (([W1].[SignalS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgSig_S] FROM [Waits] AS [W1] INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum] GROUP BY [W1].[RowNum], [W1].[wait_type], [W1].[WaitS], [W1].[ResourceS], [W1].[SignalS], [W1].[WaitCount], [W1].[Percentage] HAVING SUM ([W2].[Percentage]) - [W1].[Percentage] < 95; -- percentage threshold GO
Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers
- Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM
-
-
Hi Shanky
This is the result for the query
BACKUPIO 32015.38 31858.67 156.7 282278 30.79 0.1134 0.1129 0.0006 WRITELOG 14441.94 13777.3 664.64 3069283 13.89 0.0047 0.0045 0.0002 PAGEIOLATCH_SH 13183.02 12951.2 231.81 2970904 12.68 0.0044 0.0044 0.0001 ASYNC_NETWORK_IO 9147.28 8011.34 1135.9 12748595 8.8 0.0007 0.0006 0.0001 BACKUPBUFFER 6382.91 5583.78 799.13 1354555 6.14 0.0047 0.0041 0.0006 SOS_SCHEDULER_YIELD 5798.81 0.66 5798.2 2639757 5.58 0.0022 0 0.0022 ASYNC_IO_COMPLETION 4827.13 4826.31 0.81 243 4.64 19.8647 19.8614 0.0033 BACKUPTHREAD 4638.52 4633.55 4.97 1949 4.46 2.3799 2.3774 0.0025 RESOURCE_SEMAPHORE_QUERY_COMPILE 2843 2825.56 17.44 547 2.73 5.1974 5.1656 0.0319 IO_COMPLETION 2538.45 2434.39 104.06 898777 2.44 0.0028 0.0027 0.0001 OLEDB 2424.78 2424.78 0 1111496 2.33 0.0022 0.0022 0 LCK_M_X 1697.83 1689.11 8.72 5667 1.63 0.2996 0.2981 0.0015 -
With above I assume you are taking backup on a Network drive.Backup operation does not uses the data pages from the buffer pool it will got o disk to backup pages this is IO intensive operation and it sometimes go to suspended status while waiting for IO. To remove this Please take backup on Local drive and then use tape backup to backup this file on tape.
Write log wait is accompanied with BACUPIO wait i have seen this al lot this happensd when process is waiting to write log to disk(log file)..I advise u to but your log files on separate drive.
At this point of time with data u provided its wont be correct to say that u have DISK I/O problem
Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers
-
-
-
Hi Shanky,
Can you explain more in details about your query, and why you think it is related with backup operation?
Regards,
Hi ,
May be some third party backup tool is taking ur system backup you can go to error log and check for VIRTUAL_DEVICE if it is present in backup message means backup is occuring thru Third party tool on network share
Also for wait type WRITELOG...it shows there is lot of delay in writing logs on log file drive also ur read write time is high..with PAGEIOLATCH wait i would also recommend to get ur disk checked,disk on which ur trn files are residing...it may be disk/SAN issue .SAN must be updated to latest firmwares to perform better ..
thanks
Please mark this reply as the answer or vote as helpful, as appropriate, to make it useful for other readers
- Marked as answer by chungb Tuesday, July 16, 2013 2:55 AM