Question about the output of sys.dm_io_virtual_file_stats and IO stalls

Unanswered Question about the output of sys.dm_io_virtual_file_stats and IO stalls

  • Friday, August 17, 2012 9:28 PM
     
     

    I have a SQL server running 2008R2. I am looking at sys.dm_io_virtual_file_stats and trying to make sense of the results. The server in question was restarted last night so the statistics are from the last 24 hours. Because of this fact, these results are really concerning. The file in question is the data file (mdf) from one of the databases on the server, the DB is 70Gb in size. The server has 16GB RAM and 12GB allocated to SQL. I run the following query and get an unusual amount of stalls:

    WITH IOSELECT AS

    (
    SELECT 
    DB_NAME(vfs.database_id) AS [Database Name]
    ,vfs.file_id
    ,smf.physical_name AS [File Path/Name]
    ,sample_ms / 60000 AS [Number of minutes since server restart]
    ,io_stall_read_ms / 60000 AS [Read Stalls in Minutes]
    ,CAST(io_stall_read_ms/(1.0 + num_of_reads) AS NUMERIC(10,1)) AS [Average Read Stalls in Milliseconds]
    ,io_stall_write_ms / 60000 AS [Write Stalls in Minutes]
    ,CAST(io_stall_write_ms/(1.0+num_of_writes) AS NUMERIC(10,1)) AS [Average Write Stalls in Milliseconds]
    ,io_stall / 60000 AS [Total Stalls in Minutes]
    ,CAST((io_stall_read_ms + io_stall_write_ms)/(1.0 + num_of_reads + num_of_writes) AS NUMERIC(10,1)) AS [Average IO Stalls in Milliseconds]
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
    INNER JOIN sys.master_files smf ON smf.database_id = vfs.database_id
    AND smf.file_id = vfs.file_id
    )
    SELECT 
    *
    ,CAST(100. * [Total Stalls in Minutes] / SUM([Total Stalls in Minutes]) OVER() AS DECIMAL(10, 2)) AS [IO Stall %]
    ,[Number of minutes since server restart] - [Total Stalls in Minutes] AS TimeNotStalling
    FROM IOSELECT
    ORDER BY [Total Stalls in Minutes] DESC

    ------------------------- Results:

    Time since last restart = 1180 Minutes, Read Stalls in minutes = 1106, Average Read Stall in Millisecond = 14.3, Write Stalls in Minutes = 291, Average Write Stalls in Milliseconds = 19.7, Total Stalls in Minutes = 1397 - how???

    The data file is on a RAID 1/0 with a total of 6 disks, on our SAN.

    Am I doing something wrong in my query, am I misreading the results--so confused. Any help or insight would be greatly appreciated.

    Thanks,

All Replies

  • Monday, August 20, 2012 1:50 AM
    Answerer
     
     

    This is perfectly normal - you're seeing a situation where it appears that SQL Server has been waiting on disk I/O for longer than it's been up.

    This is due to the fact that while the server may have been up for 1180 minutes of "real" time, you are running many processes inside SQL Server, which have collectively been waiting for 1397 minutes. Imagine if you had 10 processes trying to read from your RAID array, but it's unavailable for 2 seconds. That's 2 seconds of real time, but 10 processes each waiting 2 seconds is 20 seconds total stall.

    This means it doesn't really make sense to compare the up-time with the stall time - you're better off comparing how that stall time changes over time (e.g., every 10 minutes, are we seeing more or less stalls?).

    With your average read/write speeds (again, preferably you'd do this over shorter intervals than "since startup", and you'd want to see maximum values) the disk I/O looks pretty snappy.

  • Monday, August 20, 2012 2:40 PM
     
     

    That makes sense--thank you for your reply.

    That fact aside, these numbers don't appear to be unusual then? To be fair, 1TB was read from the database during this period, but still, the average millisecond stalls seem high.

    Thanks.