none
IO Latency meaning RRS feed

  • Question

  • I am trying to figure out why my stalls are very high in comparison with my reads.

    based on the documentation of sys.dm_io_virtual_file_stats (https://msdn.microsoft.com/en-CA/library/ms190326.aspx)

    num_of_reads = "Number of reads issued on the file."

    io_stall_read_ms = "Total time, in milliseconds, that the users waited for reads issued on the file."

    What confuses me is the "Users" part which is only present in the stalls definition but not in the reads.

    Are the reads correlating with the stalls?  i.e. Does it represents reads which were not satisfied by the buffer cache. Or do they include read aheads and such reads which are in the background and not impacting to the user or contributing to waits.

    For the stall, Am I right in assuming that values computed in this stat are only reads which induced a wait to the user query. (PAGEIOLATCH and IO_COMPLETION)

    Monday, May 9, 2016 9:52 PM

Answers

All replies

  • >Are the reads correlating with the stalls?

    If the page is in the page cache, it's a "Page Lookup" not a "Read".  So it wouldn't be counted at all.  So every Read will add to the io_stall_read_ms

    >Or do they include read aheads and such reads

    Yes.  But I don't consider read-ahead reads as background IO.  It's just the storage engine trying to keep ahead of a query plan, but it doesn't always succeed.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Monday, May 9, 2016 10:02 PM
  • Antoine,

    What kind of numbers are you seeing?

    Have you correlated them with numbers from profiler traces?

    I've been having some issues with these recently and have been trying to locate a cause.

    My wait stats are negligible, and I've become very suspicious of the VM platform.

    Thanks,

    Josh

    Monday, May 9, 2016 10:34 PM
  • Thanks for the clarification David.

    Josh, I am not too sure how I would extract that info from an extended event session.

    If I look a the overall, over a duration of 30 minutes, am seeing:

    • total reads: 1691162
    • read io stalls: 450854705 ms
    • latency (stalls / reads): 267 ms

    If I look at the sum of the wait times IO (pageiolatchs and iocompletion):

    PAGEIOLATCH_SH 42247347
    PAGEIOLATCH_UP 51076
    PAGEIOLATCH_EX 2847252
    IO_COMPLETION 295324

    which sums up to one tenth of what I see in the stalls, 45440999 ms

    those numbers come from sys.dm_io_virtual_file_stats and sys.dm_os_wait_stats snapshots taken 30 minutes apart.

    Wednesday, May 11, 2016 8:50 PM
  • Antoine,

    Yes, overall that's the same kind of thing that I see.

    And I have no idea what to make of it.

    I assume that every IO stall must correspond to some wait state or other.

    Are you on a VM?

    Josh

    Thursday, May 12, 2016 6:30 AM
  • Thanks Josh.

    It is acutally the point of the post.  I see latency which defies all guidelines. But I cannot know what the actual impact on the client in term of waits.

    Is there an actual correlation or does the stall contains data which would not be compiled in the waits?


    • Edited by Antoine F Thursday, May 12, 2016 2:29 PM
    Thursday, May 12, 2016 2:28 PM
  • >Is there an actual correlation or does the stall contains data which would not be compiled in the waits?

    You could have more sessions waiting for data than there are outstanding read IOs.

    You can have read IOs that do not correspond to user waits, like backups

    eg

    select * from sys.dm_io_virtual_file_stats(db_id(),null)
    declare @sql nvarchar(max) = 'backup database ['+db_name() +'] to disk=''c:\temp\foo.bak'' with init'
    exec (@sql)
    select * from sys.dm_io_virtual_file_stats(db_id(),null)

    David


    David http://blogs.msdn.com/b/dbrowne/


    Thursday, May 12, 2016 3:38 PM
  • >Is there an actual correlation or does the stall contains data which would not be compiled in the waits?

    You could have more sessions waiting for data than there are outstanding read IOs.

    ...

    OK, fair enough... backups were easy to test.

    For the correlation of waits vs stalls However, if more then one sessions are waiting for the same IO, wouldn't it results in the waits be higher then the stalls?  When I list the waits related to io completion and pageiolatch, I only get one tenth of the waits. I added the BACKUPIO, IO_AUDIT_MUTEX, IO_RETRY, IOAFF_RANGE_QUEUE and still get to one tenth.

    What I am trying to figure is what is accounted in the stalls when msdn the doc writes "user waited" and if or how it can be correllerated with user experience.

    Thursday, May 12, 2016 7:21 PM
  • Josh, I forgot to answer your question.  it is a physical server with multiple instances.
    Thursday, May 12, 2016 7:21 PM
  • Are you sure the measurements are correct?  That would 704min of PAGEIOLATCH_SH in 30min of elapsed time.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Thursday, May 12, 2016 8:24 PM
  • at 10:00 AM, sys.dm_os_wait_stats.max_wait_time_ms for PAGEIOLATCH_SH = 8290214011

    at 10:30 AM, sys.dm_os_wait_stats.max_wait_time_ms for PAGEIOLATCH_SH = 8332461358

    8332461358 - 8290214011 = 42,247,347 ms

    My read stalls are 10 times that... (450,854,705 ms)

    Thursday, May 12, 2016 9:30 PM
  • Say those numbers are non realistic. Could you suggest an alternate method or tool to ensure their validity?
    • Edited by Antoine F Thursday, May 12, 2016 9:56 PM
    Thursday, May 12, 2016 9:55 PM
  • Josh, I forgot to answer your question.  it is a physical server with multiple instances.

    Well, I'll have to battle on with my VM suspicions without you then. :)

    Thanks,

    Josh

    Thursday, May 12, 2016 10:56 PM
  • Are you sure the measurements are correct?  That would 704min of PAGEIOLATCH_SH in 30min of elapsed time.

    David,

    Is that an issue?

    On a busy system with 500 or 1,000 threads, if they're all latchy, would that explain such numbers?

    I never really asked before, but if so, then even much smaller wait totals, divided by sessions/threads, would be less intense, too.

    Thanks,

    Josh

    Thursday, May 12, 2016 10:59 PM
  • >On a busy system with 500 or 1,000 threads, if they're all latchy, would that explain such numbers?

    Yes.  It's possible, but it means that on average there are 21 sessions waiting on database file IO.

    >sys.dm_os_wait_stats.max_wait_time_ms for PAGEIOLATCH_SH = 8332461358

    >Say those numbers are non realistic. Could you suggest an alternate method or tool to ensure their validity

    Try this query

    My Favorite Query For Investigating SQL Server Performance

    David


    David http://blogs.msdn.com/b/dbrowne/

    Friday, May 13, 2016 2:57 PM
  • Try this query

    My Favorite Query For Investigating SQL Server Performance

    My favorite query for SQL Server performance is a summary of trace logs.

    I have two separate scripts I've developed like the two halves of yours, I call one spidspy and the other waitspy.  Seems like a good idea to combine them.

    Josh

    Friday, May 13, 2016 7:19 PM
  • Thanks a bunch David, I love those queries.

    Currently, I have lesser values then yesterday.  I will run it again on the same timeframe a different day.

    Friday, May 13, 2016 7:28 PM