none
High IO RRS feed

  • Question

  • Buffer Cache Hit Ratio

    > 95 -- (Most of the time SQL Server is reading data out of buffer pool)

    Physical

    Disk/Avg Disk Per Sec\Read > 20 --(High Reads from Physical Disk)

    Physical

    Disk/Avg Disk Per Sec\Write > 20 --(High Writes from Physical Disk)

    This

    is happening 10% of the time during busy hours.

    Does this mean that this

    IO is because of the process running outside the buffer pool?
    Tuesday, July 5, 2011 2:12 PM

Answers

  • If the requested data pages are in the buffer pool a physical I/O operation does not occur.  Buffer cache hit ratio should be > 95.  If it is not, there may be a problem elsewhere that is causing SQL Server to move pages out of memory. This is usually in conjunction will a low page life expectency. In your case, this seems perfectly fine.

    Just because your buffer cache hit ratio is > 95 this doesnt mean that all of your data pages are stored in cache and no IO will occur.  When your IO latency increases above 20, look for other process such as backups, table scans (scans per sec counter) etc.. You should be able to identify the reason why your I/O performance decreases throughout the day. In my experience, there are usually queries that make use of table/index scans and/or backups.

     


    http://jahaines.blogspot.com/
    • Proposed as answer by Peja Tao Thursday, July 14, 2011 8:01 AM
    • Marked as answer by Peja Tao Friday, July 15, 2011 12:54 AM
    Sunday, July 10, 2011 9:16 PM

All replies

  • Please share your memory settings and system resources data. this will help isolating issue. Also let us know storage details. 
    -Giri
    Tuesday, July 5, 2011 2:16 PM
  • Buffer Cache Hit Ratio

    > 95-- (Most of the time SQL Server is reading data out of buffer pool)

    The above value should be around approx. 99% or above.  if it falls below that then we need to check the page life expectancy whose value should be always more than 300.

    the troubleshooting is entirely different if there is any memory pressure.


    Physical

    ?
    Disk/AvgDisk Per Sec\Read> 20--(High Reads from Physical Disk)
    Physical
    Disk/AvgDisk Per Sec\Write> 20--(High Writes from Physical Disk)

    The above 2 counters the value should always be below 20 and as less as possible. It actually describes the time take for the read/write from/to the disk.

    Please share the output of SP_configure , the architecture x86/64 . If possible the values of the counters:


    1. Buffer cache hit ratio

    2. Page life expectancy

    3. lazy writes per second

    4. memory grants pending

    6. granted workspace area


    for the disks


    avg. disk queue length , sec/transfer


    all the above counters of the same time (when you see the issue) and the output of the DMV : -  select * from sys.dm_io_pending_io_requests


    Harsh Chawla Personal Blog:-http://blogs.msdn.com/batala Team Blog:- http://blogs.msdn.com/b/sqlserverfaq/
    Tuesday, July 5, 2011 2:53 PM
  • Thanks for the replies!!! Sorry for not making the question clear.

    My question is:

    Buffer Cache Hit Ratio > 95 which means that the process doesnot need to read disk directly. But again I am seeing Disk Read counter more than 20. What does this mean? Does it mean process like Backups and CLRs are causing the Disk IO?

    Tuesday, July 5, 2011 3:39 PM
  • You could select the process counters IO read bytes/s and IO write bytes/s for all the processeses and could check which process is causing high IO.

     


    Harsh Chawla Personal Blog:-http://blogs.msdn.com/batala Team Blog:- http://blogs.msdn.com/b/sqlserverfaq/
    Tuesday, July 5, 2011 3:55 PM
  • Physical Disk\Avg Disk Read/Sec and Buffer Cache Hit Ratio

    Doesnt these counters relate each other? If Buffer Cache Hit Ratio is good then shouldnt Physical Disk\Avg Disk Read/Sec be good ? I am seeing Buffer Hit Ratio with > 95 which I believe is good but Physical Disk\Avg Disk Read/Sec > 20 which I believe is bad.

    Can anybody share why my understanding on the relation of Buffer Cache Hit Ratio and Physical Disk\Avg Disk Read/Sec  is not correct ?

    Thanks,

    Tuesday, July 5, 2011 5:16 PM
  • Doesnt these counters relate each other? If Buffer Cache Hit Ratio is good then shouldnt Physical Disk\Avg Disk Read/Sec be good ? I am seeing Buffer Hit Ratio with > 95 which I believe is good but Physical Disk\Avg Disk Read/Sec > 20 which I believe is bad.

     

    Ans: - when we have buffer cache hit ratio more than 95% , you are right, SQL is able to process most of the requests from the buffer pool itself. But if it goes below 90 or so that means there was internal/external memory pressure inside SQL.  In that case you will see SQL is doing disk activity but that doesn't necessarily mean that it will cause disk contention "all" the time.  In other words, it's not the thumb rule to relate these counters and also there may be some other processes causing heavy disk activity. It may or may not be the cause hence, you need to invetigate few things(or may be more) mentioned in my previous post.

     

    I hope that answers your question.


    Harsh Chawla Personal Blog:-http://blogs.msdn.com/batala Team Blog:- http://blogs.msdn.com/b/sqlserverfaq/
    • Proposed as answer by Peja Tao Thursday, July 7, 2011 3:12 AM
    • Unproposed as answer by nadirsql Sunday, July 10, 2011 3:09 AM
    Tuesday, July 5, 2011 5:37 PM
  • Thanks alot for your reply!!!

    when we have buffer cache hit ratio more than 95% , you are right, SQL is able to process most of the requests from the buffer pool itself. But if it goes below 90 or so that means there was internal/external memory pressure inside SQL.

    -- I thought Buffer Cache Hit Ratio does not give us any measure on external pressure. So, if buffer cache hit ratio is more than 95% it only means that any task running in SQL process is not able to read only from the buffer pool. Here, external pressure doesnt come into play. I would really appreciate if you could point me where I am thinking wrong.

    In that case you will see SQL is doing disk activity but that doesn't necessarily mean that it will cause disk contention "all" the time. In other words, it's not the thumb rule to relate these counters and also there may be some other processes causing heavy disk activity.

    --Agreed, if you looked at my original question I had asked

    "Does this mean that this IO is because of the process running outside the buffer pool?" --I think your answers says "YES". Please let me know if it is "NO".

    It may or may not be the cause hence, you need to invetigate few things(or may be more) mentioned in my previous post.

    --Agreed. Clearly Buffer Cache Hit Ratio is not giving me the information related to external processes, so it cannot give the data related to both external and internal memory pressure.

    Assuming there is no external memory pressure, could buffer cache hit ratio alone can be related to disk read write counter? Or, are there other things I need to consider ?

    Thanks,

    Sunday, July 10, 2011 3:09 AM
  • This is happening 10% of the time during busy hours. 


    If you mean it's like one minute of this every ten minutes, that's probably not a problem.

    If you mean it's like this for a solid hour, then it might need to be looked at.

    Remember, when you ask SQL Server to work hard - sometimes it has to work hard!

    Typically, if you have one or two big tables that have to be scanned, or big ETL processes, you get some minutes (or hours) like this, and that's just how it is supposed to be.

    So without knowing a lot about your workload, there is little point in worrying about system tuning parameters.

    Josh

     

    Sunday, July 10, 2011 3:50 AM
  • @Josh : Thanks alot for your reply.

    So without knowing a lot about your workload, there is little point in worrying about system tuning parameters.

    I was not worrying about it...neither anybody was complaining about it..nor trying to tune it but I am a just trying to learn it.

    If you mean it's like one minute of this every ten minutes, that's probably not a problem.

    If you mean it's like this for a solid hour, then it might need to be looked at.

    Remember, when you ask SQL Server to work hard - sometimes it has to work hard!

    I agree with your statements. But I want to understand why it is occurring even if it is occurring for 1 min out of 10 mins.

    So, My only question is

    Buffer Cache Hit Ratio > 95 -- (Most of the time SQL Server is reading data out of buffer pool)

    Physical

    Disk/Avg Disk Per Sec\Read > 20 --(High Reads from Physical Disk)

    Physical

    Disk/Avg Disk Per Sec\Write > 20 --(High Writes from Physical Disk)

    This

    is happening 10% of the time during busy hours.

    Does this mean that this High

    IO is because of the process running outside the buffer pool?

    Why I am questioning/confused? --To get some verification.

    Buffer Cache Hit Ratio is saying that most often the process doesnt need to access physical disk directly and RAM is sufficient.

    Physical Disk Read Write is saying that the process is hitting it more often.

    "Now, I thought may be Buffer Cache Hit Ratio is talking just about the SQL Internal Processes that occurs inside the buffer pool and Physical Disk Read Write is talking also about the external processes like Backups,CLRs,OS etc. So, this means that processes that are running outside the SQL Server Buffer Pool are causing the HIgh IOs."

    I know I have to lookup many other counters to get detail level of analysis. But I just wanted to get surface level understanding of what can I derive just by looking at Buffer Cache Hit Ratio and Disk Read Write. And definitely from their description they relate each other in some sense if not in total sense.

    Since I have just started learning about SQL Server Performance, I wanted to know from experts if I am thinking in the correct direction.

    Thanks,

    Sunday, July 10, 2011 4:55 AM
  • Nadirsql,

    I think you have it basically correct.

    They way it usually works is that there are a couple of tables that are most frequently referenced by users, and that 90%++ of the references to them are reads.  So as they are read for one query, the pages are kept in buffers anticipating someone else will want to read them again real soon.  If the whole table (and all indexes, etc) can fit in the buffers, then you get a great hit ratio, basically 100%, once everything gets loaded.

    But then, say, someone comes in and needs to scan a different, big table.  Well, suddenly it starts using up buffer space and flushing out some parts of that other table.  Your ratio goes down.

    But, this is OK, really.  The only alternative is having your RAM as big as your entire database!  Actually, these days, that is not uncommon, RAM is so cheap and 64-bit servers can use a lot of it very efficiently.  But then people start wanting bigger databases!  And around and around it goes.

    So, as long as the hit ratio stays above 95% (or 98% or 99% or whatever you think you need) and the seconds-per-IO on reads and writes stays down under 30ms (or about 30 combined reads/writes per second), things are going just about as well as you can hope for.

    But if the users say it still isn't fast enough?  Well, then it starts getting challenging!  But anyway, those are the basics.

    Josh

     

    Sunday, July 10, 2011 5:03 PM
  • If the requested data pages are in the buffer pool a physical I/O operation does not occur.  Buffer cache hit ratio should be > 95.  If it is not, there may be a problem elsewhere that is causing SQL Server to move pages out of memory. This is usually in conjunction will a low page life expectency. In your case, this seems perfectly fine.

    Just because your buffer cache hit ratio is > 95 this doesnt mean that all of your data pages are stored in cache and no IO will occur.  When your IO latency increases above 20, look for other process such as backups, table scans (scans per sec counter) etc.. You should be able to identify the reason why your I/O performance decreases throughout the day. In my experience, there are usually queries that make use of table/index scans and/or backups.

     


    http://jahaines.blogspot.com/
    • Proposed as answer by Peja Tao Thursday, July 14, 2011 8:01 AM
    • Marked as answer by Peja Tao Friday, July 15, 2011 12:54 AM
    Sunday, July 10, 2011 9:16 PM