Page Life Expectancy at 30 seconds when the best practice is 300 seconds

    Question

  • I have a SQL instance running SQL Server 2005 on Windows 2003 x64 with 64 GB of RAM on the server. However, page life expectancy can sometimes drop dangerously to less than a minute, and the buffer cache hit ratio sits at 99% but occasionally, although rarely, drops to 70%. I noticed that max server memory has been set to 60 GB, but the Lock Pages in Memory privilege wasn't granted to the SQL Server domain service account. I would like to understand what could be causing this. I was also looking at the possibility that data pages might be filling up the buffer cache, but I need to investigate and get hard facts.

    Any ideas?

    Monday, September 19, 2011 12:59 PM

All replies

  • http://technet.microsoft.com/en-us/library/cc966540.aspx#EGAA

    How much memory does the server have? What does the below query return?

     

    -- How much memory SQL Server has committed to the buffer pool
    -- (bpool_committed is in 8 KB pages, so * 8 gives KB)
    SELECT
        CONVERT(VARCHAR, CAST(bpool_committed * 8 AS MONEY), 1) AS [SIZE],
        bpool_committed,
        bpool_commit_target
    FROM sys.dm_os_sys_info

     


    Best Regards, Uri Dimant SQL Server MVP http://dimantdatabasesolutions.blogspot.com/ http://sqlblog.com/blogs/uri_dimant/
    Monday, September 19, 2011 1:07 PM
    Answerer
  • You can also see what objects have pages in the cache with the following query:

    EXEC sp_MSforeachdb 'USE [?];
    SELECT t.name, bd.*
    FROM sys.dm_os_buffer_descriptors bd
    INNER JOIN sys.allocation_units a ON bd.allocation_unit_id = a.allocation_unit_id
    INNER JOIN sys.partitions p ON a.container_id = p.partition_id
    INNER JOIN sys.tables t ON p.object_id = t.object_id
    WHERE bd.database_id = DB_ID();'  -- limit each iteration to its own database so allocation unit IDs are matched correctly
    

     


    Peter Carter http://sqlserverdownanddirty.blogspot.com/
    Monday, September 19, 2011 1:16 PM
  • Also, just remembered, there is a pretty cool Paul Randal article on this as well...

    http://www.sqlskills.com/BLOGS/PAUL/post/Performance-issues-from-wasted-buffer-pool-memory.aspx


    Peter Carter http://sqlserverdownanddirty.blogspot.com/
    Monday, September 19, 2011 1:18 PM
  • Hello,

    The first thing that strikes me is that SQL Server is given 60 of the 64 total GB of RAM! At 64 GB I would give a little more back to the OS so it doesn't get hit with memory pressure.
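
    If you do lower it, a minimal sketch using sp_configure; the 56 GB figure here is only an illustration, not a prescription for your workload:

    -- Leave more memory for the OS; 57344 MB = 56 GB (illustrative value only)
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 57344;
    RECONFIGURE;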

    If you want to investigate how each DB is using the memory:

    SELECT DB_NAME(database_id) AS [Database]
    		,(COUNT(*) * 8)/1024 AS [MB_Per_DB] 
    		, COUNT(*) AS [Number_Of_Pages]
    		, page_type
    FROM sys.dm_os_buffer_descriptors 
    WHERE database_id <> 32767 --Resource_DB
    AND (page_type = 'INDEX_PAGE'
    OR page_type = 'DATA_PAGE')
    GROUP BY database_id, page_type
    ORDER BY database_id, page_type
    

    This may give you a better idea of what is going on. If your PLE drops, that means pages are being pushed out of memory to make room for different ones, so either you don't have enough cache or pages that are rarely used are being read (think of a large one-time summation of all orders placed more than a year ago: rarely if ever checked, but a great deal of data could be involved). The cache hit ratio dropping is linked to the PLE, but it could also be caused by recompiles; check your SQL Re-Compilations/sec counter to see whether your stored procedures are behaving badly, or perhaps someone is doing ad hoc queries from Access (yikes!).

    The first thing I would check is to make sure no one is doing weird ad hoc reporting/querying. Then move on to checking for long-running jobs or processes, like archiving or year-to-date sales reporting/building/refreshing. If none of those explain the issue, check the recompiles and these counters: Memory: Available MBytes (memory pressure in the OS), SQL Server: Access Methods -> Full Scans/sec (good indexing?), and, in the same category, Page Splits/sec (many page splits causing memory problems).
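
    The SQL Server side of those counters can also be pulled from T-SQL; a rough sketch (the /sec counters are cumulative totals in this DMV, so sample twice and take the difference, and Memory: Available MBytes is an OS counter that still needs PerfMon):

    SELECT [object_name], counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN ('Page life expectancy',
                           'SQL Re-Compilations/sec',
                           'Full Scans/sec',
                           'Page Splits/sec');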

    Does this happen around the same time (when it occurs)?

    -Sean

    Monday, September 19, 2011 1:22 PM
    Answerer
  • Depending on your database/server usage this is not "abnormal".  You need to look at the server as an entire entity, not specific counters.

    If someone does a SELECT * FROM tablea and tablea has 1TB of data, it will fill the buffer pool with tablea.  Then the next query will start removing data from the buffer pool and putting back other data.

    Also, if you refresh the data every night, you will see the initial impact of the data being moved into the buffer pool.

    Are you having a performance issue?

     

    Monday, September 19, 2011 2:02 PM
    Moderator
  • Hi,

    Just thought I'd chime in - 300 seconds for PLE is a really old recommendation. Things have moved on a bit since those days - we constantly see > 15000 seconds on our servers, and if PLE dipped down to 500, which is still above the old recommendation, I'd be worried. There's no fixed limit - it depends on your environment and your settings.

    http://www.sqlmag.com/blog/sql-server-questions-answered-28/sql-server/what-does-page-life-expectancy-mean-137153

     



    Thanks,

    Andrew Bainbridge
    SQL Server DBA

    Please click "Propose As Answer" if a post solves your problem, or "Vote As Helpful" if a post has been useful to you
    Monday, September 19, 2011 2:56 PM
  • Just want to second what Tom said: what you want is an *average* of 300 or so; there may be frequent spikes down to 30 or less, and that's no problem.

    Josh

     

    Monday, September 19, 2011 11:34 PM
  • One thing: setting max server memory for SQL Server to 60 GB is not ideal on this box. You should have configured it at around 54-56 GB, leaving the remainder for the OS and other SQL Server memory consumers outside the buffer pool.

    http://sqlserverperformance.wordpress.com/2008/08/06/suggested-max-memory-settings-for-sql-server-20052008-2/

    Please check the following counters:

    1. Batch Requests/sec

    2. Free Pages

    3. Checkpoint Pages/sec

    4. Lazy Writes/sec

    These will show whether there is memory pressure, which can be either external or internal. External memory pressure means the OS does not have a sufficient amount of free memory.

    If the pressure is internal, you need to identify queries with GROUP BY or ORDER BY clauses and also check whether there are any missing indexes. Fixing missing indexes improves performance by not pulling unnecessary data into memory.
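
    To find candidates, a rough sketch using the missing-index DMVs; treat the output as suggestions to investigate rather than as indexes to create blindly:

    SELECT TOP 10
           mid.statement AS table_name,
           mid.equality_columns,
           mid.inequality_columns,
           mid.included_columns,
           migs.user_seeks,
           migs.avg_total_user_cost,
           migs.avg_user_impact
    FROM sys.dm_db_missing_index_details AS mid
    INNER JOIN sys.dm_db_missing_index_groups AS mig ON mid.index_handle = mig.index_handle
    INNER JOIN sys.dm_db_missing_index_group_stats AS migs ON mig.index_group_handle = migs.group_handle
    ORDER BY migs.user_seeks * migs.avg_total_user_cost * migs.avg_user_impact DESC;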

    Checkpoint Pages/sec and Lazy Writes/sec will rise aggressively under memory pressure; in particular, the lazy writer works hard to keep free pages at an optimal level whenever they fall below a certain threshold.

     


    VEERESH V NASHI.
    Tuesday, September 20, 2011 4:45 AM
  • I have looked in the buffer pool and it is mainly filled with a nonclustered index taking up about 40 GB of the buffer pool. Is there a way to see why the index needs to be kept in the buffer pool and what is actually causing this behaviour?
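
    For reference, one way to get that kind of per-index breakdown is a query along these lines (just a sketch; run it in the database that owns the index):

    SELECT i.name AS index_name,
           COUNT(*) * 8 / 1024 AS buffer_mb
    FROM sys.dm_os_buffer_descriptors AS bd
    INNER JOIN sys.allocation_units AS au ON bd.allocation_unit_id = au.allocation_unit_id
    INNER JOIN sys.partitions AS p ON au.container_id = p.partition_id
    INNER JOIN sys.indexes AS i ON p.object_id = i.object_id AND p.index_id = i.index_id
    WHERE bd.database_id = DB_ID()
    GROUP BY i.name
    ORDER BY buffer_mb DESC;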

    Thanks in advance.

    Tuesday, September 20, 2011 10:48 AM
  • On a 64-bit box, excessive paging can occur if the 'Lock Pages in Memory' privilege is not assigned to the SQL Server service startup account. Also, as Sean mentioned, consider setting max server memory to around 80% of RAM; on this box that would be around 52 GB.
    Tuesday, September 20, 2011 8:49 PM
  • I have a SQL instance running SQL Server 2005 on Windows 2003 x64 with 64 GB of RAM on the server. However, page life expectancy can sometimes drop dangerously to less than a minute, and the buffer cache hit ratio sits at 99% but occasionally, although rarely, drops to 70%. I noticed that max server memory has been set to 60 GB, but the Lock Pages in Memory privilege wasn't granted to the SQL Server domain service account. I would like to understand what could be causing this. I was also looking at the possibility that data pages might be filling up the buffer cache, but I need to investigate and get hard facts.

    Any ideas?

    First things first: if you have 60GB of memory in SQL Server and you are churning through it every 5 minutes, that is a lot of excess disk I/O you are generating on the system.  The 300 value for PLE dates back to SQL 2000, when 4 or 8 GB of RAM was a lot of memory.  Reading 4GB of data from disk into cache every 5 minutes is one thing; reading 60GB of data from disk into cache every 5 minutes is a completely different thing. Your disk I/O subsystem may be able to keep up with sustained 12GB/minute activity without problems, but I would really be working on tuning the workload to minimize buffer pool churn, or looking at increasing memory, if I had a high number of Lazy Writes/sec and associated Free List Stalls/sec on the instance.
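
    If you want to eyeball Lazy Writes/sec and Free List Stalls/sec from T-SQL rather than PerfMon, a quick sketch follows; the per-second counters show up as running totals in the DMV, so two samples are needed:

    DECLARE @lw BIGINT, @fls BIGINT;

    SELECT @lw  = MAX(CASE WHEN counter_name = 'Lazy writes/sec'      THEN cntr_value END),
           @fls = MAX(CASE WHEN counter_name = 'Free list stalls/sec' THEN cntr_value END)
    FROM sys.dm_os_performance_counters
    WHERE [object_name] LIKE '%Buffer Manager%';

    WAITFOR DELAY '00:01:00';  -- sample interval: one minute

    SELECT MAX(CASE WHEN counter_name = 'Lazy writes/sec'      THEN cntr_value END) - @lw  AS lazy_writes_last_minute,
           MAX(CASE WHEN counter_name = 'Free list stalls/sec' THEN cntr_value END) - @fls AS free_list_stalls_last_minute
    FROM sys.dm_os_performance_counters
    WHERE [object_name] LIKE '%Buffer Manager%';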

    Second thing, with regards to Buffer Cache Hit Ratio: the read-ahead mechanisms in SQL Server generally stay well ahead of the pages being used during query processing, so the old adage that Buffer Cache Hit Ratio matters is really outdated.  I haven't seen a server in years that had a Buffer Cache Hit Ratio below 99% for more than a 1-2 second drop associated with a really large query performing an index or table scan.  If you see this consistently under 95% on a server today, you had memory trouble way before this counter began to show it, I can promise that.

    Finally, give me a few and I'll publish some code to help you with identifying the reason why this index is being so heavily used in cache.

    EDIT:

    Here is the code that you can use to scan the plan cache for statements that use a specific index:

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
    -- DECLARE and SET are split so this also runs on SQL Server 2005 (inline initialization requires 2008+)
    DECLARE @IndexName AS NVARCHAR(128);
    SET @IndexName = 'PK__TestTabl__FFEE74517ABC33CD';

    -- Make sure the name passed is appropriately quoted
    IF (LEFT(@IndexName, 1) <> '[' AND RIGHT(@IndexName, 1) <> ']') SET @IndexName = QUOTENAME(@IndexName);
    --Handle the case where the left or right was quoted manually but not the opposite side
    IF LEFT(@IndexName, 1) <> '[' SET @IndexName = '['+@IndexName;
    IF RIGHT(@IndexName, 1) <> ']' SET @IndexName = @IndexName + ']';

    -- Dig into the plan cache and find all plans using this index
    ;WITH XMLNAMESPACES
       (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')   
    SELECT
     stmt.value('(@StatementText)[1]', 'varchar(max)') AS SQL_Text,
     obj.value('(@Database)[1]', 'varchar(128)') AS DatabaseName,
     obj.value('(@Schema)[1]', 'varchar(128)') AS SchemaName,
     obj.value('(@Table)[1]', 'varchar(128)') AS TableName,
     obj.value('(@Index)[1]', 'varchar(128)') AS IndexName,
     obj.value('(@IndexKind)[1]', 'varchar(128)') AS IndexKind,
     cp.plan_handle,
     query_plan
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt)
    CROSS APPLY stmt.nodes('.//IndexScan/Object[@Index=sql:variable("@IndexName")]') AS idx(obj)
    OPTION(MAXDOP 1, RECOMPILE);


    Now I have been asked this before so I will go ahead and defend it ahead of time.  I use transaction isolation level read uncommitted when querying the plan cache just to make sure I don’t cause any problems, because this can be a long-running query depending on the size of the plan cache, and I don’t care if I get some phantom plan as a result of being read uncommitted.  I also use MAXDOP 1 for the query because it tends to be a CPU resource hog, and a lot of times I am running this on a production system or telling someone to run this on a production system and it has the potential for impacting performance.  If your server is considerably beefy hardware-wise, it probably won’t matter, and for most people it doesn’t negatively impact their environment, but like anything else, I apply a rule of “First do no harm” to something that might be run against a production system.

     


    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!


    Wednesday, September 21, 2011 2:00 AM
    Moderator
  • Thanks Jonathan for the query, that was very useful. I ran it on the database and it pulled out about 20 statements that use this index one way or the other. Some of the query plans refer to ad hoc SQL queries; the queries look the same, it's just the parameters that are different.

    The SQL version is SQL 2005, and what strikes me is that the queries are identical apart from their parameters, yet they are cached separately per parameter set. I will look at what makes these calls and see if I can write a stored procedure to perform the same work, passing parameters to it. The index is mainly used within a key lookup, so perhaps a new index is what is required here. Please confirm if this approach is the way forward. I'm thinking that if a new index is created to help with these queries, this particular index wouldn't be needed again and wouldn't need to be loaded into the cache.
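
    Something along these lines is what I have in mind; the table and column names are purely hypothetical, just to illustrate the INCLUDE pattern that removes the key lookup:

    -- Hypothetical names: cover the lookup columns so the key lookup disappears
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId_Covering
    ON dbo.Orders (CustomerId)
    INCLUDE (OrderDate, TotalDue);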

     

     

    Wednesday, September 21, 2011 2:29 PM
  • The environment is a huge data warehouse server, and by the looks of things the server needs more physical memory. One approach is to get more memory; alternatively, I'm thinking of querying the buffer cache from time to time and, once it's full, running the DBCC DROPCLEANBUFFERS command.

    Will this approach work?

    Thursday, September 22, 2011 12:41 PM
  • The environment is a huge data warehouse server, and by the looks of things the server needs more physical memory. One approach is to get more memory; alternatively, I'm thinking of querying the buffer cache from time to time and, once it's full, running the DBCC DROPCLEANBUFFERS command.

    Will this approach work?

    No, that would be one of the worst things you could do.  DBCC DROPCLEANBUFFERS clears all the data from cache, so you are forcing it to have to pull it back off disk again.  The buffer pool manages itself through the Least Recently Used (LRU) algorithms in place and the clock hand sweeps of the caches.  If you can't add memory to the system, and you can't tune your queries, database design, and indexing to reduce the data going through cache then leave it alone.  It is what it is.

    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    Thursday, September 22, 2011 1:18 PM
    Moderator
  • OK, I thought the DROPCLEANBUFFERS approach was quite brutal. In terms of tuning the queries, the solution in most cases involves creating covering indexes. My thinking is that if you create a covering index over the columns for such a large data set, wouldn't it lead to the same problem again, since that index may need to be loaded into the cache as well?

     

    Thursday, September 22, 2011 1:40 PM
  • I have a SQL instance running SQL Server 2005 on Windows 2003 x64 with 64 GB of RAM on the server. However, page life expectancy can sometimes drop dangerously to less than a minute, and the buffer cache hit ratio sits at 99% but occasionally, although rarely, drops to 70%. I noticed that max server memory has been set to 60 GB, but the Lock Pages in Memory privilege wasn't granted to the SQL Server domain service account. I would like to understand what could be causing this. I was also looking at the possibility that data pages might be filling up the buffer cache, but I need to investigate and get hard facts.

    Any ideas?


    To understand why Buffer Cache Hit Ratio remains high in this scenario, see my article:

    Great SQL Server Debates: Buffer Cache Hit Ratio

    This counter provides information about the ability of the read-ahead mechanism in SQL Server to read pages from disk into cache before the page is actually required by the query processor.  It doesn't mean that the page was originally in cache at the time the query started; it just means that the page was available in cache at the point it was needed, and it could have been placed there by a recent prefetch in the storage engine.  In most cases it is not a good indicator of memory pressure, or of whether pages were in the buffer cache when a query began executing.


    Jonathan Kehayias | Principal Consultant, SQLSkills.com
    SQL Server MVP | Microsoft Certified Master: SQL Server 2008
    Author of Troubleshooting SQL Server: A Guide for Accidental DBAs
    Feel free to contact me through My Blog or Twitter. Become a SQLskills Insider!
    Please click the Mark as Answer button if a post solves your problem!

    Saturday, January 28, 2012 7:06 AM
    Moderator
  • Since Jon has already taken charge of this, I wanted to keep myself out of it :), but I would like to add just one point: take a 24-hour snapshot of page life expectancy and page file usage and make a graph, and notice whether there is a sudden fall at some time interval. Check the error logs in case something like DBCC FREEPROCCACHE is being fired by someone. With the graph you can also see when PLE falls suddenly, or declines slowly while page file usage rises. If possible, also capture the disk I/O related counters, which can add to your conclusions.
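
    A minimal sketch of capturing PLE on a schedule into a table that can be graphed later (the table name and interval are just examples):

    -- Run from a job every few minutes, then graph the contents of the table
    IF OBJECT_ID('dbo.PLE_History') IS NULL
        CREATE TABLE dbo.PLE_History (sample_time DATETIME NOT NULL, ple_seconds BIGINT NOT NULL);

    INSERT INTO dbo.PLE_History (sample_time, ple_seconds)
    SELECT GETDATE(), cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy'
      AND [object_name] LIKE '%Buffer Manager%';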

     

    Kind Regards


    Abhay Chaudhary OCP 9i, MCTS/MCITP (SQL Server 2005, 2008, 2005 BI) ms-abhay.blogspot.com/
    Sunday, January 29, 2012 6:33 AM