none
Low performance Always On 2014 RRS feed

  • Question

  • Dear all,

    I'm concerned for the most important Always on running in live. General low performance is increasingly as times goes by and I am stuck, I don’t know what else might I check.

    In parallel to this article I've published another one in Dynamics community too.

    Can you kindly address me on how to grasp more key info? Root cause could be indexes (thousands of them as you will know Microsoft ERP consume them many) but it would not explain this extremely slowness to my end users

    Some hints have received recommend me scale up asap but... it is not now possible because in just 12 months or so all this stuff will be move to another ERP (AX) running in a big beast (64 cores, 256 RAM). In the meantime we need to add some fuel to this one.

    Facts:

    Flavour application: Navision 2009

    SQL figures: 32 Gb Ram, 8 cores. Xeon CPU E5-2690 v3

    OS: Win 2012 R2 DataCenter

    Manufacturer: Vmware

    Storage: Mixed SAS with SSD

    Number of Navision implementations:  13

    Total Navision “User tables”: 27000

     Keep in mind:

    -SP1 is still not applied :-(..   although I don’t think it could be very helpful in this case.

    -Rebuild/Reorganize on weekly basis (Ola Hallengren’s stuff)

    -TempDb is not splitted into more than 1 file as opposed we have got with AX in other location

    Findings

    No PAGELATCH_EX has been found, no deadlocks.

    DBCC LOGINFO inform usual number of VLF

    Thanks for your comments, hints or advices,

    Monday, July 4, 2016 3:59 PM

Answers

  • I think the issue is obvious (RAM) but let's ask some questions.

    Thirteen "implementations", is that thirteen databases, or thirteen instances, or thirteen VMs?  Or do they all share the same database with different keys?  Whichever, it's going to divide up your effective RAM pretty heavily.

    How large in megabytes/gigabytes is the average "implementation"?  27,000 tables / 13 = 2,000+ tables/instance.

    What does your page life expectancy look like?  Probably on the floor.

    And of course fixing your tempdb would probably help a little.

    Finally, make sure your server max memory is set so your OS isn't paging, any paging kills SQL Server performance.

    Josh



    ps - and, as always, what's your maxdop setting, and does Navision dictate that?

    Monday, July 4, 2016 4:51 PM
  • Please run the sql server diagnositc queries and send us the results. You can use glen berry's queries

    http://www.sqlskills.com/blogs/glenn/sql-server-diagnostic-information-queries-for-june-2016/ 


    Monday, July 4, 2016 4:58 PM

All replies

  • Hello Enric,

    may you be more clear about the slowness? It's vague...

    1) you have a virtual environment, how do you have configured your VM? Have you followed VMware best practices for production databases?

    2) post the output of select * from sys.dm_os_wait_tasks order by wait_time desc

    3) is the slowness random? Are you able to programatically reproduce it?

    4) have you looked at indexes?

    5) how big the database is?

    6) post the output of exec sp_configure (please enable advanced options)

    Monday, July 4, 2016 4:33 PM
  • I think the issue is obvious (RAM) but let's ask some questions.

    Thirteen "implementations", is that thirteen databases, or thirteen instances, or thirteen VMs?  Or do they all share the same database with different keys?  Whichever, it's going to divide up your effective RAM pretty heavily.

    How large in megabytes/gigabytes is the average "implementation"?  27,000 tables / 13 = 2,000+ tables/instance.

    What does your page life expectancy look like?  Probably on the floor.

    And of course fixing your tempdb would probably help a little.

    Finally, make sure your server max memory is set so your OS isn't paging, any paging kills SQL Server performance.

    Josh



    ps - and, as always, what's your maxdop setting, and does Navision dictate that?

    Monday, July 4, 2016 4:51 PM
  • Please run the sql server diagnositc queries and send us the results. You can use glen berry's queries

    http://www.sqlskills.com/blogs/glenn/sql-server-diagnostic-information-queries-for-june-2016/ 


    Monday, July 4, 2016 4:58 PM
  • Please run the sql server diagnositc queries and send us the results. You can use glen beery's queries

    http://www.sqlskills.com/blogs/glenn/sql-server-diagnostic-information-queries-for-june-2016/ 

    OMG, LOL, Glen has been busy!  Just reviewing those scripts would make a great db architect interview!

    I keep thinking I should try to write a comprehensive script like that, but I think it goes better in a better language than TSQL.  Powershell, maybe, if not just a C# app.

    The problem is that any single snapshot tends to have aggregate numbers over the past weeks or months and you need something more dynamic to get a good view of current problems.  But no doubt Glen's scripts would answer a lot of standard questions.

    Josh

    Monday, July 4, 2016 5:33 PM
  • Josh

    I got the message that this site can't be reached,...


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence

    Tuesday, July 5, 2016 5:04 AM
    Answerer
  • I got the message that this site can't be reached,...

    It's working for me.  SQLSkills and Glen Berry, pretty well established names.  Maybe had a brief glitch.

    Josh

    Tuesday, July 5, 2016 5:14 AM
  • Hi Andrea,

    thanks for your reply.

    The whole application goes like a snail.  I did not participate for configuring it. AFAIK everything was done following best practices.

    Not randomly happen

    Indexes are treated on weekly basis (usually take till 2h 30 min for the bigger databases)

    This is the result for

     select * from sys.dm_os_waiting_tasks order by wait_duration_ms desc


    Size databases, not huge databases at all, just the top ten:

    sp_configure screenshots:


    Tuesday, July 5, 2016 7:41 AM
  • hi Ramasankar, useful bunch of queries. Testing now
    Tuesday, July 5, 2016 7:44 AM
  • Hi Enric, please post the results of this:

    select * from sys.dm_os_wait_stats order by wait_time_ms desc

    and this

    select db.name,vs.* from sys.dm_io_virtual_file_stats(-1,-1) vs
    join sys.databases db on db.database_id=vs.database_id 
    order by vs.io_stall_write_ms desc

    select db.name,vs.* from sys.dm_io_virtual_file_stats(-1,-1) vs
    join sys.databases db on db.database_id=vs.database_id 
    order by vs.io_stall_read_ms desc

    Tuesday, July 5, 2016 7:49 AM
  • Hello Enric,

    do you have historical data on cpu usage and page life expectancy?

    Tuesday, July 5, 2016 7:56 AM
  • Hi again,

    I've used this both queries in the most busied databases and 80% of the indexes are out of date (2016-05-28) it really grab my attention:

    -- When were Statistics last updated on all indexes?  (Query 61) (Statistics Update)
    SELECT SCHEMA_NAME(o.Schema_ID) + N'.' + o.NAME AS [Object Name], o.type_desc AS [Object Type],
          i.name AS [Index Name], STATS_DATE(i.[object_id], i.index_id) AS [Statistics Date], 
          s.auto_created, s.no_recompute, s.user_created, s.is_incremental, s.is_temporary,
    	  st.row_count, st.used_page_count
    FROM sys.objects AS o WITH (NOLOCK)
    INNER JOIN sys.indexes AS i WITH (NOLOCK)
    ON o.[object_id] = i.[object_id]
    INNER JOIN sys.stats AS s WITH (NOLOCK)
    ON i.[object_id] = s.[object_id] 
    AND i.index_id = s.stats_id
    INNER JOIN sys.dm_db_partition_stats AS st WITH (NOLOCK)
    ON o.[object_id] = st.[object_id]
    AND i.[index_id] = st.[index_id]
    WHERE o.[type] IN ('U', 'V')
    AND st.row_count > 0
    ORDER BY STATS_DATE(i.[object_id], i.index_id) DESC OPTION (RECOMPILE);
    ------  
    
    -- Helps discover possible problems with out-of-date statistics
    -- Also gives you an idea which indexes are the most active
    
    
    -- Look at most frequently modified indexes and statistics (Query 62) (Volatile Indexes)
    SELECT o.name AS [Object Name], o.[object_id], o.type_desc, s.name AS [Statistics Name], 
           s.stats_id, s.no_recompute, s.auto_created, 
    	   sp.modification_counter, sp.rows, sp.rows_sampled, sp.last_updated
    FROM sys.objects AS o WITH (NOLOCK)
    INNER JOIN sys.stats AS s WITH (NOLOCK)
    ON s.object_id = o.object_id
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE o.type_desc NOT IN (N'SYSTEM_TABLE', N'INTERNAL_TABLE')
    AND sp.modification_counter > 0
    ORDER BY sp.modification_counter DESC, o.name OPTION (RECOMPILE);

    Tuesday, July 5, 2016 8:07 AM
  • Hi Andrea,

    Empty results (interval 20 minutes for two tries) for the last two former ones.

    Sys.dm_os_wait_stats returns 771 rows.

    Tuesday, July 5, 2016 12:29 PM
  • Andrea, no I haven't.

    Right now:

    
     SELECT  @@servername AS INSTANCE
    ,[object_name]
    ,[counter_name]
    , UPTIME_MIN = CASE WHEN[counter_name]= 'Page life expectancy'
              THEN (SELECT DATEDIFF(MI, MAX(login_time),GETDATE())
              FROM   master.sys.sysprocesses
              WHERE  cmd='LAZY WRITER')
          ELSE ''
    END
    , [cntr_value] AS PLE_SECS
    ,[cntr_value]/ 60 AS PLE_MINS
    ,[cntr_value]/ 3600 AS PLE_HOURS
    ,[cntr_value]/ 86400 AS PLE_DAYS
    FROM  sys.dm_os_performance_counters
    WHERE   [object_name] LIKE '%Manager%'
              AND[counter_name] = 'Page life expectancy'
    
    

    Tuesday, July 5, 2016 12:38 PM
  • Hi,

    Please, tell us a bit about your AlwaysOn topology.

    How many nodes ? All in the same LAN or not ?

    Asynchronous or synchronous ?

    What's the latency between them ?

    Thanks

    Tuesday, July 5, 2016 1:07 PM
  • Hi Sebastian, thanks for your reply.

    A couple of replicas, same subnet.

    Availability groups just one just for all the Navision databases. The server only has Navision and LS databases

    Synchronous commit along with Automatic as Failover Mode.

    Let me know how to find out the latencey between them.

    Tuesday, July 5, 2016 1:29 PM
  • Hi Enric,

    maybe the slowness comes from slow sync between the replicas. Do a simple test: change commit from synchronous to asynchronous and see if you have some benefit.

    Tuesday, July 5, 2016 1:38 PM
  • Hi JRStern, thanks for your post.

    13 implementation means 13 countries (more or less) allocated in different databases.

    SQL Server Max Memory? we leaved 3 Gb for OS

    There three things that we need to do asap.

    • Old statistics from May must be mended (this night)
    • Service Pack 1 is not applied yet (weekend, WSUS is lazy)
    • Tempdb is not splitted into several files (off peak hours)

    MaxDop is 1, best practice

    Tuesday, July 5, 2016 1:41 PM
  • Andrea,

    Like JRStern told before it's very obvious that main problem is RAM.

    I've got in my table three Page Life Expectancy values for three big Always ON in our organization:

    The current one here: 1390

    Big beast (256Gb): 1728197

    Another cluster with exactly the same CPU and RAM: 315625


    Tuesday, July 5, 2016 2:16 PM
  • The current one here: 1390

    Actually 1390 is not that bad for a 32gb system, not great, but not horrible.

    On those wait states, the ones showing here are all system states that should be filtered out, please try to find a script that shows how to do this.

    Thanks,

    Josh

    Tuesday, July 5, 2016 3:11 PM
  • hi Josh, are you talking about Sys.dm_os_wait_stats?

    the most important are all the related to Latch/Locks

    Tuesday, July 5, 2016 3:54 PM
  • Hi Enric,

    As Andrea Caldarone explained below, you can easily identify synchronization slowness by switching your 

    availability groups to asynchronous mode.

    Sometimes, the performance difference between synchronous and asynchronous is dramatic, though it comes at the cost of having your asynchronous node a few seconds/minutes behind its primary.

    Tuesday, July 5, 2016 4:00 PM
  • Sebastian,  in live?

    180 sessions running in this moment (GMT-1)  Because of we run only this ERP (Navision) this kind of Apps has low time-out.

    I want to go home by myself not in ambulance

    Tuesday, July 5, 2016 4:07 PM
  • >Empty results (interval 20 minutes for two tries) for the last two former ones.

    >Sys.dm_os_wait_stats returns 771 rows.

    The relevant ones got cut off.  There are lots waits that only apply to background processes, and they can be disregarded. 

    EG for a query filtering many of the uninteresting waits.

    My Favorite Query For Investigating SQL Server Performance

    David


    David http://blogs.msdn.com/b/dbrowne/

    Tuesday, July 5, 2016 4:28 PM
  • Good one, thanks David, even it's grasping one deliberated deadlock followed by this one:

     select 
    
     ses.login_time,
     ses.host_name,
     ses.login_name,
     owt.wait_duration_ms, 
     CONVERT(TIME,DATEADD (ms, cast(owt.wait_duration_ms as int), 0)) as sec,
     owt.waiting_task_address,
     owt.wait_type,
     tls.resource_associated_entity_id,
     tls.resource_description AS local_resource_description,
     tls.resource_type,
     tls.request_mode,
     tls.request_type,
     tls.request_session_id,
     owt.resource_description AS blocking_resource_description 
     --into tbl_block_info2
    FROM sys.dm_tran_locks AS tls
     INNER JOIN sys.dm_os_waiting_tasks owt ON tls.lock_owner_address = owt.resource_address
    left join sys.dm_exec_sessions ses ON owt.session_id = ses.session_id

    Tuesday, July 5, 2016 4:33 PM
  • I think it is not relevant at all just scheduled a couple of Agent Jobs every 10s trying to discover deadlocks or locks..

    At the end of the day at least they will bring me some data for reporting Business guys...

    INSERT INTO tbl_block_info
    (
    [dbname],
    [blocking_session_id],
    [blocking_name],
    [blocking_hostname],
    [blocked_session_id],
    [blocked_user],
    [blocked_hostname],
    [waittime],
    [lastwaittype],
    [checkdate],
    [blocking_query]
    )
    SELECT  
    sd.name AS dbname, 
    blocker.spid AS blocking_session_id, --ok
    blocker.loginame AS blocking_name, 
    blocker.hostname AS blocking_hostname, 
    sp.spid AS blocked_session_id, 
    sp.loginame AS blocked_user, 
    sp.hostname AS blocked_hostname, 
    sp.waittime, 
    sp.lastwaittype, 
    getdate() AS checkdate, 
    t.TEXT AS blocking_query
    FROM 
    sys.sysprocesses sp INNER JOIN sys.sysprocesses blocker ON sp.spid = blocker.blocked 
    INNER JOIN sys.databases sd ON sp.dbid = sd.database_id 
    CROSS APPLY sys.dm_exec_sql_text( blocker.sql_handle ) t 

     INSERT INTO tbl_block_info2
    
    (
    login_time,
    [host_name],
    login_name,
    wait_duration_ms,
    sec,
    waiting_task_address,
    wait_type,
    resource_associated_entity_id,
    local_resource_description,
    resource_type,
    request_mode,
    request_type,
    request_session_id,
    blocking_resource_description,
    sql_statement
    )
    
    --sp_help 'tbl_block_info2'
    
     select 
    
     ses.login_time,
     ses.host_name,
     ses.login_name,
     owt.wait_duration_ms,
    null,
     --CONVERT(TIME,DATEADD (ms, cast(owt.wait_duration_ms as int), 0)) as sec,
     owt.waiting_task_address,
     owt.wait_type,
     tls.resource_associated_entity_id,
     tls.resource_description AS local_resource_description,
     tls.resource_type,
     tls.request_mode,
     tls.request_type,
     tls.request_session_id,
     owt.resource_description AS blocking_resource_description ,
     T.TEXT
     --into tbl_block_info2
    FROM sys.dm_tran_locks AS tls
     INNER JOIN sys.dm_os_waiting_tasks owt ON tls.lock_owner_address = owt.resource_address
    left join sys.dm_exec_sessions ses ON owt.session_id = ses.session_id
    LEFT JOIN SYS.DM_EXEC_REQUESTS REQ ON owt.session_id = req.session_id
    CROSS APPLY sys.dm_exec_sql_text( req.sql_handle ) t


    • Edited by Enric Vives Wednesday, July 6, 2016 11:29 AM
    Wednesday, July 6, 2016 11:28 AM
  • >I think it is not relevant at all just scheduled a couple of Agent Jobs every 10s trying to discover deadlocks or locks..

    The full Wait Stats will tell you whether lock waits are a significant source of slowness to the application.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Wednesday, July 6, 2016 1:35 PM
  • Hi,

    Not need to run this test live.

    You can test synchronous vs asynchronous performance between 2 Always On servers on separate machines, ideally on the same LANs and WANs as your production machines.

    Wednesday, July 6, 2016 10:41 PM
  • Which are the usual values for this query in live?

    Select tex.text, *
    from sys.dm_exec_cursors(0)
    cross apply fn_get_sql(sql_handle) tex
    where datediff(ss, creation_time,current_timestamp)  < 600
    ORDER BY CREATION_TIME DESC

    I get 100 rows for the last 10 minutes.

    Thursday, July 7, 2016 3:36 PM
  • Which are the usual values for this query in live?

    Select tex.text, *
    from sys.dm_exec_cursors(0)
    cross apply fn_get_sql(sql_handle) tex
    where datediff(ss, creation_time,current_timestamp)  < 600
    ORDER BY CREATION_TIME DESC

    I get 100 rows for the last 10 minutes.

    Probably better if you start a new thread for this.

    100 sounds like a low number, I'm guessing I'd get a much higher number.  No, wait, OPEN cursors?  I get zero.

    We have a lot of code that uses cursors, but (a) they don't generally stay open long, and (b) they are often opened on temp tables which may not show if this is run on your main db.

    Josh

    Thursday, July 7, 2016 6:09 PM
  • 178 rows for that query. All of them have '1' in is_open column.

    Friday, July 8, 2016 8:16 AM