none
Replication Monitor Performance Critical, how to get true latency for transactional replication RRS feed

  • Question

  • I am working to upgrade our environment from very old servers running SQL 2000, initially to SQL Server 2008 R2 then up to a supported version. I have moved the distributor to a newer 2008 R2 sql server, the publisher is still SQL 2000, with SQL 2000 and SQL 2008 subscribers. Prior to this move, when I ran replicaiton monitor from the old SQL 2000 distributor, rarely did I see the "performance critical" status for the subscribers. Now most of the time replication monitor indicates 1:00 to 4:00 minutes of latency. When  I run sp_replcounters the latency is rarely over 3 seconds. I have an agent job set up to run frequently and notify me of any undistributed commands on the distributor but the job has yet to find any. I have seen a few posts indicating to ignore this warning from replication monitor...but that is hard to do when it is the main tool for working with your publications/subscriptions. Is this what is being done, just ignore the warning by most? I am looking for other monitoring suggestions outside of what I have noted here. I am thinking of running sp_replcounters ever minute or 2 and putting the results in a table, then checking that to see if it exceeds a threshold. Any comments or suggestions appreciated. I would also appreciate any links to any scripts people suggest for monitoring TR. Thank you
    Thursday, August 2, 2018 4:10 PM

Answers

  • The problem with the "undistributed commands" and the "estimated catchup time" (tabs in replication monitor) is, that is the latter is the estimated time SQL Server "thinks" it needs in order to bring the subscriber up to date with the publisher. That time duration is not the actual time SQL really needs and is based on the current performance etc. It's totally possible for a subscriber to catch up with the publisher almost immediately even when the catchup time shows in hours. 

    As for the second part, I have used the replication latency monitoring solution provided in this article and it works pretty fine. If you want something simple and straightforward, you can go with tracer tokens or canary tables to monitor latency.


    Please remember to click "Mark as Answer" if my response answered your question or click "Vote as helpful" if it helped you in any way.

    • Marked as answer by Abeljdang Friday, August 10, 2018 1:26 PM
    Friday, August 3, 2018 6:33 AM

All replies

  • This warning is frustrating. I normally bounce the agents and the warning goes away.

    I use tracer tokens and then send out warnings when they report significant latency. You might even find that by injecting tokens each minute this warning will go away.

    I normally use replication alerts to send me warnings when things get bad.

    Thursday, August 2, 2018 4:27 PM
    Moderator
  • The setting for warning is configurable.  You likely changed that in SQL 2000 and did not in the new server.

    The default is 30 seconds.  If you are not running replication continuously, then you need to set that appropriately to match your schedule.


    Thursday, August 2, 2018 5:15 PM
  • Thanks for the reply, since I have a 2000 publisher it appears I cant use tracer tokens, at least through the gui. I have set up some alerts, and the job I noted so I should be notified. I will try bouncing the agents tonight.
    • Edited by Abeljdang Thursday, August 2, 2018 7:30 PM text
    Thursday, August 2, 2018 7:25 PM
  • Thanks for the reply, that is possible this was set higher. We are running continuously so I think it should stay at 30 seconds. It must be just a bug as I have check sp_replcounters frequently and can't find any significant latency. Any other suggestion on finding latency or do you think I have the basics covered?
    Thursday, August 2, 2018 7:32 PM
  • you can also query distribution db:

    select delivery_latency/1000 as 'latency in sec'  from msdistribution_history where agent_id=<agent_id>

    Thursday, August 2, 2018 9:25 PM
  • The problem with the "undistributed commands" and the "estimated catchup time" (tabs in replication monitor) is, that is the latter is the estimated time SQL Server "thinks" it needs in order to bring the subscriber up to date with the publisher. That time duration is not the actual time SQL really needs and is based on the current performance etc. It's totally possible for a subscriber to catch up with the publisher almost immediately even when the catchup time shows in hours. 

    As for the second part, I have used the replication latency monitoring solution provided in this article and it works pretty fine. If you want something simple and straightforward, you can go with tracer tokens or canary tables to monitor latency.


    Please remember to click "Mark as Answer" if my response answered your question or click "Vote as helpful" if it helped you in any way.

    • Marked as answer by Abeljdang Friday, August 10, 2018 1:26 PM
    Friday, August 3, 2018 6:33 AM
  • What you are describing is almost always due to blocking on the subscriber.  Check what is happening on the subscriber and if something could be blocking during that time.

    Friday, August 3, 2018 11:09 AM
  • Tried the agent restart no luck, going with some monitoring I set up with notifications. Thanks for the reply
    Friday, August 10, 2018 1:21 PM
  • Thanks for the reply Tom, have been checking and there is no blocking. 
    Friday, August 10, 2018 1:26 PM