SQL Server 2000: The agent is suspect. No response within 10 minutes

    Question

  • Hi,

     

    I have two servers, one production server and one backup server which have transactional replication with a pull subscription.

     

    When I configure replication, it works fine during our test weekends, when we test with production load. After the tests, replication looks fine for a random number of days. Then, all of a sudden, an error message is displayed on one of the agents: "The agent is suspect. No response within 10 minutes." This has happened a number of times. If I remove replication and configure it again, it always works. Sometimes updating one of the tables is enough to make the error message disappear. The last time (today) that did not work: the update was not replicated and the error message remained.

     

    Has anyone experienced this same problem and found a good solution? One common factor is that the error message appears after long periods of inactivity on the servers, or perhaps after a restart, but I am not sure about that.

     

    Question 1: How can I prevent this error message?

     

    Question 2: Is there anything special to think about when I need to restart the servers while replication is configured, e.g. after installing updates from Windows Update?

     

    I would be very grateful for any answers regarding this.

     

    Best,
      /M

    Friday, April 20, 2007 6:51 PM

Answers

  • This error message merely means that the replication subsystem has not heard from the replication executable (the agent) for some time. The replication subsystem will eventually time out the executable.

     

    It is entirely possible that there is nothing wrong with the executable; perhaps it is applying a large snapshot. For example, for some clients of mine where I have had to send large snapshots over WAN or ISDN links, I have set the timeout to 1 day. Naturally, I have monitored replication using Profiler to ensure that the distribution agent is not hung.
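
    As a rough alternative to Profiler, the Distribution Agent history at the Distributor can show whether the agent is still logging progress. This is only a sketch: it assumes the distribution database has the default name "distribution", and the TOP 20 limit is arbitrary.

```sql
-- Run at the Distributor. Assumes the distribution database is named
-- "distribution" (the default); adjust the name if yours differs.
USE distribution
GO

-- Most recent Distribution Agent history rows, newest first.
-- runstatus: 1 = Start, 2 = Succeed, 3 = In progress,
--            4 = Idle,  5 = Retry,   6 = Fail
SELECT TOP 20
       agent_id,
       runstatus,
       [time],
       comments
FROM   dbo.MSdistribution_history
ORDER  BY [time] DESC
GO
```

    If the newest [time] values keep advancing between runs of the query, the agent is still reporting in and is probably just busy rather than hung.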

     

    Run sp_changedistributor_property 'heartbeat_interval', 20

    at the Distributor to raise this timeout from the default 10 minutes to 20 minutes, for example.
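
    Written out with named parameters, the change might look like the sketch below. The value 20 is only the example figure used here, and the sp_helpdistributor call is an optional extra step for checking the current setting before changing it.

```sql
-- Run at the Distributor (any database).

-- Optional: list the current Distributor properties,
-- which include the heartbeat interval.
EXEC sp_helpdistributor
GO

-- Raise the heartbeat interval to 20 minutes.
-- 20 is an example value; choose one that suits your longest-running agent.
EXEC sp_changedistributor_property
     @property = 'heartbeat_interval',
     @value    = 20
GO
```

    A longer interval only delays the "suspect" warning; it does not by itself confirm that the agent is healthy, so it is worth pairing with some form of monitoring.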

     

    I most frequently see the message you are seeing in SQL 2000 if I am using concurrent snapshots.

    Monday, April 23, 2007 3:17 PM
    Moderator