none
Using REPLMERG for Web Synchronization of Anonymous Pull Subscription

    Question

  • Hello,

    I tried to use REPLMERG utility for Web synchronization and got the following error message in log file:

    2007-03-23 14:35:10.484 The subscription to publication 'X' could not be verified. Ensure that all Merge Agent command line parameters are specified correctly and that the subscription is correctly configured. If the Publisher no longer has information about this subscription, drop and recreate the subscription.

    The case description is as the following:

    REPLMERG utility command line:

    "C:\Program Files\Microsoft SQL Server\90\COM\REPLMERG.EXE" -ExchangeType 3 -Publication X -

    Publisher B01  -Subscriber  127.0.0.1\SQLEXPRESS  -Distributor B01 -PublisherDB DBS -

    SubscriberDB DBS -PublisherSecurityMode 1 -Output C:\Temp\DBS.log -OutputVerboseLevel 2  -

    SubscriberSecurityMode 0 -SubscriberLogin Operator -SubscriberPassword XXX -

    SubscriptionType 2 -DistributorSecurityMode 1 -Validate 3 -InternetURL

    https://www.company.com/Synch/replisapi.dll -InternetSecurityMode 0 -InternetLogin USER1 -

    InternetPassword PWD1 -SubscriberType 0

    The subscription to the publication X has been created with the following T-SQL script:

    USE DBS

    EXEC sp_addmergepullsubscription

    @publisher = N'B01',

    @publication = N'X',

    @publisher_db = N'DBS',

    @subscriber_type = N'anonymous',

    @sync_type = N'automatic',

    @subscription_priority = 0.0;


    I've also tried to check subscription information on the server 127.0.0.1\SQLEXPRESS using the following:

    sp_helpsubscription_properties N'B01', N'DBS', N'X'

    but the result set is empty. When I run sp_helpsubscription_properties

    I get the same empty result. But the subcription is anonymous and when I synchronize it using RMO - it works.

    Could you please advice what could cause the problem?

    Thank you.

     

    Alexander.

    Friday, March 23, 2007 2:53 PM

Answers

  • The problem in our case, which I suspect is the same for your case, is DNS resolution.  You can stop here if all you want is the basic answer.

     

    If you want the gory details, grab your favorite beverage, and pull up a chair.

     

    The merge engine is based on a pile of metadata that exists at both publisher and subscriber.  On the subscriber, the two most important pieces during the initital handshake (connection) are contained within sysmergesubscriptions and sysmergearticles.  sysmergesubscriptions contains an entry for the subscriber and might possibly contain entries for other subscribers in the architecture, but the critical one is for the subscriber itself.  One of the parameters for the merge agent is the name of the subscriber.

     

    When the agent connects, it uses this value to interrogate sysmergesubscriptions and requires a valid value to come back.  There are many reason why it does this, but the simplest example is security.  The agent needs to ensure that it is in fact connected to the subscriber it expects to be connected to before uploading or downloading changes so that you are not sending improper data down to a subscriber that it should not have.

     

    If it succeeds in the check against sysmergesubscriptions, it then utilizes sysmergearticles to ensure that the subscriber entires match what is supposed to be there as well for each table.

     

    Once it validates this information, it is basically ready to start synchronizing.  Each synch cycle is assigned a synchronization GUID and this value is checked periodically to ensure that nothing has become corrupted during the synch.

     

    So, you can start to get a bit of an idea of everything that needs to go right for a synch to occur and all of this is necessary to ensure data does not get corrupted or security violated.

     

    Each cycle of the merge agent, a new connection is made to both the publisher and subscriber.  Since all instances are referenced by name, the replication engine relies on DNS being properly configured in order to route requests to the correct machine.  This issue only occurs with a push, because the connection is generally initiated from the publisher whose IP address is almost always going to be static meaning it will always resolve correctly.  (We'll use that for a close enough assumption so that I don't have to type up another 10 pages to handle the general case.)  However, the subscriber can be a moving target and it needs DNS to always be correct.  When you do a pull, the connection is initiated from the subscriber who also specifies its name and since that will always match the metadata, there aren't any issues.

     

    What does all of this background have to do with this issue?

     

    You have 1 publisher and 2 subscribers.  On the publisher, there will be 2 merge agents up and running, we'll call them Sub1 and Sub2.  The merge agent for Sub1 has a parameter that says the subscriber that I am connecting to is named Sub1 and the same goes for Sub2.  We'll take the case of continously running so that I can cut down on the amount of typing. Sub1 comes online, gets an IP address (let's say it was .1 for short), and a short period later gets registered into DNS so that the merge agent can resolve it.  The next cycle of the Sub1 agent, gets a response to its connection, so it now knows that Sub1 is online.  It interrogates sysmergesubscriptions on Sub1 to validate that it is connected to the correct place and also checks sysmergearticles and the last synch GUID that should be there.  When all of this stuff checks out, the upload/download/conflict resolution occurs.  The merge agent then goes to sleep waiting 60 seconds to poll for additional changes.  After 60 seconds is up, it initiates the connection to Sub1 again, gets a response, connects to the publisher and gets a response, checks the metadata, and starts moving changes.  It then goes to sleep waiting for another 60 seconds.  About 40 seconds later, Sub1 disconnects from the network.  Sub2 connects to the network and is assigned an IP address of .1 (note this is EXACTLY the same IP address as was assigned to Sub1 just seconds before)  The merge agent for Sub1 wakes up and initiates a connection request.  Well, it sends a request out to the network to get the IP address for Sub1 and DNS responds back that Sub1 is at .1 (it actually isn't, but because DNS hasn't updated yet, we now have a misroute).  The connection for Sub1 merge agent gets created to .1, but there is a very obvious problem at this point.  The agent then fires off the validation and gets nothing back when it goes to look for the row for Sub1 in sysmergesubscriptions.  Since it fails this check, it immediately fires of the cleanup procs which remove the entries from sysmergesubscriptions and sysmergearticles in preparation for an initialization.  It then exits, drops the connection, and reports an error back for Sub1.  DNS then gets properly updated to now say that Sub2 is at .1.  The merge agent for Sub2 all of a sudden now has a connection to its subscriber.  It connects to Sub2 (which is at .1) and goes to validate the metadata.  It finds that the metadata is non-existent and flags this subscriber as corrupt.

     

    All because DNS did not properly update and resolve machines.  So, this is not a bug, nor is it an issue with replication.  This is a network configuration error.

    Thursday, April 26, 2007 5:20 PM

All replies

  • After adding the subscription at the subscriber db, can you run:

    sp_helpmergepullsubscription

    instead of sp_helpsubscription_properties

     

    Does the subscription show there?

    You say that when you run the sync using RMO it works. What is the difference in the set of parameters that you set in your RMO sync agent and the command line that you run (yor posted above). Print out the parameters and compare and see if there are any differences.

    Wednesday, March 28, 2007 4:27 AM
  • I’ve tried sp_helpmergepullsubscription on subscription database. It shows the subscription.

    Here is comparison table for replication parameters:

     

    Parameter

    RMO

    REPLMERG

    -ExchangeType

    Not used

    3

    -Publication

    X

    X

    -Subscriber

    127.0.0.1\SQLEXPRESS

    127.0.0.1\SQLEXPRESS

    -Distributor

    B01

    B01

    -PublisherDB

    DBS

    DBS

    -SubscriberDB

    DBS

    DBS

    -PublisherSecurityMode

    SecurityMode.Integrated

    1

    -Output

    ””

    C:\Temp\DBS.log

    -OutputVerboseLevel

    1

    2

    -SubscriberSecurityMode

    ServerConnection conn = new ServerConnection(subscriberName, userName, userPassword);

    0

    -SubscriberLogin

    Operator

    Operator

    -SubscriberPassword

    XXX

    XXX

    -SubscriptionType

    MergePullSubscription subscription;

    2

    -DistributorSecurityMode

    SecurityMode.Integrated

    1

    -Validate

    Not used

    3

    -InternetURL

    https://www.company.com/Synch/replisapi.dll

    https://www.company.com/Synch/replisapi.dll

    -InternetSecurityMode

    SecurityMode.Standard

    0

    -InternetLogin

    USER1

    USER1

    -InternetPassword

    PWD1

    PWD1

    -SubscriberType

    @subscriber_type = N'anonymous'

    0

     

    I'm still at the same status.

     

    Thank you.

     

    Alexander.

    Thursday, March 29, 2007 9:38 AM
  • I do not see any differences between the RMO and command line.

    Check if you are connecting to the right publisher/distributor.

    It should succeed.

    Also try dropping and recreating the subscription.

    Sunday, April 01, 2007 12:40 AM
  • Mahesh,

     

    Publisher and distributor is the same SQL Server 2005 instance.

    I tried recreating the subscription several times also. It works with RMO but it does not with the command line utility. The subscriber is SQL Server 2005 Express. Can it be the case?

     

    Thank you.

     

    Alexander. 

    Monday, April 02, 2007 1:05 PM
  • Unfortunately, this does not appear to be an isolated incident and it is not specific to web synchronization.  It is universal to merge replication when you are initiating synchronization using the merge agent. I have this occuring at 3 customers and it has affected more than 180 subscribers thus far.  Still have not found a cause.  I have a support case open that is currently being worked.  We do know that it only occurs with subscribers who have disconnected for a while, usually in excess of 1 day, but are not anywhere near hitting the retention interval that would knock them out.  The entry in sysmergesubscriptions for the particular subscriber gets deleted which is why the exact error message you have appears. Along with the entry in sysmergesubscriptions being deleted, the entire contents of sysmergearticles is also deleted.  This occurs ONLY on the subscriber.  All entries in the metadata tables on the publisher are correct and not missing anything.  We have auditing triggers behind sysmergearticles and sysmergesubscriptions that will hopefully log the culprit of these deletes so that it can get escalated to get a fix in place.

     

    All susbcribers in each case are SQL Server Express Edition.

     

    Can you reliably reproduce this behavior?  I haven't been able to get the same subscriber to exhibit this behavior repeatably and have not been able to reproduce this other than waiting for the next "random" one to corrupt.

    Wednesday, April 11, 2007 7:57 AM
  • Michael,

     

    Thank you. I checked tables sysmergesubscriptions and sysmergearticles and found out that sysmergesubscriptions does not contain records about the subcription I've added for web synchronization. I guess this can be due to the fact that the subscription for web synchronization is Anonymous. Table  sysmergearticles does not contain records related to the subscription either.

     

    I've checked on subscribers with SQL Server 2005 Express Edition SP2 and SQL Server 2005 Developer Edition SP2.

     

    I tried to create and synchronize (using REPLMERG) merge subscriptions for web synchronization several times with the same result. All the parameters for RMO application and for REPLMERG utility were identical and correct as Mahesh also have checked.

    I managed to find a temporary solution only for customers that have SQL Server 2005 Standard Edition. But there are only few of them. The solution was to use SQL Agent as a scheduled merge agent. As you know, Express does not have this feature.

     

    I still need solution for the problem as most of our customers have SQL Server 2005 Express Edition on-board and I would appreciate if you let me know about solution for your case.

     

    Alexander.

    Wednesday, April 11, 2007 8:33 AM
  • I'll definitely post an update.  The problem we have right now is that while the problem does recreate itself, I can not create a specific set of steps that will reproduce the problem every single time.  There is something triggering it, but it still appears to be a random behavior as I have subscribers that were all created at the same time, with exactly the same settings, and only some of them will exhibit this behavior.  We do have it setup for continuous and I have a suspicion that it is related to a certain number of failed attempts due to the fact that the subscriber is not connected, but don't have any data yet to support that theory.
    Thursday, April 12, 2007 4:59 AM
  • Mike,

     

    What is clear for me now is that:

    1. There is nothing to do with a publisher/distributor. The problem lies on subscriber
    2. It does not depend on SQL Server 2005 edition
    3. Service pack 2 didn't help
    4. Isolated example of the problem is anonymous merge pull subscription when using REPLMERG command line utility. But it does not arise when synchronization is being performed by SQLAgent in one step with step type 'Replication Merge' and command '-Publisher [SRV\INST0] -PublisherDB [DB] -Publication [DBW] -Subscriber [MYCOMP\INST1] -SubscriberDB [DB] -SubscriptionType 2 -SubscriberSecurityMode 1 -Distributor [SRV\INST0]'

    Alex.

     

    Thursday, April 12, 2007 8:09 AM
  • All of mine are occuring when using either replmerg fired from a command line or firing it off through SQL Server Agent.
    Thursday, April 12, 2007 8:12 PM
  • Mike,

     

    It looks like you use merge pull subscriptions. Because when you use anonymous subscription then the records in sysmergesubscriptions and sysmergearticles are not being created initially. After [unknown yet] has happened, your subscriptions get to the case equal to the initial state of anonymous subscription.

    I guestimate, but it looks obvious that the reason why I can run synchronization using SQLAgent job is parameter '-SubscriptionType 2' (Anonymous).

    In your case it should be 1 for pull subscriptions.

     

    Alex.

    Thursday, April 12, 2007 9:08 PM
  • No, we're using continuous, push subscriptions.
    Thursday, April 12, 2007 10:13 PM
  • Yeah... I was guesstimating. Not a big difference for the case though Wink

     

    Does Microsoft is still fighting with it?

    Thursday, April 19, 2007 3:54 PM
  • We're still working on the issue.  No updates yet.
    Thursday, April 19, 2007 6:29 PM
  • Getting a lot closer.  Have the issue reproduced and what seems to be an explanation of the problem.  I'll post a bit more later after doing some more testing.  This is a "nice" little issue.
    Wednesday, April 25, 2007 8:50 AM
  • The problem in our case, which I suspect is the same for your case, is DNS resolution.  You can stop here if all you want is the basic answer.

     

    If you want the gory details, grab your favorite beverage, and pull up a chair.

     

    The merge engine is based on a pile of metadata that exists at both publisher and subscriber.  On the subscriber, the two most important pieces during the initital handshake (connection) are contained within sysmergesubscriptions and sysmergearticles.  sysmergesubscriptions contains an entry for the subscriber and might possibly contain entries for other subscribers in the architecture, but the critical one is for the subscriber itself.  One of the parameters for the merge agent is the name of the subscriber.

     

    When the agent connects, it uses this value to interrogate sysmergesubscriptions and requires a valid value to come back.  There are many reason why it does this, but the simplest example is security.  The agent needs to ensure that it is in fact connected to the subscriber it expects to be connected to before uploading or downloading changes so that you are not sending improper data down to a subscriber that it should not have.

     

    If it succeeds in the check against sysmergesubscriptions, it then utilizes sysmergearticles to ensure that the subscriber entires match what is supposed to be there as well for each table.

     

    Once it validates this information, it is basically ready to start synchronizing.  Each synch cycle is assigned a synchronization GUID and this value is checked periodically to ensure that nothing has become corrupted during the synch.

     

    So, you can start to get a bit of an idea of everything that needs to go right for a synch to occur and all of this is necessary to ensure data does not get corrupted or security violated.

     

    Each cycle of the merge agent, a new connection is made to both the publisher and subscriber.  Since all instances are referenced by name, the replication engine relies on DNS being properly configured in order to route requests to the correct machine.  This issue only occurs with a push, because the connection is generally initiated from the publisher whose IP address is almost always going to be static meaning it will always resolve correctly.  (We'll use that for a close enough assumption so that I don't have to type up another 10 pages to handle the general case.)  However, the subscriber can be a moving target and it needs DNS to always be correct.  When you do a pull, the connection is initiated from the subscriber who also specifies its name and since that will always match the metadata, there aren't any issues.

     

    What does all of this background have to do with this issue?

     

    You have 1 publisher and 2 subscribers.  On the publisher, there will be 2 merge agents up and running, we'll call them Sub1 and Sub2.  The merge agent for Sub1 has a parameter that says the subscriber that I am connecting to is named Sub1 and the same goes for Sub2.  We'll take the case of continously running so that I can cut down on the amount of typing. Sub1 comes online, gets an IP address (let's say it was .1 for short), and a short period later gets registered into DNS so that the merge agent can resolve it.  The next cycle of the Sub1 agent, gets a response to its connection, so it now knows that Sub1 is online.  It interrogates sysmergesubscriptions on Sub1 to validate that it is connected to the correct place and also checks sysmergearticles and the last synch GUID that should be there.  When all of this stuff checks out, the upload/download/conflict resolution occurs.  The merge agent then goes to sleep waiting 60 seconds to poll for additional changes.  After 60 seconds is up, it initiates the connection to Sub1 again, gets a response, connects to the publisher and gets a response, checks the metadata, and starts moving changes.  It then goes to sleep waiting for another 60 seconds.  About 40 seconds later, Sub1 disconnects from the network.  Sub2 connects to the network and is assigned an IP address of .1 (note this is EXACTLY the same IP address as was assigned to Sub1 just seconds before)  The merge agent for Sub1 wakes up and initiates a connection request.  Well, it sends a request out to the network to get the IP address for Sub1 and DNS responds back that Sub1 is at .1 (it actually isn't, but because DNS hasn't updated yet, we now have a misroute).  The connection for Sub1 merge agent gets created to .1, but there is a very obvious problem at this point.  The agent then fires off the validation and gets nothing back when it goes to look for the row for Sub1 in sysmergesubscriptions.  Since it fails this check, it immediately fires of the cleanup procs which remove the entries from sysmergesubscriptions and sysmergearticles in preparation for an initialization.  It then exits, drops the connection, and reports an error back for Sub1.  DNS then gets properly updated to now say that Sub2 is at .1.  The merge agent for Sub2 all of a sudden now has a connection to its subscriber.  It connects to Sub2 (which is at .1) and goes to validate the metadata.  It finds that the metadata is non-existent and flags this subscriber as corrupt.

     

    All because DNS did not properly update and resolve machines.  So, this is not a bug, nor is it an issue with replication.  This is a network configuration error.

    Thursday, April 26, 2007 5:20 PM