none
Trans Repl terminates and restarts after error RRS feed

  • Question

  • Afternoon,

    Having a strange issue where transx replication is running for one of our larger publication and is going along smoothly then an error hit ...

         TCP Provider: An existing connection was forcibly closed by the remote host.

    The sync then initializes again and starts delivery the rest of the transactions neededing to be pushed out.

    Added to this is that our alerting tool (DELL Spotlight) fires off a call to the oncall DBA saying the instance is down BUT everything is fine.

    I do not see anything in the WIN App or Sys event logs at that time.

    Wondering what I need to look at to figure this out or get more details.

    Thanks ALL!

    Wednesday, December 11, 2013 7:00 PM

Answers

All replies

  • This almost always means that the replication agents can't find the SQL Server on the remote host. Your Firewall could be preventing the connection, or your browser service is not running.

    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941

    Wednesday, December 11, 2013 9:56 PM
    Moderator
  • Thanks for the info Hilary,

    First answer to the 2nd question is YES ... BS is running

    As for the Firewall reply ... I wanted to make sure we were on the same page and that we are replicating fine for all the other publications (18 of them) it is just two specific publications that are on rather large articles (each pub is 1 article of a table consisting of 3-4 billion rows).

    The two publications DO connect and start transferring the xactions from the distributor to the subscriber but then x amount of time into the xfer the error hits.

    It then reinitializes the push and the remain xactions are migrated over.

    Stanger still is that is does not happen on every pull (since the subscriptions are set to replication at a couple designated hours during the day).

    What I am also seeing (and am chasing down) is in the SQL event log I can see a couple of traces stop and then start as if the SQL service hiccupped to a point that it look as if it were down BUT it the SQL service does not do a true restart.

    This event also triggers DELL Spotlight on SQL Server to generate an alert/call saying the instance is down even though it is not.

    The errors I see (not at the same time but at different events) are the following:

      Monitored Server - SQL Server Connection Failure: A transport-level error has occurred when receiving results from the server. (provider: TCP Provider, error: 0 - The specified network name is no longer available.) [64]

      Monitored Server - SQL Server Connection Failure: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) [10054]

    So I am about to open a case with MS and see what I can dig up.

    Of course any insight would be greatly appreciated!

    Thursday, December 12, 2013 2:07 PM
  • If these two problem publications are not working and the other ones are working successfully to the same subscriber you can rule out firewall issue.

    Spotlight does restart traces every 5 minutes IIRC. If SQL Server itself restarts there will be error messages in the SQL Server log.

    One more thing to check - some spam appliances will monitor network traffic and reject some network commands that contain what it terms to be dangerous content - ie things like xp_cmdshell calls.

    Can you run sp_browsereplcmds on your distribution database, or run profiler to determine what is attempted to be called on the subscriber?


    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941

    Thursday, December 12, 2013 2:25 PM
    Moderator
  • A little more meat and potatoes here ...

    The client just manually started all of this SQL repl jobs.

    The usual 17 smaller subscriptions ran without a hitch.

    BOTH of the larger 2 subscriptions were still running without issue for a bit when I checked in.

    One of the the two fiinished sucessfully after just under 5 minutes.

    The other subscription looks like it failed right from the start with the error message found in the step showing (see below)

    === ERROR

    MESSAGE
    Date 12/12/2013 9:23:25 AM
    Log Job History (REPL DIST - Group12)
    Step ID 1
    Server MySubscriber
    Job Name REPL DIST - Group12
    Step Name Run agent.
    Duration 00:03:41
    Sql Severity 0
    Sql Message ID 0
    Operator Emailed 
    Operator Net sent 
    Operator Paged 
    Retries Attempted 0
    Message
    2013-12-12 15:23:25.723 Copyright (c) 2008 Microsoft Corporation
    2013-12-12 15:23:25.723 Microsoft SQL Server Replication Agent: distrib
    2013-12-12 15:23:25.723 
    2013-12-12 15:23:25.723 The timestamps prepended to the output lines are expressed in terms of UTC time.
    2013-12-12 15:23:25.723 User-specified agent parameter values:
    -Publisher MyPublisherDistibutor
    -PublisherDB MyDatabase
    -Publication Group12
    -Distributor MyPublisherDistibutor
    -SubscriptionType 1
    -Subscriber MySubscriber
    -SubscriberSecurityMode 1
    -SubscriberDB MyDatabase
    -MaxBCPThreads 2
    -QueryTimeout 5400
    -BcpBatchSize 500
    -HistoryVerboseLevel 3
    -XJOBID 0x9563FCC7EBD3314B8F1265C118B9DED2
    -XJOBNAME REPL DIST - Group12
    -XSTEPID 1
    -XSUBSYSTEM Distribution
    -XSERVER MySubscriber
    -XCMDLINE 0
    -XCancelEventHandle 0000000000000A9C
    -XParentProcessHandle 0000000000000B28
    2013-12-12 15:23:25.723 Startup Delay: 1231 (msecs)
    2013-12-12 15:23:26.956 Connecting to Subscriber 'MySubscriber'
    2013-12-12 15:23:27.143 Connecting to Distributor 'MyPublisherDistibutor'
    2013-12-12 15:23:27.158 Parameter values obtained from agent profile:
    -commitbatchsize 100
    -commitbatchthreshold 1000
    -keepalivemessageinterval 300
    -logintimeout 15
    -maxdeliveredtransactions 0
    -pollinginterval 5000
    -skiperrors 
    -transactionsperhistory 100
    2013-12-12 15:23:27.502 Initializing
    2013-12-12 15:26:00.445 Table 'InstrumentSnapshotValue' passed rowcount (3214955606) validation. Rowcount method 0 used (0 = Full, 1 = Fast).
    2013-12-12 15:26:00.695 101 transaction(s) with 100 command(s) were delivered.
    2013-12-12 15:26:00.897 100 transaction(s) with 100 command(s) were delivered.
    2013-12-12 15:26:00.975 100 transaction(s) with 100 command(s) were delivered.
    2013-12-12 15:26:01.053 100 transaction(s) with 100 command(s) were delivered.
    2013-12-12 15:26:01.194 100 transaction(s) with 100 command(s) were delivered.
    2013-12-12 15:26:01.459 100 transaction(s) with 100 command(s) were delivered.
    2013-12-12 15:26:01.506 100 transaction(s) with 100 command(s) were delivered.
    2013-12-12 15:27:03.391 Delivering replicated transactions
    2013-12-12 15:27:06.652 Agent message code 10054. TCP Provider: An existing connection was forcibly closed by the remote host.
    2013-12-12 15:27:06.699 Category:NULL
    Source: Microsoft SQL Server Native Client 10.0
    Number: 10054
    Message: TCP Provider: An existing connection was forcibly closed by the remote host.
    2013-12-12 15:27:06.699 Category:NULL
    Source: Microsoft SQL Server Native Client 10.0
    Number: 10054
    Message: Communication link failure
    2013-12-12 15:27:06.699 Category:NULL
    Source: Microsoft SQL Server Native Client 10.0
    Number: 10054
    Message: TCP Provider: An existing connection was forcibly closed by the remote host.
    2013-12-12 15:27:06.699 Category:NULL
    Source: Microsoft SQL Server Native Client 10.0
    Number: 10054
    Message: Communication link failure
    2013-12-12 15:27:06.699 Category:NULL
    Source: Microsoft SQL Server Native Client 10.0
    Number: 10054
    Message: TCP Provider: An existing connection was forcibly closed by the remote host.
    2013-12-12 15:27:06.699 The agent failed with a 'Retry' status. Try to run the agent at a later time.
    
    
    
    

    Thursday, December 12, 2013 3:41 PM
  • Can you stop all other distribution agents and see if this agent is able to replication some data, or if it can't replicate any.

    You also need to look at what it is trying to replicate to see if it is replicating something unusual like a raiseerror statement which some spam appliances/firewalls will stop.


    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941

    Thursday, December 12, 2013 3:47 PM
    Moderator
  • That piques my interest “RAISEERROR” …

    How does one replicate a raiseerror statement?

    Only reason I ask is that the clients do use raiseerror in their processes.

    Where would I look for that possible occurrence?

    Thanks for all the great help so far!

    Thursday, December 12, 2013 3:57 PM
  • Bit more info (may or may not help) ...

    I am looking at all the records (7.7 million) for the 1 article that is in this subscription that I am finding in the MSrepl_commands table and I see that out of the 7.7 million records 48 have a value in the "type" column as 30 where there are 48 that have a value of 1073741859.

    There are no other values besides 30 & 1073741859 in the "type" column of the repl_commands table.

    May I ask the usage of the type column in the MSrepl_commands table AND is their some significance to the odd value of 1073741859?

    Thanks again!

    Thursday, December 12, 2013 5:10 PM
  • I recently had a very similar problem with one of my servers.  It was only a problem during subscription initialization and only for large subscriptions.  The subscription initialization would fail with the same error message:

    TCP Provider: An existing connection was forcibly closed by the remote host.

    I tested it myself by copying a very large file between the two servers and sure enough the copy failed so I knew it wasn't SQL Server or replication, it had something to do with the network.  I asked my network administrator to check the line between the two servers and that came up empty.

    Finally weeks later my network administrator called me up and told me he fixed the problem.  He had assigned the same IP address to both network adapters on the distributor.

    When something would attempt to use the alternate network adapter the connection on the primary network adapter would be closed.

    If you have multiple network adapters on your distributor you should verify that each network adapter has different IP addresses assigned to them.

    Thursday, December 12, 2013 5:21 PM
  • The type values other than 30 reflect id's of schema commands from snapshots or schema change scripts.

    You need to look at these guys to see what they contain and if they contain anything a spam appliance might barf on.


    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941

    Thursday, December 12, 2013 7:08 PM
    Moderator
  • Thanks!

    Going through 7.7 million transactions for a "needle in the transaction stack" will take some time.

    You have any slick methods to accomplish this task?

    Have a good evening!

    Thursday, December 12, 2013 7:11 PM
  • I think you will need to do something like this:

    select xact_seqno From msrepl_commands  
    where type='1073741859'
    exec sp_browsereplcmds @xact_seqno_start='0x0000003A00000108000A00000001'--replacing the last value returned from above for @xact_seqno


    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941

    Thursday, December 12, 2013 7:51 PM
    Moderator
  • OK so I am on the same page with you ...

    Am I too run that query against the 7.7 million rows with the type = 30 (based on your comment of "The type values other than 30 reflect id's of schema commands from snapshots or schema change scripts. You need to look at these guys to see what they contain and if they contain anything a spam appliance might barf on.") or am I to run the query with the type = '1073741859' as you have in the code snippette?

    Thanks!

     
    Thursday, December 12, 2013 7:55 PM
  • I thought we had established the error was coming from the type 1073741859 as there are fewer of them - this is where I would start.

    looking for a book on SQL Server 2008 Administration? http://www.amazon.com/Microsoft-Server-2008-Management-Administration/dp/067233044X looking for a book on SQL Server 2008 Full-Text Search? http://www.amazon.com/Pro-Full-Text-Search-Server-2008/dp/1430215941

    Thursday, December 12, 2013 8:00 PM
    Moderator
  • Indeed Sir ... exactly what I am doing now.

    Just double checking ...

    Will report back

    Thursday, December 12, 2013 8:02 PM
  • Indeed Sir ... exactly what I am doing now.

    Just double checking ...

    Will report back

    Hi JJKusch,

    I’m writing to follow up with you on this post. Was the problem resolved after performing our action plan steps? If you are satisfied with our solution, I’d like to mark this issue as "Answered". Please also feel free to unmark the issue, with any new findings or concerns you may have.

    Thanks,
    Sofiya Li


    Sofiya Li
    TechNet Community Support

    Monday, December 23, 2013 2:01 PM
    Moderator
  • I am unable to validate if the information above will solve my issue.

    I am still getting the error at this point in time.

    Problem I ran into was in executing the command Hillary relayed to me :

      exec sp_browsereplcmds @xact_seqno_start='0x0000003A00000108000A00000001'--replacing the last value returned from above for @xact_seqno

    On a server with 64GB of RAM my first attempt to broswe the commands of the first  @xact_seqno_start group ran over 7 hours and finally crapped out with an error of not enough available memory.

    Thus I was never able to see if there was a rouge command in that set.

    If there is another method to get into these commands and check for the rouge record(s) please relay it to me.

    Thanks again and Happy Holidays!

    Tuesday, December 24, 2013 2:55 PM
  • I am facing the same issue here. I have pull transactional replication setup. It works fine when there is less number of transactions following, but when transactions flow load increases for big table it starts failing with below error. Same time there is no network issue or able to telnet the ports both servers are able to connect.

    Let me know if got any solution to resolve the same.

    Message
    2020-06-30 02:52:51.043 Copyright (c) 2016 Microsoft Corporation
    2020-06-30 02:52:51.043 Microsoft SQL Server Replication Agent: distrib
    2020-06-30 02:52:51.043
    2020-06-30 02:52:51.043 The timestamps prepended to the output lines are expressed in terms of UTC time.
    2020-06-30 02:52:51.043 User-specified agent parameter values:
                -Publisher SERVERS1
                -PublisherDB XYZ_DBSchema
                -Publication XYZ_Reporting_REPL
                -Distributor SERVERS2
                -SubscriptionType 1
                -Subscriber SERVERS1
                -SubscriberSecurityMode 1
                -SubscriberDB XYZ_Reporting
                -Continuous
                -XJOBID 0xB5B1C0883E31AB4CBA4638D9DE82779A
                -XJOBNAME SERVERS1-XYZ_DBSche-XYZ_Reporting_-SERVERS1-XYZ_Reporting-0CF91B86-EBE3-4360-98FE-506C20656804
                -XSTEPID 1
                -XSUBSYSTEM Distribution
                -XSERVER SERVERS1
                -XCMDLINE 0
                -XCancelEventHandle 0000000000001DF4
                -XParentProcessHandle 0000000000001AAC
    2020-06-30 02:52:51.043 Startup Delay: 4595 (msecs)
    2020-06-30 02:52:55.652 Connecting to Subscriber 'SERVERS1'
    2020-06-30 02:52:55.668 Connecting to Distributor 'SERVERS2'
    2020-06-30 02:52:56.293 Parameter values obtained from agent profile:
                -bcpbatchsize 2147473647
                -commitbatchsize 100
                -commitbatchthreshold 1000
                -historyverboselevel 1
                -keepalivemessageinterval 300
                -logintimeout 15
                -maxbcpthreads 1
                -maxdeliveredtransactions 0
                -pollinginterval 5000
                -querytimeout 1800
                -skiperrors
                -transactionsperhistory 100
    2020-06-30 02:52:56.605 Initializing
    2020-06-30 02:53:04.199 101 transaction(s) with 3062 command(s) were delivered.
    2020-06-30 02:53:04.652 100 transaction(s) with 194 command(s) were delivered.
    2020-06-30 02:53:04.918 100 transaction(s) with 214 command(s) were delivered.
    2020-06-30 02:53:13.168 Agent message code 10054. TCP Provider: An existing connection was forcibly closed by the remote host.
    2020-06-30 02:53:13.308 Category:NULL
    Source:  Microsoft SQL Server Native Client 11.0
    Number:  10054
    Message: TCP Provider: An existing connection was forcibly closed by the remote host.
    2020-06-30 02:53:13.340 Category:NULL
    Source:  Microsoft SQL Server Native Client 11.0
    Number:  10054
    Message: Communication link failure
    2020-06-30 02:53:13.371 Category:NULL
    Source:  Microsoft SQL Server Native Client 11.0
    Number:  10054
    Message: TCP Provider: An existing connection was forcibly closed by the remote host.
    2020-06-30 02:53:13.402 Category:NULL
    Source:  Microsoft SQL Server Native Client 11.0
    Number:  10054
    Message: Communication link failure
    2020-06-30 02:53:13.434 Category:NULL
    Source:  Microsoft SQL Server Native Client 11.0
    Number:  10054
    Message: TCP Provider: An existing connection was forcibly closed by the remote host.
    2020-06-30 02:53:13.480 The agent failed with a 'Retry' status. Try to run the agent at a later time.


    Tuesday, June 30, 2020 4:49 AM