locked
Client failover takes 1 to 3 minutes after database has failed over RRS feed

  • Question

  •  

    We are having problems getting the ADO.NET client to failover in a timely manner in a web application.  Here's what happens:

     

    1)  Database fails over in 3 to 5 seconds

    2)  Clients attempt to login to primary and receive SQL server not available errors

    3)  The clients slowly start to connect and succeed without errorson the secondary. 

    4) We are still see SQL Server login errors on the primary, even after the failover

    5)  When we failback to the primary, it all happens within a few seconds (no problems)

     

    If we recycle IIS, this problem goes away

     

    Here's what we have tried

    1)  Turning off connection pooling (has no effect).  We also can see the connections get killed on the primary server

    2)  Applying this hotfix (http://support.microsoft.com/kb/944099)

     

    And yes, we are running a script to sync up the SIDS so that doesn't appear to be a problem.  What seems to be happening is the primary server is getting stuck in the ADO.NET application memory.  The problem is we don't know how to fix it.

     

    BTW, our hardware is SQL Server 2005, Windows Server 2003, .NET 2.0 (and .NET 3.5)


    Let me know if you need additional information.  Thanks in advance

    Monday, May 12, 2008 1:53 PM

All replies

  • Let me qualify my answer by saying I have never actually written a client that uses a connection string with a failover partner component.  Because of that, I'm not sure whether the developer has to embed any "retry" logic to use the failover partner or whether by including the failover partner component, the retry is automatic.  My understanding is that the retry is automatic but presumably, the original connection error is logged somewhere.  You said that your clients were receiving server not available errors but you didn't say whether the apps were failing or whether these errors were simply logged.

    With that in mind, it seems to me that it's working exactly the way it's supposed to.  Presumably, your connection strings are all coded to point to the original principal server.  The fact that it failed over doesn't change that.

    Using a little shorthand here, if your connection string says: server=AAA;failover=BBB

    The fact that you've failed over and made BBB the new principal isn't going to change the fact that each new connection is going to try and connect to AAA first.  Since an unused connection will eventually timeout, you should continue to see "server not available" errors for every first attempt.  By the same token, when you failback to the original principal everything should be golden since your connection strings are already coded for that server.

    There is a retry delay to consider per this BOL entry.

    ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/5083089e-0eb2-408c-b1ac-f3de12f6b92a.htm

    This entry also explains the initial connection and has some useful info in it.

    ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/fd000cb1-9426-4ed9-8108-1907ddfd1c5f.htm
    Monday, May 12, 2008 7:33 PM
  • Bob,


    Based on this article it appears that ADO.NET should store the new partner name in memory and not try the other partner again.

     

    "Assume a client is successfully connected to the principal, and a database mirroring failover (automatic, manual, or forced) occurs. The next time the application  attempts to use the connection, the ADO.NET or SQL Native Client driver will detect that the connection to the old principal has failed, and will automatically retry   connecting to the new principal as specified in the failover partner name. If successful, and there is a new mirror server specified for the database mirroring session  by the new principal, the driver will retrieve the new partner failover server name and place it in its client cache"

     

    http://www.microsoft.com/technet/prodtechnol/sql/2005/dbmirror.mspx

     

    Also from MSDN

     

    "The client always tries to connect to the current principal server. If it fails, it tries the failover partner. If the mirror database has already been switched to the principal role on the partner server, the connection succeeds and the new principal-mirror mapping is sent to the client and cached for the lifetime of the calling AppDomain. It is not stored in persistent storage and is not available for subsequent connections in a different AppDomain or process. However, it is available for subsequent connections within the same AppDomain.

     

    http://msdn.microsoft.com/en-us/library/5h52hef8(VS.80).aspx

     

    Thanks for your response.  Let me know what you think of this new information

     

    Monday, May 12, 2008 8:09 PM