Automatic failover doesn't fail back to the first server if the second server is lost.

  • Question

  • Hi Everybody,

    We use database mirroring a lot in our product solutions, and we have recently experienced strange behaviour in our failover tests with SQL Server 2008 R2.

    We have two servers running Windows Server 2008 R2 Standard and SQL Server 2008 R2 Standard SP2 (let's call them DB1 and DB2).

    We also have a witness workstation running SQL Server 2008 Express on Windows 7.

    A database from DB1 is mirrored to DB2 in "safety full" mode, with a witness. At this stage, the database is principal on DB1 and mirror on DB2.

    To test the automatic failover, we first restart the DB1 server, which has the database in the principal role.

    After a few seconds, the database on DB2 becomes principal, which is normal and exactly what we want.

    After a few minutes, DB1 comes back online and its database takes the mirror role (still OK). At this stage, the database is principal on DB2 and mirror on DB1.

    When the monitoring application shows that the mirror is synchronized and that both servers are connected to the witness, we restart DB2 to trigger an automatic failover to DB1.

    What we see is that DB1 never takes the principal role and the database stays in the mirror role.

    In the DB1 errorlog, I see only these two lines when DB2 disappears, and no other message related to the mirroring session.

    2014-01-22 08:57:26.91 spid43s     Starting up database 'Test123'.
    2014-01-22 08:57:26.95 spid43s     Bypassing recovery for database 'Test123' because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.

    When DB2 comes back online, the database on DB2 keeps its principal status and the database on DB1 stays mirror.

    What is really strange is that if I restart DB2 once again directly after that, DB1 fails over normally and the database on DB1 takes the principal role after a few seconds, without any configuration change between the two restarts.

    The DB1 errorlog then shows:

    2014-01-22 09:00:37.53 spid29s     Error: 1474, Severity: 16, State: 1.
    2014-01-22 09:00:37.53 spid29s     Database mirroring connection error 4 'An error occurred while receiving data: '64(The specified network name is no longer available.)'.' for 'TCP://DB2:5022'.
    2014-01-22 09:00:37.53 spid18s     Database mirroring is inactive for database 'Test123'. This is an informational message only. No user action is required.
    2014-01-22 09:00:42.37 spid32s     The mirrored database "Test123" is changing roles from "MIRROR" to "PRINCIPAL" due to Auto Failover.
    2014-01-22 09:00:42.39 spid32s     Recovery is writing a checkpoint in database 'Test123' (7). This is an informational message only. No user action is required.
    2014-01-22 09:00:42.39 spid32s     Recovery completed for database Test123 (database ID 7) in 78 second(s) (analysis 0 ms, redo 0 ms, undo 7 ms.) This is an informational message only. No user action is required.
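    As a rough aid for reading such extracts, the delay between the connection error and the role change can be computed from the timestamps. This is only a minimal sketch: the two lines are copied from the errorlog above, the helper names are made up for illustration, and it assumes every line starts with a `YYYY-MM-DD HH:MM:SS.ff` timestamp.

```python
from datetime import datetime

# Two lines copied verbatim from the DB1 errorlog extract above.
ERROR_LINE = "2014-01-22 09:00:37.53 spid29s     Error: 1474, Severity: 16, State: 1."
ROLE_CHANGE_LINE = (
    '2014-01-22 09:00:42.37 spid32s     The mirrored database "Test123" is '
    'changing roles from "MIRROR" to "PRINCIPAL" due to Auto Failover.'
)


def parse_timestamp(line):
    """Return the leading timestamp of an errorlog line (hypothetical helper)."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def seconds_between(first_line, second_line):
    """Elapsed seconds from the first line's timestamp to the second's."""
    delta = parse_timestamp(second_line) - parse_timestamp(first_line)
    return delta.total_seconds()


delay = seconds_between(ERROR_LINE, ROLE_CHANGE_LINE)
print(f"Connection error to role change: {delay:.2f} s")
```

    Running the same calculation on the extract from the failed attempt is not possible, because no connection error was logged at all in that case.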

    So, to summarize:

    - a first failover from DB1 to DB2 always works

    - then, a restart of DB2 never fails over to DB1

    - a second restart of DB2 always fails over to DB1

    This is pretty much systematic on one of our server pairs.

    Is there any explanation for this, or any idea where I can look to find the reason for this strange behavior?

    Thanks a lot for your help

    Seb

    Friday, January 24, 2014 4:25 PM

All replies

  • The default timeout is 10 seconds, unless you have changed it to a different value.  If DB2 is not down for longer than the timeout, the outage will not be detected and no failover will occur.

    See:

    http://www.mssqltips.com/sqlservertip/1603/adjusting-the-automatic-failover-time-for-sql-server-database-mirroring/
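
    The timeout rule described in this reply can be sketched as a tiny model. This is an illustration of the statement above, not of how SQL Server implements detection; the function and constant names are hypothetical, and the 10-second value is the default mentioned in the reply.

```python
# Default timeout in seconds, as stated in the reply above.
DEFAULT_PARTNER_TIMEOUT_S = 10


def outage_is_detected(outage_seconds, timeout_seconds=DEFAULT_PARTNER_TIMEOUT_S):
    """Simplified model: an outage counts only if it lasts longer than the timeout."""
    return outage_seconds > timeout_seconds


# A short blip stays below the timeout; a restart lasting minutes exceeds it.
print(outage_is_detected(5))
print(outage_is_detected(180))
```

    Under this model, a restart that keeps the partner unreachable for several minutes would be expected to count as an outage.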

    Friday, January 24, 2014 8:09 PM
  • Hi Tom

    That makes sense, but why does it fail over when DB2 is restarted again? Does it count the timeout from the first restart?


    Best Regards, Uri Dimant, SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/


    Sunday, January 26, 2014 9:54 AM
  • Thanks for your reply Tom

    But in this case, DB2 is down for several minutes without any reaction of DB1.

    And the failover at the second restart is quite fast.

    So, I'm still wondering why DB1 "refuses" to become principal at the first restart, without any significant error message in the errorlog.

    Seb

    Monday, January 27, 2014 8:33 AM
  • Hi All,

    Does anyone have any other idea to help me with this issue?

    Do you need any particular logs or information?

    Thanks

    Seb

    Wednesday, February 5, 2014 4:11 PM
  • Look at the SQL errorlogs on both the mirror and the principal; they should show why there is a problem.

    Wednesday, February 5, 2014 5:25 PM
  • Thank you Tom

    But I have already checked that and reported the errorlog extracts in my original post.

    When DB01 disappears for the first time, there is nothing in the DB01 ERRORLOG (it is restarting :-) )

    AND no particular error message in the DB02 ERRORLOG (nothing related to the fact that DB01 is no longer reachable!)

    Only these two lines:

    2014-01-22 08:57:26.91 spid43s     Starting up database 'Test123'.
    2014-01-22 08:57:26.95 spid43s     Bypassing recovery for database 'Test123' because it is marked as a mirror database, which cannot be recovered. This is an informational message only. No user action is required.

    So my main question remains: why doesn't DB02 detect that DB01 disappears (the first time only), and why doesn't the failover mechanism trigger the failover?

    Thank you

    Seb

    Thursday, February 6, 2014 8:21 AM
  • Database mirroring does not support automatic failback. This is by design. You will need to initiate a failback manually for the old principal (which is now the mirror) to assume the principal role again.

    -Feroz




    • Edited by Feroz R Tuesday, March 18, 2014 11:24 AM
    Tuesday, March 18, 2014 11:22 AM
  • Hi Feroz,

    There is a little bit of misunderstanding here, I'm afraid.

    We don't expect a failback; I know that there is no failback possible.

    Here we see that a failover is not triggered even though the principal server is lost.
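
    To make the distinction concrete, the conditions usually documented for automatic failover can be written as a small check. This is a simplified sketch, not SQL Server's implementation; the class, field, and function names are hypothetical, and the example values reflect what the monitoring application reported before DB2 was restarted.

```python
from dataclasses import dataclass


@dataclass
class MirrorState:
    """State as seen around the mirror server; all names are hypothetical."""
    safety_full: bool             # session runs in high-safety ("safety full") mode
    witness_configured: bool      # a witness is part of the session
    synchronized: bool            # mirror database was synchronized before the loss
    mirror_sees_witness: bool     # mirror and witness are still connected (quorum)
    mirror_sees_principal: bool   # mirror still has a connection to the principal
    witness_sees_principal: bool  # witness still has a connection to the principal


def automatic_failover_expected(state: MirrorState) -> bool:
    """Return True when the usual preconditions for automatic failover all hold."""
    return (
        state.safety_full
        and state.witness_configured
        and state.synchronized
        and state.mirror_sees_witness
        and not state.mirror_sees_principal
        and not state.witness_sees_principal
    )


# Second test: DB2 (principal) is restarted while DB1 (mirror) and the witness stay up.
db2_restarted = MirrorState(
    safety_full=True,
    witness_configured=True,
    synchronized=True,
    mirror_sees_witness=True,
    mirror_sees_principal=False,
    witness_sees_principal=False,
)
print(automatic_failover_expected(db2_restarted))
```

    If any one of these inputs differs from what the monitoring application showed (for example, the mirror or the witness still holding a connection to the principal), the check returns False, which is one place to look when a failover is not triggered.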

    Thanks for your contribution.

    Seb

    Tuesday, March 18, 2014 11:27 AM
  • You mentioned that this behavior always occurs. How many times have you run these tests?

    Have you noticed any packet drops among the three servers during that window?


    Santosh Singh

    Thursday, June 12, 2014 1:04 PM