Disadvantages of using 'Allow Failback' in sql cluster

    Question

  • Hi, we are looking into the 'Allow failback' option in a SQL cluster, as we would like SQL to run on its preferred owner after patching is done. Patching is done by a different team, and most of the time they don't put SQL back on its original node; they currently have some issue with that. I would rather not touch the cluster, but we are looking into the failback option anyhow.

    Prevent failback is our current default - which works well.

    I wanted to know what the disadvantages are of using the 'Allow failback - Immediately' option in a SQL cluster. Thanks.

    One scenario I can think of: during an actual issue on the current node, SQL fails over to the other, good node, but then tries to fail back to the first, bad node, which fails it over to the second, good node again. How long does this cycle run, and does SQL end up in a failed state?


    D

    • Moved by Tom Phillips Tuesday, June 11, 2019 11:51 AM HA question
    Monday, June 10, 2019 4:55 PM

All replies

  • Hi SQLRocker,

    In general it is not recommended to enable failback, unless you can schedule it for a time where you know that the server is unused. Otherwise, you're creating a service outage, perhaps unnecessarily.

    The preferred owner setting is a preference for a failover (not failback), and has no significance for a two-node cluster. This is something that is commonly misunderstood.

    If the nodes are rebooted, you should manually re-home the cluster resources, unless you can identify idle-time as stated earlier.

    Best regards,
    Cathy Ji

    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to  MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Tuesday, June 11, 2019 6:35 AM
  • I agree: do not touch it, as it adds more complexity. Ask the team doing the patching to add a task to the patching plan to make sure each cluster resource is on the node it was on before patching. A simple rule like that would help you a lot.

    Cheers,

    Shashank

    Please mark this reply as answer if it solved your issue or vote as helpful if it helped so that other forum members can benefit from it

    My TechNet Wiki Articles

    MVP

    Tuesday, June 11, 2019 8:12 AM
    Moderator
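
The "put it back where it was" rule suggested above could be checked with a small script. The following is only a minimal sketch: the function name `groups_to_rehome` and the group and node names are hypothetical, and it assumes the patching team can record which node owns each cluster group before and after patching (for example from the output of the FailoverClusters `Get-ClusterGroup` cmdlet). How those snapshots are collected is outside the sketch.

```python
def groups_to_rehome(owners_before, owners_after):
    """Return {group: (current_node, original_node)} for groups that moved.

    Both arguments map a cluster group name to the node that owned it.
    Groups missing from the second snapshot are skipped.
    """
    moved = {}
    for group, original in owners_before.items():
        current = owners_after.get(group)
        if current is not None and current != original:
            moved[group] = (current, original)
    return moved


# Hypothetical snapshots taken before and after a patching run.
before = {"SQL1": "NODE1", "SQL2": "NODE2"}
after = {"SQL1": "NODE3", "SQL2": "NODE2"}

for group, (current, original) in groups_to_rehome(before, after).items():
    print(f"{group}: currently on {current}, move back to {original}")
```

The actual move would still be done manually or with the cluster tooling during the maintenance window; the sketch only lists what needs to be re-homed.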
  • Yes, Cathy, we are looking at clusters with more than 2 nodes. As I mentioned, I would also rather not touch the cluster, but I am looking to present the disadvantages of the failback option. An unnecessary service outage is one, but it will happen during a maintenance window, and it will occur anyhow, as we eventually need to put SQL back on its right node.

    Shashank - It doesn't look like the patching team has a solution currently.

    So, what else are the disadvantages that I can present? (I suggested a scenario earlier in my post.)


    D

    Tuesday, June 11, 2019 10:00 AM

  • "Shashank - Doesn't look like the patching team has a solution currently. So, what else are the disadvantages that I can present (I suggested a scenario before in my post)."

    Sorry, but that is a plain excuse. There is no rocket science involved, just knowing the node where the cluster resources should be. You are only complicating things. One disadvantage: if a failover happens due to some issue (not during patching, but on any regular day) and you have failback enabled, the cluster will fail back again, giving "extra" downtime because of that configured policy. Does this qualify as a strong point?

    Cheers,

    Shashank



    Tuesday, June 11, 2019 10:09 AM
    Moderator
  • It sounds like you think I like this option. On the contrary, if you read my first post, you will see that I opened this question so that I can present the disadvantages and propose that it not be used.

    --

    Yes, it's a valid point: an actual non-patching failover causes another failback, and so more downtime. That downtime will eventually happen anyway, but during business hours it's an issue, so it is a valid point.

    Going further, I mentioned another scenario initially. If the preferred node has gone bad, what if SQL tries to go back to the bad node but can't, then fails over to the non-preferred node, but tries again to go back? My question in that case is: how many tries will SQL make, and can it end up in a failed state?


    D

    Tuesday, June 11, 2019 10:21 AM

  • "Going more into it, I mentioned another scenario initially. If the preferred node has gone bad, what if SQL tries to go back to the bad node but can't, then fails over to the non-preferred node, but tries again to go back? My question in that case is: how many tries will SQL make, and can it end up in a failed state?"

    That is a good point, and you must mention that if the cluster tries to fail back to a node that is not online, you may see a "ping-pong game", with the resource moving back and forth. I guess there is an option in the cluster GUI to limit the number of failbacks. See this question; the OP has posted a screenshot of the setting that would control it.

    Cheers,

    Shashank


    Tuesday, June 11, 2019 10:35 AM
    Moderator
  • The link you posted is for a different issue (preferred owner).

    Allow failback has only 2 options, 'Immediately' & 'Failback between'.


    D

    Tuesday, June 11, 2019 11:09 AM
  • "The link you posted is for different issue (preferred owner). Allow failback has only 2 options, 'Immediately' & 'Failback between'."

    You surely did not read the question completely. On the General tab you have the preferred owner setting; in the same pop-up, on the other tab (Failover), you have the allow failback radio button. That is what is shown in the Stack Exchange thread.

    If failback is set to Immediately, it will try to fail back, and if it cannot, it will try to come online on the previous node again; if that fails, it can try failing back again. This can go on. I do not have a value after which it will stop, because I have not seen this scenario, but it may continue until one node comes online.

    If you select failback between certain hours, it will keep on trying whenever that time window comes around.


    Cheers,

    Shashank


    Tuesday, June 11, 2019 3:08 PM
    Moderator
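
To make the difference between 'Immediately' and 'Failback between' concrete, here is a toy check of whether a failback would be allowed to start at a given hour. It is a sketch under stated assumptions, not the cluster service's actual logic: the name `failback_allowed`, the `IMMEDIATE` sentinel, the half-open interval (start hour included, end hour excluded), and the handling of a window that wraps past midnight are all assumptions made for illustration.

```python
IMMEDIATE = -1  # hypothetical sentinel meaning "no window, fail back immediately"


def failback_allowed(hour, start=IMMEDIATE, end=IMMEDIATE):
    """Return True if a failback may start at `hour` (0-23).

    Assumed semantics: start is inclusive, end is exclusive, and a window
    whose start is later than its end wraps past midnight.
    """
    if start == IMMEDIATE or end == IMMEDIATE:
        return True
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end  # e.g. 22 -> 4 spans midnight


print(failback_allowed(14))         # 'Immediately': allowed at any hour
print(failback_allowed(14, 1, 5))   # 'Failback between' 1 and 5, checked at 14:00
print(failback_allowed(23, 22, 4))  # window wrapping past midnight, checked at 23:00
```

Under this model, a window restricts when the move back can begin, which is why it is usually paired with a period of known low usage.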
  • That's what I want to know. From my first post: "How long does this cycle run, and does SQL end up in a failed state?"

    D

    Tuesday, June 11, 2019 6:45 PM
  • Taking your example above, we will only be setting one preferred owner, say 1. After patching, if SQL comes online on 3, the 'failback immediate' setting will take it to 1. But in case of any issues on 1 (this is where I need some concrete documentation and advice): are you saying that SQL will try only ONCE to go to the preferred owner, and then will come online wherever it can (2 or 3) and stay online there?

    D


    • Edited by SQLRocker Wednesday, June 12, 2019 11:44 AM
    Wednesday, June 12, 2019 11:44 AM
  • "This is where I needed some concrete doc & advice - are you saying that SQL will only try ONCE going to the preferred owner and then will come online wherever it can come online (2/3) and stay online there?"

    Let me set up a WSFC and try a few scenarios. I will get back to you.

    Cheers,

    Shashank


    Wednesday, June 12, 2019 1:31 PM
    Moderator
  • It looks like that depends on the 'Failover' option, which is just above the 'Failback' option (as you mentioned earlier, in the link that you gave).

    It has 2 values:

    Max failures in the specified period: 

    Period (hours):

    Say the above are set to 5 and 6 respectively. Then I think SQL can only do a combined total of 5 moves (failover plus failback) in 6 hours, after which it will be left in a failed state.

    So during an actual problem, taking the example of 5 and 6 above, SQL will ping-pong a total of 5 times and then be left in a failed state.


    D

    Thursday, June 20, 2019 8:41 PM
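
The arithmetic in the post above can be sketched as a sliding-window counter. This is a toy model of the poster's reading, not documented cluster behaviour: it assumes that every move (failover or failback) counts against 'Max failures in the specified period', and that once the limit is used up within the period the next attempt leaves the group failed. The function name `time_left_failed` and the 30-second spacing between attempts are made up for illustration.

```python
from collections import deque


def time_left_failed(max_failures, period_hours, attempt_times):
    """Return the time (seconds) at which the group is left failed, or None.

    attempt_times: ascending times, in seconds, at which a failover or
    failback is attempted. Assumption: every attempt counts as one failure.
    """
    period = period_hours * 3600
    recent = deque()  # times of moves still inside the current period
    for t in attempt_times:
        while recent and t - recent[0] >= period:
            recent.popleft()  # this move has aged out of the period
        if len(recent) >= max_failures:
            return t  # limit already used up: no further move, group failed
        recent.append(t)
    return None


# 'Max failures' = 5, 'Period (hours)' = 6, one ping-pong move every 30 seconds.
print(time_left_failed(5, 6, range(0, 600, 30)))
```

With these made-up numbers, five moves are allowed and the sixth attempt, 150 seconds in, is the one that leaves the group failed. If the attempts were spaced two hours apart instead, the limit would never be reached under this model, because older moves age out of the 6-hour period.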
  • So, for EVERY problem on the preferred node, SQL WILL end up in a failed state, thus negating the USE of a cluster.

    ---

    Is that a fair statement, or am I missing something?

    Say, on a 2-node cluster, preferred node 1 has an issue. sql1 fails over to node 2, and if failback is set to immediate, it will fail back to the problem node 1. It does this, say, 5 times in the example above, and as the failovers happen within seconds, SQL will pretty soon be in a failed state.


    D

    Thursday, June 20, 2019 9:39 PM
  • "The preferred owner setting is a preference for a failover (not failback), and has no significance for a two-node cluster. This is something that is commonly misunderstood."

    Hi Cathy,

    I am a bit confused by the above. Why does the preferred owner setting have no significance on a 2-node cluster?

    And why is the preferred owner setting not for failback?

    On a 2-node cluster, if 2 SQL instances are installed, say sql1 and sql2, can't we assign node1 as the preferred node for sql1 and node2 for sql2, and set failback to immediate, so that sql1 automatically tries to fail back when node1 comes back online?


    D

    Friday, June 21, 2019 12:46 AM
  • https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/cc755151%28v%3dws.10%29

    The above gives the exact info in example 1, so I don't know, Cathy, why you would make the above statement.

    Also, https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/cc771809(v=ws.11)

    says "You must configure a preferred owner if you want failback to occur (that is, if you want a particular service or application to fail back to a particular node when possible)."

    So I am not sure about your statements.


    D

    Friday, June 21, 2019 1:12 PM
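
The two-instance example and the documentation quote above can be put into a small model of how a failback target might be chosen. It is only a sketch: the function name `failback_target` and the rule "fail back to the first preferred owner that is online" are assumptions for illustration, and whether a real cluster behaves this way in every case is exactly what the thread leaves open.

```python
def failback_target(current_node, preferred_owners, online_nodes):
    """Return the node a group would fail back to, or None to stay put.

    Assumption: the group moves to the first preferred owner that is
    online. With no preferred owner configured, no failback occurs.
    """
    for node in preferred_owners:
        if node in online_nodes:
            return None if node == current_node else node
    return None


# Two instances on a 2-node cluster; node1 was rebooted, so both ended up
# on node2, and node1 has now come back online.
online = {"node1", "node2"}
print(failback_target("node2", ["node1"], online))  # sql1, prefers node1
print(failback_target("node2", ["node2"], online))  # sql2, prefers node2
```

In this model sql1 moves back to node1 while sql2 stays where it is, which matches the poster's expectation for a 2-node cluster with one preferred owner per instance.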