none
Odd Number SQL AlwaysOn Cluster in a Dual Data Center

    Question

  • Hi Everyone,

    I am reviewing a proposed SQL 2012 AlwaysOn Logical Design from an application vendor and I doubts with regards to its resiliency in a DR scenario.  The design calls for 3 x AlwaysOn Cluster Nodes--with Node 1 and Node 2 in the primary data center and Node 3 in the secondary data center.
    Failover process from Node 1 and Node 2 is automatic, failover from primary data center to secondary is manual.  My concerns are as follows:

    1. It is my understanding that if Primary Data Center is offline, due to a disaster, quorum would not be possible as the secondary data center only has 1 out of 3 possible votes.  Just want someone else to confirm this point.
    2. To get the 3rd node activated in this config we would need to remove node 1 and 2 from WSFC and execute a forced quorum?
    3. Is a 3 node SQL AlwaysOn cluster in a two data center environment approach advisable? The explicit requirement is to have automatic failover at the primary and manual failover to DR site as required.  Taking from my experience with Exchange DAG, the recommendation is to stick with even numbers and a witness.

    Jeff

    Monday, December 30, 2013 3:17 AM

Answers

  • Hello Jeff,

    1. You're correct, though you could set node 3 to have 0 votes if you wanted but I don't see why you'd want to do that. You're probably going to choose Node Majority Set for quorum since it's an odd number and that's what makes sense.

    2. You would follow the standard procedure for forcing quorum. Think about it this way, if you have three nodes like that and your multiple different ISPs all suffer a failure at the same time how would that node know that the resources really aren't down and cause a split brain situation? Thus it makes sense in the scenario.

    3. It's perfectly fine, it meets your needs for now. I'm not sure why you'd need even numbers and a another voting node, that would make it more complicated. even if you had even numbers of nodes and a fileshare voter in the scenario I have above the situation would still result in the same (5 votes total, 3 votes for primary and 2 for DR so no difference as DR would still be less than 50%).


    Sean Gallardy | Blog | Twitter

    Monday, December 30, 2013 4:18 AM
  • "When the Quorum is for Majority node set cluster, how can the cluster and the availability group be online when 2 of the 3 node cluster are down due to disaster?"

    You force quorum.  See:

    Perform a Forced Manual Failover of an Availability Group (SQL Server)

    David


    David http://blogs.msdn.com/b/dbrowne/


    Thursday, January 02, 2014 8:35 PM

All replies

  • Hello Jeff,

    1. You're correct, though you could set node 3 to have 0 votes if you wanted but I don't see why you'd want to do that. You're probably going to choose Node Majority Set for quorum since it's an odd number and that's what makes sense.

    2. You would follow the standard procedure for forcing quorum. Think about it this way, if you have three nodes like that and your multiple different ISPs all suffer a failure at the same time how would that node know that the resources really aren't down and cause a split brain situation? Thus it makes sense in the scenario.

    3. It's perfectly fine, it meets your needs for now. I'm not sure why you'd need even numbers and a another voting node, that would make it more complicated. even if you had even numbers of nodes and a fileshare voter in the scenario I have above the situation would still result in the same (5 votes total, 3 votes for primary and 2 for DR so no difference as DR would still be less than 50%).


    Sean Gallardy | Blog | Twitter

    Monday, December 30, 2013 4:18 AM
  • Hi Jeff,

    Your first point is absolutely correct. In the above mentioned design if the intention is to use the third Node as the DR node, then in case of disaster at the Primary data center there is no other option than to bring the DR database up and online manually as the entire cluster and the AG(availability group) will not be available. Hence in case of disaster at Primary DC the application will have to point to the Physical DR server and not the AG Listner.

    As mentioned above as the cluster won't exists at all (will be down) in case of disaster at Primary DC hence there is no need to remove Node1 and Node2 from the cluster.

    The above design is absolutely fine as long as your business requirements are getting full filled. There is no requirement to have even number of Nodes.

    If you want the cluster to be online even where there is Disaster at the Primary DC or at the Secondary DC then you will require a total of 6 Nodes. 3 at the Primary DC and 3 as the DR DC as the Quorum for this setup will be Majority Node Set Quorum.



    Thanks and Regards Alankar Chakravorty MCITP Database Administrator SQL Server 2008 MCITP Database Administrator SQL Server 2005

    Monday, December 30, 2013 9:01 AM
  • Alankar,

    In response to this:

    "Hence in case of disaster at Primary DC the application will have to point to the Physical DR server and not the AG Listener"

    This is a huge problem, as I only a 15 minute RPO once a disaster is declared and there are potentially hundreds of servers referencing that AG Listener.  Again sounds like I'm better off with a 4 node cluster as I can still achieve quorum by rehoming the witness. Although the failover is manual, I want the activation to be as hands free as possible.  Would SQL Alias be another alternative?

    Jeff

    Tuesday, December 31, 2013 7:29 PM
  • Hi Jeff,

    If there are hundreds of servers which are pointing to the AG listener then it will definitely be a leangty process to point to the physical DR server.

    Are these hundreds of Application servers residing at the Primary DC itself?


    Thanks and Regards Alankar Chakravorty MCITP Database Administrator SQL Server 2008 MCITP Database Administrator SQL Server 2005

    Tuesday, December 31, 2013 9:44 PM
  • Yes.  We are using VMware SRM to failover the application servers to DR site.  I want little to no changes on the application side.
    Thursday, January 02, 2014 5:34 PM
  • This:

    "Hence in case of disaster at Primary DC the application will have to point to the Physical DR server and not the AG Listener"

    Is not correct.  Once you force quorum and bring the cluster online on the DR node, the AG Listener will come online and register its IP addresses in DNS.  In the typical config, one of its IP addresses is on the DR subnet and the clients will all know both IP addresses.

    David


    David http://blogs.msdn.com/b/dbrowne/

    Thursday, January 02, 2014 6:40 PM
  • Hi David,

    When the Quorum is for Majority node set cluster, how can the cluster and the availability group be online when 2 of the 3 node cluster are down due to disaster?


    Thanks and Regards Alankar Chakravorty MCITP Database Administrator SQL Server 2008 MCITP Database Administrator SQL Server 2005

    Thursday, January 02, 2014 8:28 PM
  • "When the Quorum is for Majority node set cluster, how can the cluster and the availability group be online when 2 of the 3 node cluster are down due to disaster?"

    You force quorum.  See:

    Perform a Forced Manual Failover of an Availability Group (SQL Server)

    David


    David http://blogs.msdn.com/b/dbrowne/


    Thursday, January 02, 2014 8:35 PM