locked
SQL AlwaysOn Networking Questions RRS feed

  • Question

  • I'm asking these questions together so they better describe what I am trying to understand overall.

    We have two next major goals for our application and data center environment: 1) implement hot/warm (Active/Passive) failover for DR and 2) implement hot/hot (Active/Active) for HA after my service application is enhanced to sync over the network in order to scale out. I'm mostly interested in scenario #1 now where the failover data center is keeping a DR replica of our primary database that we want to use R/O for reporting, invoicing etc. in the meantime when not failed over to. This is a great AlwaysOn feature.

    We are Microsoft centric (Server 2008 R2 and SQL 2012 Enterprise) so I hope to use SQL AlwaysOn but am confused on how a few networking mechanisms work in practice and want to better understand the SQL AlwaysOn big picture.

    I'm a full time developer and also do most of the IT and understand a fair bit of networking (for a developer) but don't configure much since we are running at an ISP where our IAAS is hosted under VMware and they mostly manage the network. Our long term goal is a multi data center, multi vendor Active/Active geo diverse configuration. We will eventually be converting to Windows Server 2012 or can sooner if required or there is some useful benefit in conjunction with AlwaysOn.

    Here are my questions which mostly are asking about how certain IP and Windows Failover Cluster mechanisms work in practice with AlwaysOn.

    • Do I use DNS to map a VNN to some IP(s) or VIP that all the machines in the Availability Group listen to? How are DNS, VNN, IP(s), VIP, etc. used to make this happen?

    • After a failover, how does a client find the new machine? Is the connection broken and the client application must have retry logic, or be restarted in some poor design cases, to reconnect? I think a VNN is used here and if so would like to understand how that works before, during and after the primary machine fails.

    • Can we setup a Windows Server Failover Cluster on Windows Workgroups or must we convert all the involved OS and SQL machines to a domain based system under Active Directory? Does the DC have to be its own machine? We are a very small operation with a high real time transaction rate so setting up AD would be a a big hit on my time as I've worked in that environs before and thus far we have run with workgroups just fine.

    • How is the R/O replica database accessed before and after failover? Can it be accessed after failover and did it just become no longer R/O to applications that expected it to be? I could see someone coding a R/W test to see if they had the correct database! Iow, can/how do I specifically access/address the replica database machine for guaranteed R/O reports rather than how a R/W client might typically connect to the database not caring if they got the primary or the failover database as long as it was R/W and live?

    Thanks so much, Dave


    Dave

    Sunday, August 31, 2014 11:23 PM

Answers

  • Hello Dave,

    We will eventually be converting to Windows Server 2012 or can sooner if required or there is some useful benefit in conjunction with AlwaysOn.

    Windows Server 2012R2 brings alone dynamic quorum, among other things, and is worth going to IMHO.

    Do I use DNS to map a VNN to some IP(s) or VIP that all the machines in the Availability Group listen to? How are DNS, VNN, IP(s), VIP, etc. used to make this happen?

    An availability group listener would be created which acts as the VNN or place of access. This is a clustered resources and fails with the resource group. This is updated behind the scenes through DNS, and there should be nothing that needs to be manually completed (other than maybe staging if you don't have the correct privs in the domain).

    After a failover, how does a client find the new machine? Is the connection broken and the client application must have retry logic, or be restarted in some poor design cases, to reconnect? I think a VNN is used here and if so would like to understand how that works before, during and after the primary machine fails.

    Depends on the kind of failover, but the listener would still be the main point of contact for the application. The listener will automatically be pointing to the primary AG. If something were to go wrong and the instance goes down or is forced failed, then yes client connections will be broken. The application should have retry logic, or at the very basic allow the user to attempt the operation again. The listener should be used here (as I have stated before) which should not require any other application changes.

    Can we setup a Windows Server Failover Cluster on Windows Workgroups or must we convert all the involved OS and SQL machines to a domain based system under Active Directory? Does the DC have to be its own machine? We are a very small operation with a high real time transaction rate so setting up AD would be a a big hit on my time as I've worked in that environs before and thus far we have run with workgroups just fine.

    It must be a domain and will not work with workgroups. AD is required here. The DC should be its own machine and there should be multiple DCs to deal with one going down. The last thing you want is your entire operation going down because a single DC went down or got overwhelmed.

    How is the R/O replica database accessed before and after failover? Can it be accessed after failover and did it just become no longer R/O to applications that expected it to be? I could see someone coding a R/W test to see if they had the correct database! Iow, can/how do I specifically access/address the replica database machine for guaranteed R/O reports rather than how a R/W client might typically connect to the database not caring if they got the primary or the failover database as long as it was R/W and live?

    The first question is it depends. It can be accessed by using the connection specific keywords after it is setup correctly for availability groups. It depends on the setup to how the access would work. It can be accessed after a failover, depending (again) on your infrastructure. If your R/O replica becomes the primary and you have no other replica then it's not a R/O anymore and thus all of your workload would be on the primary... so it depends.

    Other than checking to see if the application is on the primary by using the built in sys.fn_is_primary_replica() there really isn't any way to specifically know when using the listener. In most cases it shouldn't be a problem if the infrastructure is designed correctly, but you could build a test around that function. I don't see why this would even be needed, but it could be done.


    Sean Gallardy | Blog | Microsoft Certified Master

    • Proposed as answer by Sofiya Li Monday, September 1, 2014 7:11 AM
    • Marked as answer by Sofiya Li Friday, September 5, 2014 6:58 AM
    Monday, September 1, 2014 3:17 AM
    Answerer

All replies

  • Hello Dave,

    We will eventually be converting to Windows Server 2012 or can sooner if required or there is some useful benefit in conjunction with AlwaysOn.

    Windows Server 2012R2 brings alone dynamic quorum, among other things, and is worth going to IMHO.

    Do I use DNS to map a VNN to some IP(s) or VIP that all the machines in the Availability Group listen to? How are DNS, VNN, IP(s), VIP, etc. used to make this happen?

    An availability group listener would be created which acts as the VNN or place of access. This is a clustered resources and fails with the resource group. This is updated behind the scenes through DNS, and there should be nothing that needs to be manually completed (other than maybe staging if you don't have the correct privs in the domain).

    After a failover, how does a client find the new machine? Is the connection broken and the client application must have retry logic, or be restarted in some poor design cases, to reconnect? I think a VNN is used here and if so would like to understand how that works before, during and after the primary machine fails.

    Depends on the kind of failover, but the listener would still be the main point of contact for the application. The listener will automatically be pointing to the primary AG. If something were to go wrong and the instance goes down or is forced failed, then yes client connections will be broken. The application should have retry logic, or at the very basic allow the user to attempt the operation again. The listener should be used here (as I have stated before) which should not require any other application changes.

    Can we setup a Windows Server Failover Cluster on Windows Workgroups or must we convert all the involved OS and SQL machines to a domain based system under Active Directory? Does the DC have to be its own machine? We are a very small operation with a high real time transaction rate so setting up AD would be a a big hit on my time as I've worked in that environs before and thus far we have run with workgroups just fine.

    It must be a domain and will not work with workgroups. AD is required here. The DC should be its own machine and there should be multiple DCs to deal with one going down. The last thing you want is your entire operation going down because a single DC went down or got overwhelmed.

    How is the R/O replica database accessed before and after failover? Can it be accessed after failover and did it just become no longer R/O to applications that expected it to be? I could see someone coding a R/W test to see if they had the correct database! Iow, can/how do I specifically access/address the replica database machine for guaranteed R/O reports rather than how a R/W client might typically connect to the database not caring if they got the primary or the failover database as long as it was R/W and live?

    The first question is it depends. It can be accessed by using the connection specific keywords after it is setup correctly for availability groups. It depends on the setup to how the access would work. It can be accessed after a failover, depending (again) on your infrastructure. If your R/O replica becomes the primary and you have no other replica then it's not a R/O anymore and thus all of your workload would be on the primary... so it depends.

    Other than checking to see if the application is on the primary by using the built in sys.fn_is_primary_replica() there really isn't any way to specifically know when using the listener. In most cases it shouldn't be a problem if the infrastructure is designed correctly, but you could build a test around that function. I don't see why this would even be needed, but it could be done.


    Sean Gallardy | Blog | Microsoft Certified Master

    • Proposed as answer by Sofiya Li Monday, September 1, 2014 7:11 AM
    • Marked as answer by Sofiya Li Friday, September 5, 2014 6:58 AM
    Monday, September 1, 2014 3:17 AM
    Answerer
  • Thanks so much Sean. Sorry for the late response. Your info helps me a lot to understand what is going on. I'm still especially puzzled how the failover works DNS wise. Your comment "This is updated behind the scenes through DNS" makes me wonder how that can happen in a timely fashion as DNS changes can take a long time to propagate. Dave

    Dave

    Friday, September 5, 2014 1:37 PM