To setup an alert for balancing sql cluster nodes in the event of a failover

Answered To setup an alert for balancing sql cluster nodes in the event of a failover

  • Thursday, January 24, 2013 8:55 AM
     
     
    Here's what I am really looking for...

    We have a 2 node cluster setup in our environment, and in the recent times we had to face quite a lot of trouble as all the resources on suppose node a fails over and sits there on node b, so for e.g: if I am having 5 instances running one node a, and 5 on node b, if for some reason (may be a n/w hiccup), quorum is lost and the node gets evicted (in this case node a), and hence all the resources/instances from node a flips on to node b, and remains there causing excessive load on the server. 

    Ok now to address our main concern. The problem here is we don't want to overburden the server with all the resources junked up on one node, but instead we want the nodes to be balanced for e.g: 5 instances on node a & 5 instances on node b. It doesn't matter who is the owner of these instances, it could be node a or node b, but at least the nodes should be balanced. 

    Now, my concern is on setting up an alert on this to mail me, only when certain criteria is met (for e.g: if there are more than 5 instances on node b, I should get an alert to re balance the nodes, and vice versa). So, basically what I meant is that I need to setup an alert on SQL Cluster which will send me an alert if & only if I meet a certain condition wherein any of the node seems to be out of balance.

    I have a vague idea for this...something to start of with.

    The condition probably should be

    nSQLNode > round(nSQL/nNode)

    where nNode = total # of nodes in a windows cluster (in our case = 2)
    nSQL = total # of sql instances in the cluster (depends)
    nSQLNode = # of SQL instances on any of the node (for e.g: node a/node b)
    round() is a math function to round the decimal number.

    I have laid the foundation to build this, but I would really appreciate if you guys can put around your coding skills, and get this thing working. It would be really helpful for all of us. I hope to see either a script that you might already been using in your environment or either pointing out to something from where we can build things further (like some kind of a blog or generalized script which can be customized to suit our requirements). Any kind of help and assistance would be highly appreciated.

    You can always mail me on my e-mail id faisalfarouqi@live.com, if you found any useful scripts for setting this up, it can be t-sql (priority on top), powershell or any other thing but it should work to get the alert going...if and only if the condition is met.

    Thanks, 
    Faisal

All Replies

  • Thursday, January 24, 2013 1:58 PM
    Answerer
     
     

    Hello,

    A few things before I get into this...

    If you have tow nodes, and losing one node causes unacceptable performance drops then the next logical step would be to ask for a 3rd node and then set possible owners of resources appropriately. If node 'b' goes down for an extended amount of time, there would be no possible recourse but to have degraded performance until the node is repaired. If going to a single node is your DR worst case scenario for this group then making sure that a single node can handle the workload should be paramount. This directive, of course, signed off by business partners.

    Now to the question at hand.

    While this *could* be done through T-SQL, it would not (in my opinion) be as reliable and would require almost constant upkeep. In a cluster, as you already know, SQL Server instances have their own resource group and the node layer is abstracted. It is certainly possible to know which node each is running on, but you'd have to connect to each instance to find out using the SERVERPEROPERTY function. The script would have to be continually updated to take into account new nodes and instances. It would also fail should an instance not come up properly after a failover, I wouldn't go with this approach.

    Since a node is the "lowest" layer in the cluster topology I would interrogate each node and ask how many resource groups with a SQL Server resource inside of them exist. There is a lovely module for clustering: http://technet.microsoft.com/en-us/library/ee619762(v=WS.10).aspx

    The only issue with using powershell is that you'll need to be an administrator of each node in the group in order to get the information back which is a security concern if you're a DBA (as in most windows admins don't want DBAs having that much power :queue evil laugh:). This method would have the most flexibility as if nodes would be added or removed, nothing would need to change as you'll be grabbing each node in the cluster through the cluster cmdlets.

    If the script was written you have a few possibilities for execution. The first would be to have it run as part of your monitoring solution software if one is available, upon SQL Server Agent startup, some type of scehduled task (though this server/pc would need to run 24/7 to not lose coverage), or on demand.

    -Sean


    Sean Gallardy | Blog | Twitter

  • Thursday, January 24, 2013 3:11 PM
     
     

    Thanks. Sean,

    But this kinda not solving the purpose for me. I hope you could've given me some starters in the form of a script, so that something can be built upon. I definitely agree that running a sql job won't be of good effect, as suppose if node a fails, and the resources are flipped over to node b. Now, if for some reason the resources doesn't come online then there won't be any mails sent to us (i.e the mail part won't be triggered at all).

    I would really appreciate if you could point me to some scripts or at least provide a script to satisfy my if condition part. I'll still keep the question open ended, and would like others to come up with, as I said either a t-sql or powershell script to get this stuff working. This would not only help me but it would also help others who were trying to figure out how to set this up, and were hesitant to initiate the thread.

    Thanks...:)


  • Thursday, January 24, 2013 3:18 PM
    Answerer
     
     

    Hello,

    I would really appreciate if you could point me to some scripts or at least provide a script to satisfy my if condition part. I'll still keep the question open ended, and would like others to come up with, as I said either a t-sql or powershell script to get this stuff working. This would not only help me but it would also help others who were trying to figure out how to set this up, and were hesitant to initiate the thread.

    Unfortunately I don't know of any, and the logic part is fairly specific to your environment which will make it slightly harder to find any if they exist. The best course of action, if you can't find any on the internet, is to create your own and post it back to this thread for the others that you mention.

    -Sean


    Sean Gallardy | Blog | Twitter

  • Friday, January 25, 2013 3:26 PM
     
      Has Code

    You will have to come up with a solution for this need. I have tried to give you some starters on how you can acheive this.

    You need to create a table in one of the SQL Server instance which all the other instance can access. You need a simple table like below.

    create table NodeQuorum (Nodename varchar(100),Instance varchar(100))
    go

    Now what you can do is to manually insert all the current node names and instance names into this table as base data.

    Then create a startup stored procedure (sp_procoption Link)which basically inserts the the node it is running into a table. So every time SQL Server starts up it basically update

    Update NodeQuorum set Nodename = CONVERT(varchar(100),SERVERPROPERTY('ComputerNamePhysicalNetBIOS'))
    where Instance = CONVERT(varchar(100),@@SERVERNAME)

    This query gives you the current node where the instance is active. The table to be updated can be in any one chosen SQL Server, and the other instances might need a linked server created to insert to this instance.

    Now you can create a job which runs every 1 minute or 30 seconds and just checks how many instances runs on a particular node by querying the table and can send you an email. Upon receiving this email you can do the necessary actions. Now you can also create this job in a totally different monitoring server and also have the table created in a monitoring server if you want a better solution.


    Regards, Ashwin Menon My Blog - http:\\sqllearnings.com


    • Edited by Ashwin Menon Friday, January 25, 2013 3:26 PM
    •  
  • Friday, January 25, 2013 7:36 PM
     
     

    Hi Ashwin,

    Thanks. a ton for those starters. I would really appreciate if you could expand on the automatic stored procedure execution part, and what is the code that will actually go into a user defined stored proc which will be called upon by sp_procoption at start up after all the databases have been recovered. How can I implement the above "if" logic to select from a table & send me an e-mail...if and only if the above condition is true i.e node a/b currently have more than 5 resources.

    I would really appreciate your help on this.

    Thanks...)

  • Friday, January 25, 2013 8:00 PM
    Answerer
     
     

    Just be careful as this will require SQL Server to be running. In the even of a failover if the instance does not come up, this will fail to work.

    -Sean


    Sean Gallardy | Blog | Twitter

  • Monday, January 28, 2013 7:26 PM
     
     

    As Sean points out, how this would look is really dependent on your environment.

    That said, what it sounds like is when you deployed your WSFC for your FCIs, you didn't take the failover condition into account and do proper capacity planning. 10 instances is quite a bit on just two nodes. If you do proper capacity planning, you shouldn't see these kinds of issues. You always have to plan for worst case scenario.

    There are other ways you can control this which don't involve scripting.

    1. Only run additional Add Node operations on the nodes you want to fail that instance to. This will restrict where the instance can run. There is no rule that says if you have 5 nodes, everything needs to be able to run on all of them. This is obviously pointless on a 2 node cluster, but a big consideration for more than that.

    2. Use the preferred owners on the resource group in WSFC to set preferences on where you would want things to go in the event of a failover of the FCI. Again, something that generally just applies to more than 2 node WSFCs.

    3. Since failovers should only be an uncommon occurrence, you should be getting alerted via your monitoring tool already as it happens. That would allow you to manually adjust things if you need to.

    4. Hardware is cheap. Add a + 1 (or +n - more than one) node to the mix to aleviate pressure to be able to take advantage of things like my first or second suggestion.


    Allan Hirt Blog: http://www.sqlha.com/blog Coming in 2013: Mission Critical SQL Server 2012 - stay tuned


  • Tuesday, January 29, 2013 3:04 AM
     
     

    Hi there,

    This is very big conversation took a while to read it. My understanding is "streetfighter123" is having active/active cluster.

    " The problem here is we don't want to overburden the server with all the resources junked up on one node, but instead we want the nodes to be balanced for e.g: 5 instances on node a & 5 instances on node b. It doesn't matter who is the owner of these instances, it could be node a or node b, but at least the nodes should be balanced."

    your question is: Now, my concern is on setting up an alert on this to mail me, only when certain criteria is met (for e.g: if there are more than 5 instances on node b, I should get an alert to re balance the nodes, and vice versa). So, basically what I meant is that I need to setup an alert on SQL Cluster which will send me an alert if & only if I meet a certain condition wherein any of the node seems to be out of balance.

    Myanswer: on each of your node go to SQl agent>> new alert>> type = WMI event alert.

    you can use powershell script to monitor your events.

    here is the link to find out number of instancess on a server.

    http://blogs.msdn.com/b/askjay/archive/2011/10/11/how-can-i-get-a-list-of-installed-sql-server-instances.aspx

    Select ServerProperty('ComputerNamePhysicalNetBIOS') This tells you the active node. In your case you know it is active/active so you can use the node name for comparison.

    I know this is not what exactly you wanted. Hope this give you little bit of information

    cheers

    kumar

  • Tuesday, January 29, 2013 2:11 PM
     
     

    Thanks. for that advice Allan. I know it's too many instances to be there on a 2 node cluster, but certainly I can't change it anyway, and have to live with it until they migrate the instances. 

    As far as your 3 point goes, we don't really support investing in monitoring tools if there are things a Dba can automate, and yes we can't bear the licensing cost for each and every instance that's too much. I know h/w is cheap, but can't really invest in it without presenting some facts round the table.

    I would really love to have a generalized script for this, it doesn't necessarily be a t-sql or at best if someone could let me know how I could configure it using windows task scheduler using powershell or operating s/m commands for getting this stuff to work.

    Regards,

    Faisal

  • Tuesday, January 29, 2013 2:16 PM
     
     

    Thanks. Kumar.

    Yes, that won't solve my problem, but at least there are pointers coming in. I would request you guys to look into this, try if you could build some code and test in your environment. I am actually not that good in coding, and hence I am here, but I am trying my level best and reading a lot of books to get myself started.

    I really need a code...somebody pls. help. It's not a rocket science, but it looks really tough from where I am standing.

    Appreciate your help!!!

    Regards,

    Faisal

  • Sunday, February 03, 2013 9:33 AM
     
     
    If your company isn't investing in resources to help you, then it sounds like they are not serious about fixing the issue. Free doesn't solve all problems. Sounds like they have the wrong attitude. I work with tons of companies and yes, many are budget constrained, but they can't expect you to pull a rabbit out of a hat. Taking a long term vision is important. Investing the right resources now pays off in spades. We've given you all of the ammo. If they don't believe it, well, not much else we can do. We can't do your work for you.

    Allan Hirt Blog: http://www.sqlha.com/blog Coming in 2013: Mission Critical SQL Server 2012 - the followup to Pro SQL Server 2008 Failover Clustering

  • Monday, February 04, 2013 6:18 AM
     
     

    Hey Allan,

    I am not asking anyone out here to do my work, I know I have to do it myself. I thought forums are the right way to get in touch with people who've already done something of this sort, and most likely would provide help. Yes, of course my company is budget constrained, and hence I am here looking for help.You can't expect a 6 months or 1 yr. old experience guy to do a work similar to what a person having an 8 yr. experience holder can do.

    Thanks...anyways.

    Regards,

    Faisal

  • Monday, February 04, 2013 2:57 PM
    Answerer
     
     

    Faisal,

    This is the right place, you're correct. The problem was that no one has created (or if they have, they haven't responded) what you're looking for and thus the impetus is now on you to create what you're looking for. We've given you the links to the resources you'll need to do this - all we expect is that you follow up with them and attempt something.

    -Sean


    Sean Gallardy | Blog | Twitter

  • Monday, February 04, 2013 10:02 PM
     
     Answered

    To further Sean's comment and to re-address something I said: your company needs to invest. Be that in the right architecture so you don't run into  this situation as well as in you such as sending you to training on HA and SQL Server in general. It is unrealistic to have someone as you say with 6mo or a year's worth of experience to craft a solution. I wholly agree. YOu did say this (emphasis mine): "I would request you guys to look into this, try if you could build some code and test in your environment. I am actually not that good in coding, and hence I am here, but I am trying my level best and reading a lot of books to get myself started." Maybe it was a phrasing thing, but that seemed to me like an ask to do this for you. If I misinterpreted, I do apologize.

    Also, this is why people hire others to help them out (such as a consultant) - not to take over the job of the DBA, but to help get things set up properly and do approprate knowledge transfer. If you find the right person, it is money well spent. If your company is relying on you to get all the answers from a free forum on the Internet, that's a fundamental problem. You are doing the best with what you're given - we get that - but if that is the case as I said earlier, they do not seem committed to doing the right thing either by you or what their requirements supposedly are. Something has to give.

    Another reason that this is also hard is that everyone's requirements when it comes to what is acceptable and how things should look are completely different. The algorithm that works in your scenario will most likely NOT work elsewhere. You want a total custom solution based on your needs and requirements. I can say that with nearly 15 years of WSFC and FCI experience, no two customer deployments are 100% the same.

    We've given you plenty of options (some of which would involve spending money or changing the architecture) as well as suggestions on how to create the script. The ball is in your, or should I say, your company's proverbial court now. If you come up with something we're more than happy to have a look or make suggestions.


    Allan Hirt Blog: http://www.sqlha.com/blog Coming in 2013: Mission Critical SQL Server 2012 - the followup to Pro SQL Server 2008 Failover Clustering

  • Thursday, February 21, 2013 7:22 AM
     
      Has Code
    function sendmail{
    
         Write-Host "Sending Email"
    
         #SMTP server name
         $smtpServer = ""
    
         #Creating a Mail object
         $msg = new-object Net.Mail.MailMessage
    
         #Creating SMTP server object
         $smtp = new-object Net.Mail.SmtpClient($smtpServer)
    
         #Email structure
         $msg.From = ""
         $msg.To.Add("")
         $msg.subject = "To many instances on $env:COMPUTERNAME"
         $msg.body = ""
    
         #Sending email
         $smtp.Send($msg)
     
    }
    #Get the number of sqlservr.exe processes running and assign it to the param
    $param = Get-WmiObject win32_process | where-Object {$_.ProcessName -eq "sqlservr.exe"} | select name
    #If the number of sqlservr.exe process > 5 then execute the sendmail function
    IF ($param.Count -gt 5) {sendmail}

    Hi Guys,

    I tested this powershell script, and it's working amazingly well. The only problem that I am facing now, is that my boss is looking for something which needs to be more generic rather than a hard coded solution. For example, if we have 15 instances on a 2 node windows cluster, then we don't have to change the value in this line IF ($param.Count -gt 5) {sendmail} (like changing the value from 5 to 6 etc). Now, I need to take this through parameters, not manually stating the value. For e.g: this could be something like this 
    if (nSQLNode > round(nSQL/nNode)) {sendmail} 

    where nNode = total # of nodes in a windows cluster -- for e.g if it's a 2 node cluster, we are only interested in getting the count, not the node name. I know there is something cluster node to view the name of the nodes, but we just need the count
    nSQL = total # of SQL instances in the cluster --same applies here...we need count
    nSQLNode = # of Sql instances in any of the node.
    round() is a math function to round the decimal number.

    I know for sure, this can be implemented in powershell, but I don't know how I can get the count using cmdlets for each of these parameters. Can we take the parameter value using t-sql code, and incorporate it in powershell, and run it through windows task scheduler. Is that possible?

    I believe rest of the script will remain same.I don't have an idea whether we have anything called round() as a powershell function, but I do know there is something for e.g
    $param = [System.Math]::Round().

    Pls. provide your valuable inputs on how can programmatically get the counts.

    Regards,
    Faisal