Microsoft Windows Failover Clustering Manager Event ID 4681

    Question

  • System Center picked up the following alert on one of our SQL clusters.

    ====================================================================================================

    Alert: Cluster network is down

    Source: Cluster Service

    Path: SQL1.fabrikam.com

    Last modified by: Auto-resolve

    Last modified time: 7/5/2012 4:00:01 AM

    Alert description: Cluster network 'Cluster Network 2' is down. None of the available nodes can communicate using this network. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.

    Alert view link: "http://SCMOM:51908/default.aspx?DisplayMode=Pivot&AlertID=%7b867ae015-1ef1-4bdf-9ddc-0fc3020179d1%7d"

    Notification subscription ID generating this message: {1F554D30-41D0-6918-4D96-8EC71B208B09}

    ====================================================================================================

    I logged into the SQL server and didn't see any noticeable errors. When I ran a query on the cluster for errors, I got the following error:

    ====================================================================================================

    Failover Cluster Manager could not contact node 'sql2.Fabrikam.com'.

    System.ApplicationException: Failed to query the event log. ---> System.ComponentModel.Win32Exception: Access is denied
       --- End of inner exception stack trace ---

    Server stack trace:
       at MS.Internal.ServerClusters.EventLogQuery.EventQuery(EventSafeHandle session, String path, String query, UInt32 flags)
       at MS.Internal.ServerClusters.EventLogQuery..ctor(EventLogSession session, String channel, String text)
       at MS.Internal.ServerClusters.EventLogSession.CreateQuery(String channel, String text)
       at MS.Internal.ServerClusters.Management.EventLogQuerySet.<>c__DisplayClass5.<QueryWorker>b__1()
       at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Int32 methodPtr, Boolean fExecuteInContext, Object[]& outArgs)
       at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink)

    Exception rethrown at [0]:
       at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase)
       at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke(Object NotUsed, MessageData& msgData)
       at MS.Internal.ServerClusters.Management.EventLogQuerySet.AsyncCallDelegate`1.EndInvoke(IAsyncResult result)
       at MS.Internal.ServerClusters.Management.EventLogQuerySet.ExecuteAsyncCall[T](AsyncCallDelegate`1 asyncCall)
       at MS.Internal.ServerClusters.Management.EventLogQuerySet.QueryWorker(Object a)

    ====================================================================================================

    I'm running Server 2008 R2 with SQL 2008 R2, both patched to the latest updates.

    Thursday, July 05, 2012 2:45 PM

All replies

  • Hello,

    The alert description gives the most information in this case:

    "Cluster network 'Cluster Network 2' is down. None of the available nodes can communicate using this network."

    This is a networking problem at the Windows level. You'll have to work with your networking team and Windows team to check the binding order of the adapters and the port configuration. It also wouldn't hurt to ask the network team whether they noticed any glitches or rebooted any switches (or ran upgrades, etc.) that could have caused an issue. Check all cables and make sure they are good.

    I would also go in and check what type of traffic 'Cluster Network 2' carries. I like to name mine after what they are, such as "Public Cluster Network", "Private Cluster Network", "Management Network", etc. In cases like this it makes it much easier to understand what is affected. Depending on the version of Windows, the failover clustering administration GUI should show each network and whether it is up or down. It sounds like the private network lost its connection: had it been the public network, you wouldn't have been able to access SQL Server remotely. Private traffic can run over the public network in the event of a failure, but public traffic can't run over the private network.
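
    To make the "what does this network carry" check concrete, here is a minimal Python sketch that turns a cluster network's Role value into a plain description. The role numbers (0, 1, 2, 3) are the standard failover cluster network roles. The function name `describe_networks` and the sample inventory are my own assumptions for illustration, typed in by hand; they are not output from this cluster, and the role shown for 'Cluster Network 2' is only a guess.

```python
# Standard Role values for a failover cluster network.
ROLE_DESCRIPTIONS = {
    0: "not used by the cluster",
    1: "cluster (private/heartbeat) traffic only",
    2: "client access only",
    3: "cluster and client (public) traffic",
}


def describe_networks(networks):
    """Return one readable line per cluster network.

    `networks` is a list of dicts with "Name", "Role" and "State" keys
    (a hypothetical shape chosen for this sketch).
    """
    lines = []
    for net in networks:
        role = ROLE_DESCRIPTIONS.get(net["Role"], "unknown role %r" % net["Role"])
        lines.append("%s: %s, state %s" % (net["Name"], role, net["State"]))
    return lines


# Hypothetical inventory for illustration only, not real data from the thread.
sample = [
    {"Name": "Cluster Network 1", "Role": 3, "State": "Up"},
    {"Name": "Cluster Network 2", "Role": 1, "State": "Down"},
]

for line in describe_networks(sample):
    print(line)
```

    If the network that went down turns out to be role 1, only node-to-node traffic was affected, which would fit with SQL Server staying reachable.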

    Start from the time of the event and work with networking to see if anything happened. Check your network connections in the Windows failover clustering tool and have networking check ports and cables. Figure out what 'Cluster Network 2' is and what type of traffic it carries. Find the root cause.
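
    When you hand a time range to the network team, it helps to give them exact bounds. A small Python sketch, assuming the alert's "Last modified time" (7/5/2012 4:00:01 AM) is close to the outage; since that stamp comes from the auto-resolve, the real drop may be somewhat earlier. The 30-minute margin and the function name `correlation_window` are arbitrary choices of mine, not anything System Center provides.

```python
from datetime import datetime, timedelta


def correlation_window(timestamp_text, minutes=30):
    """Return (start, end) datetimes around an alert timestamp.

    Expects the month/day/year 12-hour format used in the alert,
    e.g. "7/5/2012 4:00:01 AM". The margin is an assumed default.
    """
    moment = datetime.strptime(timestamp_text, "%m/%d/%Y %I:%M:%S %p")
    margin = timedelta(minutes=minutes)
    return moment - margin, moment + margin


start, end = correlation_window("7/5/2012 4:00:01 AM")
print(start.isoformat(), "to", end.isoformat())
```

    Widen the margin if the switch or event logs show nothing inside the window.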

    -Sean


    Sean Gallardy, MCC | Blog

    Friday, July 06, 2012 4:41 PM