Windows Server 2008 Failover Cluster with DB2 and SAP groups

  • Question

  • We have a Windows Server 2008 Failover Cluster composed of:

    • 1 cluster instance with 2 nodes running Windows Server 2008 SP2, in Node and Disk Majority quorum mode.
    • 1 SAP NetWeaver Group (SAP PCO).
    • 1 DB2 Group (DB2 PCO).

    Each node consists of:

    • A virtual machine on VMware vSphere 4.1 (each virtual machine on a different physical host).
    • Windows Server 2008 SP2.
    • 1 Ethernet card for the public interface.
    • 1 Ethernet card for the heartbeat.

    The cluster instance is composed of:

    • 1 virtual IP.
    • 1 Quorum Disk.
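
For context, this is how we understand the vote count in Node and Disk Majority mode: each node and the quorum (witness) disk carry one vote, and a node keeps the Cluster service running only while it can see a strict majority of all votes. A minimal Python sketch of that arithmetic (the name `has_quorum` is ours for illustration, not part of any Windows API):

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Return True only when a strict majority of all votes is reachable."""
    return votes_online > total_votes // 2


# This cluster: node1 + node2 + the quorum disk Q: (one vote each).
TOTAL_VOTES = 2 + 1

# A node that lost its partner but still owns the quorum disk: 2 of 3 votes.
print(has_quorum(2, TOTAL_VOTES))  # True
# A node that lost its partner and the quorum disk: 1 of 3 votes.
print(has_quorum(1, TOTAL_VOTES))  # False
```

With a single vote out of three a node cannot keep quorum, which would be consistent with event 1177 in the sequence below if node2 lost both its partner and the quorum disk.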

    The SAP PCO Group is composed of:

    • 1 virtual IP.
    • 1 Hard Disk for SAP Bin.
    • 1 Hard Disk for SAP Logs.
    • Running on node1.

    The DB2 PCO Group is composed of:

    • 1 virtual IP.
    • 1 Hard Disk for DB2 Bin.
    • 1 Hard Disk for DB2 Logs.
    • Running on node2.

    We are experiencing problems with the failover cluster and see the following sequence of events in the Windows Event Viewer:

    1. Information Event ID 4201, Source TCPIP in Node 1: The system detected that network adapter Local Area Connection* 8 was connected to the network, and has initiated normal operation. Where network adapter Local Area Connection* 8 is the Microsoft Failover Cluster Virtual Adapter.
    2. Critical Event ID 1135, Source FailoverClustering in Node 1: Cluster node 'node2' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
    3. Critical Event ID 1135, Source FailoverClustering in Node 2: Cluster node 'coleccp1' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
    4. Information Event ID 4201, Source TCPIP in Node 2: The system detected that network adapter Local Area Connection* 8 was connected to the network, and has initiated normal operation. Where network adapter Local Area Connection* 8 is the Microsoft Failover Cluster Virtual Adapter.
    5. Critical Event ID 4199, Source TCPIP in Node 1: The system detected an address conflict for IP address 192.168.178.5 with the system having network hardware address 00-50-56-B2-6D-F8. Network operations on this system may be disrupted as a result.
    6. Critical Event ID 1069, Source FailoverClustering in Node 1: Cluster resource 'Disk F:DB2LOG' in clustered service or application 'DB2 PCO Group' failed.
    7. Critical Event ID 1069, Source FailoverClustering in Node 1: Cluster resource 'Disk E:DB2DB' in clustered service or application 'DB2 PCO Group' failed.
    8. Critical Event ID 1049, Source FailoverClustering in Node 1: Cluster IP address resource 'DB2 IP PCO' cannot be brought online because a duplicate IP address '192.168.178.5' was detected on the network.  Please ensure all IP addresses are unique.
    9. Critical Event ID 1069, Source FailoverClustering in Node 1: Cluster resource 'DB2 IP PCO' in clustered service or application 'DB2 PCO Group' failed.
    10. Critical Event ID 1069, Source FailoverClustering in Node 2: Cluster resource 'Disk Q:Quorum' in clustered service or application 'Cluster Group' failed.
    11. Critical Event ID 1069, Source FailoverClustering in Node 1: Cluster resource 'Disk F:DB2LOG' in clustered service or application 'DB2 PCO Group' failed.
    12. Critical Event ID 1069, Source FailoverClustering in Node 1: Cluster resource 'Disk E:DB2DB' in clustered service or application 'DB2 PCO Group' failed.
    13. Critical Event ID 1069, Source FailoverClustering in Node 1: Cluster resource 'DB2 IP PCO' in clustered service or application 'DB2 PCO Group' failed.
    14. Critical Event ID 1049, Source FailoverClustering in Node 1: Cluster IP address resource 'DB2 IP PCO' cannot be brought online because a duplicate IP address '192.168.178.5' was detected on the network.  Please ensure all IP addresses are unique.
    15. Critical Event ID 1205, Source FailoverClustering in Node 1: The Cluster service failed to bring clustered service or application 'DB2 PCO Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
    16. Critical Event ID 1069, Source FailoverClustering in Node 2: Cluster resource 'Disk Q:Quorum' in clustered service or application 'Cluster Group' failed.
    17. Information Event ID 7036, Source Service Control Manager Eventlog Provider in Node 2: The Cluster Service service entered the stopped state.
    18. Critical Event ID 7024, Source Service Control Manager Eventlog Provider in Node 2: The Cluster Service service terminated with service-specific error 5925 (0x1725).
    19. Critical Event ID 7031, Source Service Control Manager Eventlog Provider in Node 2: The Cluster Service service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 60000 milliseconds: Restart the service.
    20. Critical Event ID 1177, Source FailoverClustering in Node 2: The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
    21. Critical Event ID 7034, Source Service Control Manager Eventlog Provider in Node 2: The DB2 - SAPDB2PCO - DB2PCO-0 service terminated unexpectedly.  It has done this 2 time(s).
    22. Information Event ID 7036, Source Service Control Manager Eventlog Provider in Node 2: The Cluster Service service entered the running state.
    23. Information Event ID 4201, Source TCPIP in Node 2: The system detected that network adapter Local Area Connection* 8 was connected to the network, and has initiated normal operation. Where network adapter Local Area Connection* 8 is the Microsoft Failover Cluster Virtual Adapter.
    24. Information Event ID 4201, Source TCPIP in Node 1: The system detected that network adapter Local Area Connection* 8 was connected to the network, and has initiated normal operation. Where network adapter Local Area Connection* 8 is the Microsoft Failover Cluster Virtual Adapter.
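
To make a sequence like the one above easier to compare across nodes, the entries can be tallied by node and event ID. The following is only a rough sketch: the `HEADER` pattern and the `tally` helper are hypothetical names that assume each entry starts the way the entries are written in this post ("<Level> Event ID <n>, Source <source> in Node <n>:"), and the sample holds just the headers of events 2, 3, 6, 7, 10 and 20.

```python
import re
from collections import Counter

# Assumed header layout, copied from how the events are written in this post.
HEADER = re.compile(
    r"(?P<level>\w+) Event ID (?P<id>\d+), Source (?P<source>.+?) in Node (?P<node>\d+):"
)


def tally(lines):
    """Count events per (node number, event ID); lines without a header are skipped."""
    counts = Counter()
    for line in lines:
        match = HEADER.search(line)
        if match:
            counts[(int(match["node"]), int(match["id"]))] += 1
    return counts


# Headers of events 2, 3, 6, 7, 10 and 20 from the list above.
sample = [
    "Critical Event ID 1135, Source FailoverClustering in Node 1:",
    "Critical Event ID 1135, Source FailoverClustering in Node 2:",
    "Critical Event ID 1069, Source FailoverClustering in Node 1:",
    "Critical Event ID 1069, Source FailoverClustering in Node 1:",
    "Critical Event ID 1069, Source FailoverClustering in Node 2:",
    "Critical Event ID 1177, Source FailoverClustering in Node 2:",
]

for (node, event_id), count in sorted(tally(sample).items()):
    print(f"node {node}: event {event_id} x{count}")
```

Grouped this way, the resource failures for the DB2 PCO Group show up on node 1, while the quorum disk failure and the loss of quorum show up on node 2.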

    Afterwards, the state of the cluster groups is:

    • SAP PCO running on node1.
    • DB2 PCO stopped.

    To start the DB2 PCO group again, we only need to move it to node2.

    We would like to know whether there is a hotfix that prevents this, or whether anybody knows how to solve it.

    Friday, August 3, 2012 11:54 AM