Cluster node failure (urgent)

  • Question

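  • Cluster environment:

         Windows 2003 Enterprise Edition + SQL 2005 cluster configuration, two nodes

    The cluster had been running normally in production, but last weekend we suddenly found that the IP address of the heart (heartbeat) NIC on the "secondary" node had been changed. Since then, the cluster service on the secondary node fails to start with error 1067. In Disk Management on the secondary node, the shared disk is flagged red and cannot be read. In Cluster Administrator the node shows as offline, and trying to start it returns: "An error occurred while attempting to start the cluster node; the process terminated unexpectedly. Error ID 2147023829 (8007042b)".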

     

    cluster.log

     

    0000844.00000690::2010/09/13-04:33:29.187 INFO [NM] destroying object for interface 723a8315-89b6-4762-b632-23ef6eed2220
    00000844.00000690::2010/09/13-04:33:29.187 INFO [OM] Deleting object heart - CCHQESAFENET01 (723a8315-89b6-4762-b632-23ef6eed2220)
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] deleting object for interface 01658333-50d1-4572-824e-52c371d84013
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Deregistering network b26ef16d-38ca-47a7-9ddc-017652b0f9c4 (heart) from cluster transport.
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] destroying object for interface 01658333-50d1-4572-824e-52c371d84013
    00000844.00000690::2010/09/13-04:33:29.187 INFO [OM] Deleting object heart - CCHQESAFENET02 (01658333-50d1-4572-824e-52c371d84013)
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Network interface cleanup complete
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Network cleanup starting...
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Deleting object for network 1bcfbdca-9514-40d1-b86b-fbd3e5c6beec.
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Deregistering network 1bcfbdca-9514-40d1-b86b-fbd3e5c6beec (public) from cluster transport.
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] destroying object for network 1bcfbdca-9514-40d1-b86b-fbd3e5c6beec
    00000844.00000690::2010/09/13-04:33:29.187 INFO [OM] Deleting object public (1bcfbdca-9514-40d1-b86b-fbd3e5c6beec)
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Deleting object for network b26ef16d-38ca-47a7-9ddc-017652b0f9c4.
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Deregistering network b26ef16d-38ca-47a7-9ddc-017652b0f9c4 (heart) from cluster transport.
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] destroying object for network b26ef16d-38ca-47a7-9ddc-017652b0f9c4
    00000844.00000690::2010/09/13-04:33:29.187 INFO [OM] Deleting object heart (b26ef16d-38ca-47a7-9ddc-017652b0f9c4)
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Network cleanup complete
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Node cleanup starting...
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Deleting object for node 2.
    00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Node cleanup complete
    00000844.00000690::2010/09/13-04:33:29.187 INFO [INIT] Deregistering RPC endpoints & interfaces.
    00000844.00000690::2010/09/13-04:33:29.187 WARN [INIT] Failed to join cluster, status 2
    00000844.00000690::2010/09/13-04:33:29.187 ERR  [CS] ClusterInitialize failed 2
    00000844.00000690::2010/09/13-04:33:29.187 WARN [INIT] The cluster service is shutting down.
    00000844.00000690::2010/09/13-04:33:29.187 INFO [EVT] EvShutdown
    00000844.00000690::2010/09/13-04:33:29.187 WARN [FM] Shutdown: Failover Manager requested to shutdown groups.
    00000844.00000690::2010/09/13-04:33:29.187 INFO [FM] FmpCleanupGroups: Entry
    00000844.00000690::2010/09/13-04:33:29.187 INFO [FM] FmpCleanupGroups: dwTimeOut=3600000 dwTimoutCount=180 waithint =20000
    00000844.0000119c::2010/09/13-04:33:29.187 INFO [FM] FmpCleanupGroupsWorker: Entry
    00000844.00000690::2010/09/13-04:33:29.187 INFO [FM] FmpCleanupGroups: Cleanup thread finished in time
    00000844.00000690::2010/09/13-04:33:29.187 INFO [FM] FmpCleanupGroups: Exit
    00000844.00000690::2010/09/13-04:33:29.187 INFO [Dm] DmShutdown
    00000844.00000690::2010/09/13-04:33:29.187 INFO [LM] LmRemoveTimerActivity: Entry 0x0000053c
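
Most of the excerpt above is INFO-level cleanup; the lines that record the failure are the WARN and ERR entries. A minimal sketch for pulling those out of a cluster.log, assuming every entry follows the `pid.tid::timestamp LEVEL [COMPONENT] message` layout seen here (the regular expression and the function names are illustrative, not part of any Microsoft tool):

```python
import re

# Layout assumed from the excerpt: pid.tid::timestamp LEVEL [COMPONENT] message
LINE_RE = re.compile(
    r"^\s*(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::(?P<timestamp>\S+)\s+"
    r"(?P<level>INFO|WARN|ERR)\s+\[(?P<component>\w+)\]\s+(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def find_problems(lines):
    """Keep only the WARN and ERR entries, in their original order."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and e["level"] in ("WARN", "ERR")]


# Three lines copied from the log above.
SAMPLE = [
    "00000844.00000690::2010/09/13-04:33:29.187 INFO [NM] Node cleanup complete",
    "00000844.00000690::2010/09/13-04:33:29.187 WARN [INIT] Failed to join cluster, status 2",
    "00000844.00000690::2010/09/13-04:33:29.187 ERR  [CS] ClusterInitialize failed 2",
]

if __name__ == "__main__":
    for entry in find_problems(SAMPLE):
        print(entry["level"], entry["component"], entry["message"])
```

On the full excerpt this would surface the failed join (status 2), the ClusterInitialize failure, and the shutdown warnings that follow.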

    September 13, 2010, 7:27
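
The two error numbers in the question are the same failure in two encodings: 8007042b is an HRESULT that wraps Win32 error 1067 (the process terminated unexpectedly). The sketch below splits the HRESULT into its fields; the helper names are illustrative. The decimal ID quoted in the question appears to be the signed 32-bit reading of the same value with the minus sign dropped, which is an inference from the arithmetic rather than something the post states.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into (severity, facility, code)."""
    value &= 0xFFFFFFFF
    severity = value >> 31            # 1 means failure
    facility = (value >> 16) & 0x7FF  # 7 is FACILITY_WIN32
    code = value & 0xFFFF             # the wrapped Win32 error code
    return severity, facility, code


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    print(decode_hresult(0x8007042B))  # (1, 7, 1067)
    print(to_signed32(0x8007042B))     # -2147023829
```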

Answer

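  • It looks like the IP address of the heartbeat NIC changed, so the two machines can no longer communicate with each other. Without the heartbeat, the cluster service cannot start. Change the NIC's IP address back to its original value and then restart the cluster service; that should fix it.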

     

    September 15, 2010, 4:29
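
The two steps in the answer can be sketched as a short script. This is only an illustration under stated assumptions: the connection name `heart` is taken from the network name in the log and may differ from the actual connection name on the node, the IP address and mask are placeholders because the thread never gives the original values, and `clussvc` is assumed to be the service name of the Cluster service. By default the script only prints the commands.

```python
import subprocess


def build_recovery_commands(interface, ip, mask):
    """Build the commands that restore a static IP and start the cluster service."""
    return [
        # Restore the static address on the heartbeat connection.
        ["netsh", "interface", "ip", "set", "address",
         f"name={interface}", "source=static", f"addr={ip}", f"mask={mask}"],
        # Start the Cluster service (service name assumed to be clussvc).
        ["net", "start", "clussvc"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder values: replace with the node's original heartbeat settings.
    run(build_recovery_commands("heart", "10.10.10.2", "255.255.255.0"))
```

Keeping the dry run as the default makes it possible to review the exact commands before anything on the node is changed.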

All replies