none
Live Migration only Works one way Server 2008 R2 RRS feed

  • Question

  • We've been running into an issue recently with one of our server stacks.  Our two 2008 R2 servers are running in a cluster set up to live migrate VMs between eachother in case of hardware failure/etc.

    The servers are exactly the same hardware-wise, with one exception being that VIR001 has a SAS card for tape backups.  Live migration had been working fine up until a couple months ago when we noticed that VIR001 could not migrate to VIR002.  I've looked into this issue and I know that generally it is caused by improperly-named resources or permissions, but that doesn't seem to be the case here.

    VIR002 will live migrate any of its hosted VMs over to VIR001.  VIR001 will not LM any VMs over to VIR002.  Not sure where to start with this, I've noticed a couple Time-Server errors on VIR001, but if the issue was due to a sync problem, wouldn't both servers experience the same issue?  Beyond this, going through the cluster logs show a bunch of errors that I can't seem to find good google hits for.  I'll include it at the end of this post.

    I've ran the cluster validation tool, and found no issues other than a couple with Disk validation (live environment) and a some warnings about network adapters being on the same subnet.  Both are listed as possible owners in the cluster for each VM and there's enough RAM on both systems to allow for failover.  My issue seems to be similar to:

    http://social.technet.microsoft.com/Forums/en/winserverhyperv/thread/7dce3224-5319-4d11-856a-1b5f51e483a1

    http://social.technet.microsoft.com/Forums/en/winserverhyperv/thread/82a14a4c-5fe6-450c-8168-4cb32dc20a7d

    But in my case I know there's no architecture conflict as they're identical hardware.  Any advice on where the issue may lie would be great.  Thanks in advance,

    VIR001 Logs:

    00000b38.00000708::2013/01/25-19:31:16.401 INFO  [NM] Received request from client address fe80::341a:e4a3:9601:13ad.
    00000b38.000011bc::2013/01/25-19:31:16.504 INFO  [NM] Received request from client address fe80::341a:e4a3:9601:13ad.
    00000b38.00001690::2013/01/25-19:31:16.747 INFO  [NM] Received request from client address fe80::341a:e4a3:9601:13ad.
    00000f04.000015e4::2013/01/25-19:31:17.018 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
    00000b38.000011a0::2013/01/25-19:31:17.018 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
    00000b38.000011a0::2013/01/25-19:31:17.018 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQ's DLL is not present on this node.  Attempting to find a good node...
    00000f04.000015e4::2013/01/25-19:31:17.021 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
    00000b38.000005fc::2013/01/25-19:31:17.021 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
    00000b38.00001690::2013/01/25-19:31:17.034 INFO  [API] s_ApiGetQuorumResource final status 0.
    00000f04.000015e4::2013/01/25-19:31:17.049 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
    00000b38.00001690::2013/01/25-19:31:17.049 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
    00000b38.00001690::2013/01/25-19:31:17.049 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQTriggers's DLL is not present on this node.  Attempting to find a good node...
    00000f04.000015e4::2013/01/25-19:31:17.052 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
    00000b38.000011a0::2013/01/25-19:31:17.052 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
    00000f04.00001688::2013/01/25-19:31:17.262 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
    00000b38.000011a0::2013/01/25-19:31:17.262 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
    00000b38.000011a0::2013/01/25-19:31:17.262 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQ's DLL is not present on this node.  Attempting to find a good node...
    00000f04.00001688::2013/01/25-19:31:17.265 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
    00000b38.00001690::2013/01/25-19:31:17.265 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
    00000f04.00001688::2013/01/25-19:31:17.270 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
    00000b38.000011a0::2013/01/25-19:31:17.270 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
    00000b38.000011a0::2013/01/25-19:31:17.270 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQTriggers's DLL is not present on this node.  Attempting to find a good node...
    00000f04.00001688::2013/01/25-19:31:17.272 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
    00000b38.00001690::2013/01/25-19:31:17.272 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
    00000f04.000015e4::2013/01/25-19:32:19.349 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
    00000b38.000011a0::2013/01/25-19:32:19.349 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
    00000b38.000011a0::2013/01/25-19:32:19.349 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQ's DLL is not present on this node.  Attempting to find a good node...
    00000f04.000015e4::2013/01/25-19:32:19.352 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'
    00000b38.00001690::2013/01/25-19:32:19.352 WARN  [RCM] Failed to load restype 'MSMQ': error 21.
    00000f04.000015e4::2013/01/25-19:32:19.368 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
    00000b38.000011a0::2013/01/25-19:32:19.368 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
    00000b38.000011a0::2013/01/25-19:32:19.368 WARN  [RCM] rcm::RcmApi::ResTypeControl: ResType MSMQTriggers's DLL is not present on this node.  Attempting to find a good node...
    00000f04.000015e4::2013/01/25-19:32:19.370 ERR   [RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'
    00000b38.00001690::2013/01/25-19:32:19.371 WARN  [RCM] Failed to load restype 'MSMQTriggers': error 21.
    00000b38.000005fc::2013/01/25-19:32:25.848 INFO  [RCM] rcm::RcmApi::OnlineGroup: (SQL002)
    00000b38.00000cb0::2013/01/25-19:32:25.850 INFO  [GUM] Node 1: Processing RequestLock 2:4441
    00000b38.00000cb0::2013/01/25-19:32:25.850 INFO  [GUM] Node 1: Processing GrantLock to 2 (sent by 1 gumid: 81317)
    00000b38.00000adc::2013/01/25-19:32:25.947 INFO  [RCM] rcm::RcmGum::SetGroupPersistentState(SQL002,1)
    00000b38.000011a0::2013/01/25-19:32:32.425 INFO  [RCM] rcm::RcmApi::OnlineGroup: (SQL001)
    00000b38.00000adc::2013/01/25-19:32:32.439 INFO  [RCM] rcm::RcmGum::SetGroupPersistentState(SQL001,1)
    00000b38.00001690::2013/01/25-19:32:32.444 INFO  [NM] Received request from client address fe80::341a:e4a3:9601:13ad.
    00000b38.00000384::2013/01/25-19:32:32.451 INFO  [NM] Received request from client address fe80::341a:e4a3:9601:13ad.
    00000f04.000015e4::2013/01/25-19:33:31.544 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: Current state 'Offline', event 'AcceptLM'
    00000b38.000005fc::2013/01/25-19:33:31.548 INFO  [GUM] Node 1: Processing RequestLock 1:28
    00000b38.00000cb0::2013/01/25-19:33:31.549 INFO  [GUM] Node 1: Processing GrantLock to 1 (sent by 2 gumid: 81334)
    00000f04.00000b48::2013/01/25-19:33:33.372 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: State change 'Offline' -> 'MigrationTargetSetup'
    00000f04.000015e4::2013/01/25-19:33:33.372 ERR   [RHS] Error 997 from ResourceControl for resource Virtual Machine SQL002.
    00000b38.00001690::2013/01/25-19:33:33.373 WARN  [RCM] ResourceControl(UNKNOWN (0x160000c) obj: .MU. flags:2097164 code:0 access:<<insert {4} refers an argument that is not supplied. Only 4 argument(s) provided>>) to Virtual Machine SQL002 returned 997.
    00000b38.00000cb0::2013/01/25-19:33:33.374 INFO  [GUM] Node 1: Processing RequestLock 2:4458
    00000b38.00000cb0::2013/01/25-19:33:33.374 INFO  [GUM] Node 1: Processing GrantLock to 2 (sent by 1 gumid: 81336)
    00000b38.00000adc::2013/01/25-19:33:33.456 INFO  [RCM] rcm::RcmGum::SetGroupPreferredOwners(SQL002,<vector len='2'>
    00000b38.00000adc::2013/01/25-19:33:33.456 INFO      <item>1</item>
    00000b38.00000adc::2013/01/25-19:33:33.456 INFO      <item>2</item>
    00000b38.00000adc::2013/01/25-19:33:33.456 INFO  </vector>
    00000b38.00000adc::2013/01/25-19:33:33.456 INFO  )
    00000b38.00000708::2013/01/25-19:33:33.605 INFO  [GUM] Node 1: Processing RequestLock 1:30
    00000b38.00000cb0::2013/01/25-19:33:33.608 INFO  [GUM] Node 1: Processing GrantLock to 1 (sent by 2 gumid: 81355)
    00000f04.00000fa4::2013/01/25-19:33:33.656 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: Current state 'MigrationTargetSetup', event 'Started'
    00000f04.00000fa4::2013/01/25-19:33:33.656 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: State change 'MigrationTargetSetup' -> 'MigrationTargetStart'
    00000304.00000368::2013/01/25-19:33:35.169 INFO  [CAM] CamApCallPackage
    00000304.00000368::2013/01/25-19:33:35.169 INFO  [CAM] CallInfo: Proc 4 Thread 4824 Count 0 Att 8704
    00000304.00000368::2013/01/25-19:33:35.169 INFO  [CAM] ClientInfo: Logon 999 Proc 4 Thread 4824 TCB 1 Impersonating 1 Restrict 0 Flags 0
    00000304.00000368::2013/01/25-19:33:35.169 INFO  [CAM] GetCNO forceNew=1
    00000304.00000368::2013/01/25-19:33:35.170 INFO  [CAM] LsaLogon: 0
    00000304.00000368::2013/01/25-19:33:35.170 INFO  [CAM] Creating new token: 0
    00000304.00000368::2013/01/25-19:33:35.170 INFO  [CAM] GetCNOToken: LUID 0:5896ef, token: 228e768, DuplicateHandle: 0
    00000b38.00000cb0::2013/01/25-19:33:35.196 INFO  [GUM] Node 1: Processing RequestLock 2:4477
    00000b38.00000cb0::2013/01/25-19:33:35.196 INFO  [GUM] Node 1: Processing GrantLock to 2 (sent by 1 gumid: 81360)
    00000b38.00000adc::2013/01/25-19:34:24.099 INFO  [RCM] rcm::RcmGum::GroupMoveOperation(POSBFJSQL002,1)
    00000b38.000005fc::2013/01/25-19:34:24.101 INFO  [RCM] rcm::RcmResource::Online: bringing Virtual Machine SQL002's provider resource 'Virtual Machine Configuration SQL002' online.
    00000b38.000005fc::2013/01/25-19:34:24.101 INFO  [RCM] TransitionToState(Virtual Machine Configuration SQL002) Offline-->OnlineCallIssued.
    00000b38.000005fc::2013/01/25-19:34:24.101 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (SQL002, BringingToPersistentState --> Pending)
    00000b38.000005fc::2013/01/25-19:34:24.101 INFO  [RCM] TransitionToState(Virtual Machine SQL002) Offline-->WaitingToComeOnline.
    00000f04.000015e4::2013/01/25-19:34:24.101 INFO  [RES] Virtual Machine Configuration <Virtual Machine Configuration SQL002>: Current state 'Offline', event 'Online'
    00000f04.000015e4::2013/01/25-19:34:24.101 INFO  [RES] Virtual Machine Configuration <Virtual Machine Configuration SQL002>: State change 'Offline' -> 'OnlinePending'
    00000b38.000005fc::2013/01/25-19:34:24.102 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'Virtual Machine Configuration SQL002', gen(0) result 997.
    00000b38.000005fc::2013/01/25-19:34:24.102 INFO  [RCM] TransitionToState(Virtual Machine Configuration SQL002) OnlineCallIssued-->OnlinePending.
    00000b38.00000384::2013/01/25-19:34:24.132 INFO  [GUM] Node 1: Processing RequestLock 1:35
    00000b38.00000cb0::2013/01/25-19:34:24.133 INFO  [GUM] Node 1: Processing GrantLock to 1 (sent by 2 gumid: 81402)
    00000b38.000005fc::2013/01/25-19:34:24.154 INFO  [NM] Received request from client address fe80::341a:e4a3:9601:13ad.
    00000f04.00000fc8::2013/01/25-19:34:24.202 INFO  [RES] Virtual Machine Configuration <Virtual Machine Configuration SQL002>: Current state 'OnlinePending', event 'ConfigRegistered', result 0x00000000
    00000f04.00000fc8::2013/01/25-19:34:24.202 INFO  [RES] Virtual Machine Configuration <Virtual Machine Configuration SQL002>: State change 'OnlinePending' -> 'Online'
    00000f04.000015d8::2013/01/25-19:34:24.273 INFO  [RHS] Resource Virtual Machine Configuration SQL002 has come online. RHS is about to report status change to RCM
    00000b38.00000708::2013/01/25-19:34:24.273 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'Virtual Machine Configuration SQL002', gen(0) result 0.
    00000b38.00000708::2013/01/25-19:34:24.273 INFO  [RCM] TransitionToState(Virtual Machine Configuration SQL002) OnlinePending-->Online.
    00000b38.00000384::2013/01/25-19:34:24.273 INFO  [RCM] TransitionToState(Virtual Machine SQL002) WaitingToComeOnline-->OnlineCallIssued.
    00000f04.000015e4::2013/01/25-19:34:24.274 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: Current state 'MigrationTargetStart', event 'Online'
    00000f04.000015e4::2013/01/25-19:34:24.274 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: State change 'MigrationTargetStart' -> 'OnlinePending'
    00000b38.000011a0::2013/01/25-19:34:24.274 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'Virtual Machine SQL002', gen(0) result 997.
    00000b38.000011a0::2013/01/25-19:34:24.274 INFO  [RCM] TransitionToState(Virtual Machine SQL002) OnlineCallIssued-->OnlinePending.
    00000b38.00000cb0::2013/01/25-19:34:24.612 INFO  [GUM] Node 1: Processing RequestLock 2:4519
    00000b38.00000cb0::2013/01/25-19:34:24.612 INFO  [GUM] Node 1: Processing GrantLock to 2 (sent by 1 gumid: 81408)
    00000b38.000011a0::2013/01/25-19:34:24.620 INFO  [GUM] Node 1: Processing RequestLock 1:41
    00000b38.00000cb0::2013/01/25-19:34:24.622 INFO  [GUM] Node 1: Processing GrantLock to 1 (sent by 2 gumid: 81409)
    00000b38.00000cb0::2013/01/25-19:34:24.628 INFO  [GUM] Node 1: Processing RequestLock 2:4520
    00000b38.000011a0::2013/01/25-19:34:24.631 INFO  [GUM] Node 1: Processing GrantLock to 2 (sent by 1 gumid: 81410)
    00000f04.00000fa4::2013/01/25-19:34:24.632 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: Current state 'OnlinePending', event 'StartupResult', result 0x00000000
    00000f04.00000fa4::2013/01/25-19:34:24.632 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: State change 'OnlinePending' -> 'Online'
    00000b38.00000384::2013/01/25-19:34:24.652 INFO  [GUM] Node 1: Processing RequestLock 1:42
    00000b38.00000cb0::2013/01/25-19:34:24.653 INFO  [GUM] Node 1: Processing GrantLock to 1 (sent by 2 gumid: 81414)
    00000b38.00000cb0::2013/01/25-19:34:24.665 INFO  [GUM] Node 1: Processing RequestLock 2:4524
    00000b38.00000cb0::2013/01/25-19:34:24.665 INFO  [GUM] Node 1: Processing GrantLock to 2 (sent by 1 gumid: 81416)
    00000b38.00000adc::2013/01/25-19:34:24.691 ERR   mscs::GumAgent::ExecuteQueuedUpdate: ERROR_FILE_NOT_FOUND(2)' because of '::RegDeleteKey( handle, name.c_str() )'(Resources\29a03342-efab-4bba-8564-149adfc6de35\Internal\VmPrivateRestartAction)
    00000b38.00000adc::2013/01/25-19:34:24.701 ERR   mscs::GumAgent::ExecuteQueuedUpdate: ERROR_FILE_NOT_FOUND(2)' because of '::RegDeleteKey( handle, name.c_str() )'(Resources\29a03342-efab-4bba-8564-149adfc6de35\Internal\VmPrivateRestartThreshold)
    00000b38.00000708::2013/01/25-19:34:24.705 INFO  [GUM] Node 1: Processing RequestLock 1:44
    00000b38.00000cb0::2013/01/25-19:34:24.706 INFO  [GUM] Node 1: Processing GrantLock to 1 (sent by 2 gumid: 81416)
    00000b38.00000cb0::2013/01/25-19:34:24.732 INFO  [GUM] Node 1: Processing RequestLock 2:4526
    00000b38.00000cb0::2013/01/25-19:34:24.732 INFO  [GUM] Node 1: Processing GrantLock to 2 (sent by 1 gumid: 81418)
    00000b38.00000adc::2013/01/25-19:34:24.743 ERR   mscs::GumAgent::ExecuteQueuedUpdate: ERROR_FILE_NOT_FOUND(2)' because of '::RegDeleteKey( handle, name.c_str() )'(Resources\29a03342-efab-4bba-8564-149adfc6de35\Internal\ConfigPrivateRestartAction)
    00000b38.00000adc::2013/01/25-19:34:24.752 ERR   mscs::GumAgent::ExecuteQueuedUpdate: ERROR_FILE_NOT_FOUND(2)' because of '::RegDeleteKey( handle, name.c_str() )'(Resources\29a03342-efab-4bba-8564-149adfc6de35\Internal\ConfigPrivateRestartThreshold)
    00000b38.00000708::2013/01/25-19:34:24.756 INFO  [GUM] Node 1: Processing RequestLock 1:46
    00000b38.00000cb0::2013/01/25-19:34:24.757 INFO  [GUM] Node 1: Processing GrantLock to 1 (sent by 2 gumid: 81418)
    00000b38.00001690::2013/01/25-19:34:24.777 INFO  [RCM] rcm::RcmGum::SetGroupPreferredOwners(SQL002,<vector len='0'>
    00000b38.00001690::2013/01/25-19:34:24.777 INFO  </vector>
    00000b38.00001690::2013/01/25-19:34:24.777 INFO  )
    00000b38.00001690::2013/01/25-19:34:24.794 ERR   [GUM] Node 1: Local Execution of a gum request /rcm/gum/SetGroupPreferredOwners resulted in exception ERROR_CLUSTER_GROUP_MOVING(5908)' because of 'Group is currently moving.'
    00000b38.00001690::2013/01/25-19:34:24.798 WARN  [RCM] rcm::RcmApi::SetGroupPreferredOwners: forwarded, no retry on error 5908
    00000b38.00001690::2013/01/25-19:34:25.848 INFO  [RCM] rcm::RcmGum::SetGroupPreferredOwners(SQL002,<vector len='0'>
    00000b38.00001690::2013/01/25-19:34:25.848 INFO  </vector>
    00000b38.00001690::2013/01/25-19:34:25.848 INFO  )
    00000b38.00001690::2013/01/25-19:34:25.848 ERR   [GUM] Node 1: Local Execution of a gum request /rcm/gum/SetGroupPreferredOwners resulted in exception ERROR_CLUSTER_GROUP_MOVING(5908)' because of 'Group is currently moving.'
    00000b38.00001690::2013/01/25-19:34:25.848 WARN  [RCM] rcm::RcmApi::SetGroupPreferredOwners: forwarded, no retry on error 5908
    00000f04.00000b48::2013/01/25-19:34:26.145 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: 'Virtual Machine SQL002' successfully started the virtual machine.
    00000f04.00000b48::2013/01/25-19:34:26.145 INFO  [RHS] Resource Virtual Machine SQL002 has come online. RHS is about to report status change to RCM
    00000f04.000015d8::2013/01/25-19:34:26.145 INFO  [RES] Virtual Machine Configuration <Virtual Machine Configuration SQL002>: 'Virtual Machine Configuration SQL002' successfully registered the configuration for the virtual machine.
    00000b38.00001690::2013/01/25-19:34:26.146 INFO  [RCM] HandleMonitorReply: ONLINERESOURCE for 'Virtual Machine SQL002', gen(0) result 0.
    00000b38.00001690::2013/01/25-19:34:26.146 INFO  [RCM] TransitionToState(Virtual Machine SQL002) OnlinePending-->Online.
    00000b38.00001690::2013/01/25-19:34:26.146 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (SQL002, Pending --> Online)
    00000b38.00001690::2013/01/25-19:34:26.146 INFO  [RCM] Group move for 'SQL002' has completed.
    00000b38.00001780::2013/01/25-19:34:26.903 INFO  [RCM] rcm::RcmGum::SetGroupPreferredOwners(SQL002,<vector len='0'>
    00000b38.00001780::2013/01/25-19:34:26.903 INFO  </vector>
    00000b38.00001780::2013/01/25-19:34:26.903 INFO  )
    00000b38.00000cb0::2013/01/25-19:34:26.919 INFO  [GUM] Node 1: Processing RequestLock 2:4528
    00000b38.00000cb0::2013/01/25-19:34:26.920 INFO  [GUM] Node 1: Processing GrantLock to 2 (sent by 1 gumid: 81421)
    00000b38.00000adc::2013/01/25-19:34:26.939 ERR   mscs::GumAgent::ExecuteQueuedUpdate: ERROR_FILE_NOT_FOUND(2)' because of '::RegDeleteKey( handle, name.c_str() )'(Resources\29a03342-efab-4bba-8564-149adfc6de35\Internal\FailoverThreshold)
    00000b38.00000adc::2013/01/25-19:34:26.976 ERR   mscs::GumAgent::ExecuteQueuedUpdate: ERROR_FILE_NOT_FOUND(2)' because of '::RegDeleteKey( handle, name.c_str() )'(Resources\29a03342-efab-4bba-8564-149adfc6de35\Internal\Priority)
    00000b38.00000adc::2013/01/25-19:34:26.985 ERR   mscs::GumAgent::ExecuteQueuedUpdate: ERROR_FILE_NOT_FOUND(2)' because of '::RegDeleteKey( handle, name.c_str() )'(Resources\29a03342-efab-4bba-8564-149adfc6de35\Internal\PreferredNodes)
    00000f04.00000a80::2013/01/25-19:34:32.518 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: Current state 'Online', event 'StartLM'
    00000b38.00000384::2013/01/25-19:34:32.518 INFO  [GUM] Node 1: Processing RequestLock 1:51
    00000b38.00000cb0::2013/01/25-19:34:32.520 INFO  [GUM] Node 1: Processing GrantLock to 1 (sent by 2 gumid: 81421)
    00000f04.00001720::2013/01/25-19:34:32.619 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: State change 'Online' -> 'MigrationSourceSetup'
    00000b38.00000708::2013/01/25-19:34:35.252 WARN  [RCM] ResourceControl(UNKNOWN (0x160000c) obj: .MU. flags:2097164 code:0 access:<<insert {4} refers an argument that is not supplied. Only 4 argument(s) provided>>) to Virtual Machine SQL002 returned 2147500037.
    00000f04.00001720::2013/01/25-19:34:35.263 ERR   [RES] Virtual Machine <Virtual Machine SQL002>: 'Virtual Machine SQL002' live migration did not succeed at the source.
    00000f04.00001720::2013/01/25-19:34:35.280 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: State change 'MigrationSourceSetup' -> 'MigrationSourceCanceled'
    00000f04.00000fa4::2013/01/25-19:34:35.436 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: Current state 'MigrationSourceCanceled', event 'MigrationSourceOfflineResult', result 0x80004005
    00000f04.00000fa4::2013/01/25-19:34:35.436 INFO  [RES] Virtual Machine <Virtual Machine SQL002>: State change 'MigrationSourceCanceled' -> 'Online'
    00000b38.00000ae8::2013/01/25-19:45:48.883 INFO  [NM] Received request from client address VIR001.


    Alerts from Social.microsoft.com Forums

    Friday, February 1, 2013 7:54 PM

Answers

  • Hi,

    Sorry for the late reply.  Yes I was able to figure it out after some time.  I made an update to my question on Server Fault but forgot to update this one:

    Well, finally found the issue:

    I noticed that some of the created cluster networks were not legitimate (ie, they only contained one NIC, or were teamed with a NIC on a different subnet). I had disabled these. I was told by my colleagues that binding on the physical servers could make a difference. I changed these. I verified the cluster, made sure all nodes had both servers listed as possible owners, and to top it off, I had found the "Network for Live Migration" tab under properties for the Virtual Machine Resource.

    I had ordered the cluster networks in "Network for Live Migration" in such a way that the Live Migration cluster network was first, followed by all active networks, with the disabled networks at the bottom. No love. Today after changing the binding and seeing no change, I decided to disable the all cluster networks in the Live Migration tab beyond three internal networks (LM, host, Cluster Domain). Now it's working.

    Not sure what caused this to begin with. We haven't made any physical changes to the hardware in the last year. This was working at least 4 months ago. Looks like the Cluster manager doesn't always listen to its own settings.

    Thanks for the replies on this question.


    Alerts from Social.microsoft.com Forums

    • Marked as answer by IonDancer Friday, October 24, 2014 4:55 PM
    Friday, October 24, 2014 4:50 PM

All replies

  • Did you find a solution for this?
    Tuesday, September 9, 2014 9:54 AM
  • Hi,

    Sorry for the late reply.  Yes I was able to figure it out after some time.  I made an update to my question on Server Fault but forgot to update this one:

    Well, finally found the issue:

    I noticed that some of the created cluster networks were not legitimate (ie, they only contained one NIC, or were teamed with a NIC on a different subnet). I had disabled these. I was told by my colleagues that binding on the physical servers could make a difference. I changed these. I verified the cluster, made sure all nodes had both servers listed as possible owners, and to top it off, I had found the "Network for Live Migration" tab under properties for the Virtual Machine Resource.

    I had ordered the cluster networks in "Network for Live Migration" in such a way that the Live Migration cluster network was first, followed by all active networks, with the disabled networks at the bottom. No love. Today after changing the binding and seeing no change, I decided to disable the all cluster networks in the Live Migration tab beyond three internal networks (LM, host, Cluster Domain). Now it's working.

    Not sure what caused this to begin with. We haven't made any physical changes to the hardware in the last year. This was working at least 4 months ago. Looks like the Cluster manager doesn't always listen to its own settings.

    Thanks for the replies on this question.


    Alerts from Social.microsoft.com Forums

    • Marked as answer by IonDancer Friday, October 24, 2014 4:55 PM
    Friday, October 24, 2014 4:50 PM
  • So in case anyone can't understand what I was trying to say in my earlier resolution:

    Go to "Failover Cluster Manager" and select the failover cluster that you're working on.  Go to the "Networks" tab and you will see all of the automatic NIC matching that Windows has done.  Go through these and in each one's properties tab, disable Cluster communication or "Allow Clients to connect through this network" for each of the subnets that they shouldn't have access to.  Part of the problem is that the automated system can be dense at times, doing things like setting up NICs that have no access (sometimes even pairing ones on differing subdomains) to each other as a group.  Disabling these will cause bad pairs to no longer be used for things like live migration and heartbeat monitoring.  This didn't resolve my issue, however it may be useful to you.

    Then I finally found out something:  There are two Virtual Machine properties lists in the cluster manager.  From Microsoft:

    "

    1. In the console tree (on the left), select the clustered virtual machine for which you want to configure the network for live migration.

    2. Right-click the virtual machine resource displayed in the center pane (not on the left), and then click Properties.

    "

    Once in here, you can now choose specifically which Cluster networks should be used (and in what order, although this doesn't seem to matter) for your live migrations.  It's possible that a network pair that actually cannot communicate may currently be the de-facto choice for your live migrations and causing this to fail.  Picture of Screen:

    http://blogs.technet.com/blogfiles/askcore/WindowsLiveWriter/WindowsServer2008R2LiveMigrationThedevi_67E2/clip_image022_2.jpg

    This option will reflect every single virtual machine in the cluster, it's a global option.  One of the most terrible layouts in 2008 R2 (and maybe newer), took me like a month to find since few sites actually mention this.

    Followup blog that may help:

    http://blogs.technet.com/b/askcore/archive/2009/12/10/windows-server-2008-r2-live-migration-the-devil-may-be-in-the-networking-details.aspx

    and Technet with the information on how to actually access the properties:

    http://technet.microsoft.com/en-us/library/dd446679(v=ws.10).aspx


    Alerts from Social.microsoft.com Forums

    Friday, October 24, 2014 6:15 PM