Hyper-V guest SQL 2012 cluster live migration failure

    Question

  • I have two IBM HX5 nodes connected to an IBM DS5300. A Hyper-V 2012 cluster was built on the blades. Six virtual machines were created in the HV cluster and connected to the DS5300 via Hyper-V Virtual SAN. These VMs form a guest SQL cluster. The database files are placed on DS5300 storage and are accessed through the VMs' Fibre Channel adapters. The IBM MPIO module is installed on all hosts and VMs.

    The SQL Server instances work without problems. However, when I try to live migrate a SQL VM to another HV node, the SQL instance fails. In the SQL error log I see:

    2013-06-19 10:39:44.07 spid1s      Error: 17053, Severity: 16, State: 1.
    2013-06-19 10:39:44.07 spid1s      SQLServerLogMgr::LogWriter: Operating system error 170(The requested resource is in use.) encountered.
    2013-06-19 10:39:44.07 spid1s      Write error during log flush.
    2013-06-19 10:39:44.07 spid55      Error: 9001, Severity: 21, State: 4.
    2013-06-19 10:39:44.07 spid55      The log for database 'Admin' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
    2013-06-19 10:39:44.07 spid55      Database Admin was shutdown due to error 9001 in routine 'XdesRMFull::CommitInternal'. Restart for non-snapshot databases will be attempted after all connections to the database are aborted.
    2013-06-19 10:39:44.31 spid36s     Error: 17053, Severity: 16, State: 1.
    2013-06-19 10:39:44.31 spid36s     fcb::close-flush: Operating system error (null) encountered.
    2013-06-19 10:39:44.31 spid36s     Error: 17053, Severity: 16, State: 1.
    2013-06-19 10:39:44.31 spid36s     fcb::close-flush: Operating system error (null) encountered.
    2013-06-19 10:39:44.32 spid36s     Error: 17053, Severity: 16, State: 1.
    2013-06-19 10:39:44.32 spid36s     fcb::close-flush: Operating system error (null) encountered.
    2013-06-19 10:39:44.32 spid36s     Error: 17053, Severity: 16, State: 1.
    2013-06-19 10:39:44.32 spid36s     fcb::close-flush: Operating system error (null) encountered.
    2013-06-19 10:39:44.33 spid36s     Starting up database 'Admin'.
    2013-06-19 10:39:44.58 spid36s     349 transactions rolled forward in database 'Admin' (6:0). This is an informational message only. No user action is required.
    2013-06-19 10:39:44.58 spid36s     SQLServerLogMgr::FixupLogTail (failure): alignBuf 0x000000001A75D000, writeSize 0x400, filePos 0x156adc00
    2013-06-19 10:39:44.58 spid36s     blankSize 0x3c0000, blkOffset 0x1056e, fileSeqNo 1313, totBytesWritten 0x0
    2013-06-19 10:39:44.58 spid36s     fcb status 0x42, handle 0x0000000000000BC0, size 262144 pages
    2013-06-19 10:39:44.58 spid36s     Error: 17053, Severity: 16, State: 1.
    2013-06-19 10:39:44.58 spid36s     SQLServerLogMgr::FixupLogTail: Operating system error 170(The requested resource is in use.) encountered.
    2013-06-19 10:39:44.58 spid36s     Error: 5159, Severity: 24, State: 13.
    2013-06-19 10:39:44.58 spid36s     Operating system error 170(The requested resource is in use.) on file "v:\MSSQL\log\Admin\Log.ldf" during FixupLogTail.
    2013-06-19 10:39:44.58 spid36s     Error: 3414, Severity: 21, State: 1.
    2013-06-19 10:39:44.58 spid36s     An error occurred during recovery, preventing the database 'Admin' (6:0) from restarting. Diagnose the recovery errors and fix them, or restore from a known good backup. If errors are not corrected or expected, contact Technical Support.
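
When triaging a log like the one above, it can help to count entries by SQL error number and by OS error code. A minimal sketch in Python, assuming the ERRORLOG line layout shown in this post; the function name `summarize` and the shortened `SAMPLE` are illustrative only. Note that Win32 error 170 is `ERROR_BUSY`.

```python
import re
from collections import Counter

# Matches entries such as "Error: 17053, Severity: 16, State: 1."
ERROR_RE = re.compile(r"Error: (\d+), Severity: (\d+), State: (\d+)\.")
# Matches "Operating system error 170(The requested resource is in use.)".
# Lines that report "(null)" instead of a number are deliberately not matched.
OS_ERROR_RE = re.compile(r"Operating system error (\d+)\(([^)]*)\)")

def summarize(log_text):
    """Count SQL error numbers and OS error codes found in ERRORLOG text."""
    sql_errors = Counter()
    os_errors = Counter()
    for line in log_text.splitlines():
        m = ERROR_RE.search(line)
        if m:
            sql_errors[int(m.group(1))] += 1
        m = OS_ERROR_RE.search(line)
        if m:
            os_errors[(int(m.group(1)), m.group(2))] += 1
    return sql_errors, os_errors

# A few lines copied from the log above (shortened sample).
SAMPLE = """\
2013-06-19 10:39:44.07 spid1s      Error: 17053, Severity: 16, State: 1.
2013-06-19 10:39:44.07 spid1s      SQLServerLogMgr::LogWriter: Operating system error 170(The requested resource is in use.) encountered.
2013-06-19 10:39:44.07 spid55      Error: 9001, Severity: 21, State: 4.
2013-06-19 10:39:44.58 spid36s     Error: 5159, Severity: 24, State: 13.
2013-06-19 10:39:44.58 spid36s     Operating system error 170(The requested resource is in use.) on file "v:\\MSSQL\\log\\Admin\\Log.ldf" during FixupLogTail.
"""

if __name__ == "__main__":
    sql_errors, os_errors = summarize(SAMPLE)
    print(dict(sql_errors))
    print(dict(os_errors))
```

In this sample every OS-level failure is the same "resource is in use" condition, which suggests the log volume was briefly unavailable rather than the files being damaged.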

    In the Windows system log I see a lot of warnings like this:

    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-Windows-Ntfs" Guid="{3FF37A1C-A68D-4D6E-8C9B-F79E8B16C482}" />
        <EventID>140</EventID>
        <Version>0</Version>
        <Level>3</Level>
        <Task>0</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8000000000000008</Keywords>
        <TimeCreated SystemTime="2013-06-19T06:39:44.314400200Z" />
        <EventRecordID>25239</EventRecordID>
        <Correlation />
        <Execution ProcessID="4620" ThreadID="4284" />
        <Channel>System</Channel>
        <Computer>sql-node-5.local.net</Computer>
        <Security UserID="S-1-5-21-796845957-515967899-725345543-17066" />
      </System>
      <EventData>
        <Data Name="VolumeId">\\?\Volume{752f0849-6201-48e9-8821-7db897a10305}</Data>
        <Data Name="DeviceName">\Device\HarddiskVolume70</Data>
        <Data Name="Error">0x80000011</Data>
      </EventData>
    </Event>

    The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{752f0849-6201-48e9-8821-7db897a10305}, DeviceName: \Device\HarddiskVolume70.

    ({Device Busy}

    The device is currently busy.)
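
The `Error` value 0x80000011 in the event is an NTSTATUS code (`STATUS_DEVICE_BUSY`), which lines up with the {Device Busy} text and appears consistent with the OS error 170 that SQL Server reports. A minimal sketch of how the fields break down, assuming the standard NTSTATUS bit layout (2-bit severity, customer bit, reserved bit, 12-bit facility, 16-bit code); the function name `decode_ntstatus` is illustrative only.

```python
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}

def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its documented bit fields."""
    return {
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],  # bits 31-30
        "customer": bool((value >> 29) & 0x1),            # bit 29
        "facility": (value >> 16) & 0xFFF,                # bits 27-16
        "code": value & 0xFFFF,                           # bits 15-0
    }

if __name__ == "__main__":
    # Value taken from the <Data Name="Error"> element of the event.
    print(decode_ntstatus(0x80000011))
```

The severity field decodes to "warning", which agrees with the event's `<Level>3</Level>`.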


    There are no errors or warnings on the HV hosts.
    Wednesday, June 19, 2013 6:57 AM

Answers

All replies

  • Is there a detailed explanation of guest FibreChannel in Hyper-V 2012?

    Friday, June 21, 2013 9:06 AM
  • Hello,

    I am trying to involve someone more familiar with this topic to take a further look at this issue. Some delay might be expected while the case is transferred. Your patience is greatly appreciated.
    Thank you for your understanding and support.

    Regards,
    Fanny Liu

    Fanny Liu
    TechNet Community Support

    Thursday, June 27, 2013 1:23 AM
    Moderator
  • Some technical details:
    HBA: QLogic 8Gb CIOv, p/n 44X1947, latest firmware, driver version 9.1.11.20
    DS5300 FW version: 07.84.46.00
    IBM DS MPIO version: SMIA-WinX64-01.03.1305.0050
    Host type on DS5300: W2KALUA
    Installed OS updates: 2565063, 2737084, 2742614, 2750149, 2753842, 2757638, 2761465, 2765809, 2769034, 2769165, 2771431, 2772501, 2785220, 2789649, 2790655,
    2795944, 2798162, 2798897, 2800033, 2803748, 2804583, 2805222, 2805227, 2805966,
    2807986, 2808735, 2811660, 2820330, 2822241, 2829254, 2829361, 2830290, 2836988
    Thursday, June 27, 2013 8:53 AM
  • Guest Fibre Channel in Hyper-V 3.0 can be tricky for live migration. Here is a quick overview:

    All guests spin up on the "A" side of a dual-channel network.  When you Live Migrate a VM using virtual Fibre Channel adapters, the migration target VM spins up on the "B" side.  At cutover, the MPIO driver handles any missed/duplicate packet issues.  Neat way to reuse code.

    This means you have to have a fully functional dual-channel Fibre setup.  Lots of small shops just use a single HBA and one channel, relying on host clustering to provide redundancy. 
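
In Hyper-V terms, the "A" and "B" sides correspond to the two WWN sets (Set A and Set B) that each virtual Fibre Channel adapter carries. Live migration alternates between them, so both sets need to be zoned and LUN-masked on the array. A minimal sketch of a check for that, in Python; the adapter name `vFC1`, all WWPNs, and the `ZONED` list (standing in for an export from the fabric or array) are hypothetical values for illustration.

```python
def normalize(wwpn):
    """Lower-case a WWPN and strip separators so different formats compare equal."""
    return wwpn.replace(":", "").replace("-", "").lower()

def missing_wwpns(adapters, zoned):
    """Return {adapter name: [WWPNs from Set A / Set B that are not zoned]}."""
    zoned_norm = {normalize(w) for w in zoned}
    missing = {}
    for name, sets in adapters.items():
        absent = [w for w in (sets["set_a"], sets["set_b"])
                  if normalize(w) not in zoned_norm]
        if absent:
            missing[name] = absent
    return missing

# Hypothetical values for illustration only.
ADAPTERS = {
    "sql-node-5/vFC1": {"set_a": "C0:03:FF:00:00:00:00:10",
                        "set_b": "C0:03:FF:00:00:00:00:11"},
}
ZONED = ["c003ff0000000010"]  # only Set A has been zoned here

if __name__ == "__main__":
    print(missing_wwpns(ADAPTERS, ZONED))
```

An adapter that shows up in the result would be expected to lose its LUNs when the VM comes up on the other WWN set during migration.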


    Geoff N. Hiten Principal Consultant Microsoft SQL Server MVP

    Thursday, June 27, 2013 7:57 PM
    Moderator
  • There are dual-port HBAs on all our systems.

    Saturday, June 29, 2013 12:02 PM
  • Hello,

    The SQL errors relate to the accessibility of the database files. Try moving the system databases to a new location on the server to see if SQL starts up as expected. If this helps, you can move the other user databases to the new location.

    Move System Databases

    http://msdn.microsoft.com/en-us/library/ms345408.aspx


    Wednesday, July 24, 2013 3:38 PM
  • The errors occur during VM live migration, not during SQL startup.
    Monday, August 12, 2013 5:37 AM
  • As this is likely to be a low-level issue between vFC adapters, MPIO, and Live Migration (with your hardware combination), you should consider raising a support call to Microsoft who will help you locate the exact cause of the problem.
    Tuesday, August 13, 2013 8:19 AM
  • Hello,

    I have exactly the same behavior.

    My configuration is nearly the same, except we have an EMC SAN and HP ProLiant nodes.

    I have other guest VMs without SQL that use vHBAs, and everything works fine.

    Did you find the cause of this problem?

    Tuesday, August 20, 2013 7:04 AM
  • Did you ever figure this out? I'm experiencing the same problem.

    (As a side note, I have non-clustered virtual SQL Servers with vHBAs that work just splendidly, also during live migration.)


    dag øyvind godtfredsen

    Friday, October 11, 2013 7:49 PM
  • Tuesday, December 31, 2013 12:24 PM
  • Thanks!
    Worked for me, finally!


    dag øyvind godtfredsen

    Thursday, January 02, 2014 7:36 AM
  • Thanks. The issue is gone.
    Thursday, January 02, 2014 9:22 AM