Is Microsoft intentionally breaching their Virtual Machine SLA?

  • Question

  • I've just responded to this thread on our inability to delete virtual disks after the virtual machines were deleted.  In our case, we started two VMs in an availability set in accordance with the SLA. (Actually, two sets of servers in two availability sets, and they're all broken now.)

    The VMs initially started okay, then appeared to restart themselves and were stuck on a status of "Starting" for well over an hour.  Apparently that is a known problem, and the best solution is to delete the VM and recreate it using the underlying disk/VHD.

    That's precisely what we did.  Now we can't recreate the VMs because the underlying disk is still attached to deleted VMs.

    I've read various descriptions regarding lease blobs not being released. Apparently others have experienced similar problems over the past year or two.  They're all very nice solutions, but the simple reality is THIS IS MICROSOFT'S FAULT and MICROSOFT SHOULD FIX IT because MICROSOFT IS BREACHING THEIR SLA.

    Right now, our recommendation is to avoid Azure for production environments. It's proving to be too unreliable and it seems that their SLAs are nothing more than marketing hype.

    I have attempted to contact support but apparently we need to purchase technical support for their faulty product. As I pointed out to their billing team, this is a BILLING ISSUE.  If Microsoft is taking money from customers and not delivering the service they promised, it has become a billing issue.

    If someone out there has had similar experiences and/or actually read the SLAs, I'd love your input.  Rant over :)

    Regards,  Grant.

    Friday, November 22, 2013 4:57 AM

Answers

  • I ended up using a similar approach to work around it.  I used CloudXplorer to break the lease, then moved the VHD to another container.  I didn't need to rename or delete it; moving seemed to do the trick.

    However, now I have dozens of dead virtual disks that claim to be attached to non-existent VMs and blobs.

    I noticed the other comments regarding the "Starting" process. I continued to have similar problems when I restored the VMs.  The trick seems to be to use Azure PowerShell:

    Stop-AzureVM -ServiceName "MyCloudService" -Name "MyVM" -Force

    Forcing seems to reliably kill the VM, while the UI doesn't.  That was the original reason I deleted the VM.  My suggestion: don't delete the VM under any circumstances. Use Stop-AzureVM -Force.

    Ultimately, this workaround still doesn't resolve my concern with SLAs.  If I'm having to resort to such bizarre and time-consuming methods, then my services are offline for unacceptable periods of time, Azure is below the 99.95% VM SLA, and refunds are due to a lot of people.
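    Putting the pieces together, the force-stop/start recovery sequence looks roughly like this. This is a sketch only, assuming the classic Azure Service Management PowerShell module with a subscription already selected; the service and VM names are placeholders:

    ```powershell
    # Placeholder names - substitute your own cloud service and VM.
    $svc = "MyCloudService"
    $vm  = "MyVM"

    # Force-stop the stuck VM; -Force suppresses the confirmation prompt.
    Stop-AzureVM -ServiceName $svc -Name $vm -Force

    # Poll until the instance reports a stopped status before starting it again.
    do {
        Start-Sleep -Seconds 15
        $status = (Get-AzureVM -ServiceName $svc -Name $vm).InstanceStatus
    } while ($status -ne "StoppedVM" -and $status -ne "StoppedDeallocated")

    Start-AzureVM -ServiceName $svc -Name $vm
    ```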
    • Marked as answer by GrantPH2 Saturday, November 23, 2013 12:09 AM
    • Unmarked as answer by GrantPH2 Saturday, November 23, 2013 12:46 AM
    • Marked as answer by GrantPH2 Saturday, December 7, 2013 12:52 AM
    Friday, November 22, 2013 11:42 PM
  • I'm still waiting for our Operations Team to confirm the best way to unlock these disks, i.e. whether you can do it yourself, or whether you need to open a Service Request so that we can manually delete them.

    Yes, Stopping/Starting the 'Starting' VM is the quickest solution for now, while our Engineering teams fix the issue. As per the Azure Dashboard:

    23 Nov 2013  7:01 AM UTC We have mitigated the root cause of this incident. Microsoft Support continues to work with impacted customers and is resolving the issue. Any customer with a Virtual Machine stuck at 'Starting' can perform a Stop/Start using PowerShell to recover their instance. Instructions can be found by going to Bing, and searching "Manage Virtual Machines Using Windows Azure Cmdlets". We will provide an update within 6 hours.

     

    Regarding a refund, please open a Billing Service Request with us at https://manage.windowsazure.com/support. 

    Thanks.

    • Marked as answer by Vivian_Wang Monday, December 2, 2013 2:23 AM
    Saturday, November 23, 2013 8:28 AM

All replies

  • You would have to manually break the lease using a third-party tool like CloudXplorer.
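    If you'd rather not use a third-party tool, the same lease break can in principle be done with the Azure storage cmdlets and the underlying .NET storage client. A sketch only; the account name, key, container, and blob name below are placeholders:

    ```powershell
    # Placeholder storage account credentials.
    $ctx = New-AzureStorageContext -StorageAccountName "myaccount" -StorageAccountKey "<key>"

    # Fetch the orphaned VHD blob and break its lease via the .NET client.
    $blob = Get-AzureStorageBlob -Context $ctx -Container "vhds" -Blob "orphaned-disk.vhd"
    $blob.ICloudBlob.BreakLease()
    ```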

    This posting is provided AS IS, with no warranties, and confers no rights.

    • Proposed as answer by Pradeep M G Friday, November 22, 2013 5:03 AM
    • Unproposed as answer by GrantPH2 Friday, November 22, 2013 5:23 AM
    Friday, November 22, 2013 5:03 AM
  • I just saw your other thread and tried that.  It did not work.  PowerShell's Remove-AzureDisk still reports the disk is in use even after breaking all leases in CloudXplorer.

    More importantly, I shouldn't have to go to a third party tool to fix this problem.  Our services are unavailable and clearly I'm not happy about it.  Microsoft, PLEASE FIX IT.
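    For context, the cleanup being attempted was along these lines. A sketch only, assuming the classic Azure Service Management module; the disk name is a placeholder:

    ```powershell
    # List registered disks and what they claim to be attached to; AttachedTo
    # should be empty for a disk whose VM was genuinely deleted.
    Get-AzureDisk | Select-Object DiskName, AttachedTo

    # Attempt to remove the disk registration; -DeleteVHD also deletes the
    # underlying .vhd blob. "my-dead-disk" is a placeholder disk name.
    Remove-AzureDisk -DiskName "my-dead-disk" -DeleteVHD
    ```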


    • Edited by GrantPH2 Friday, November 22, 2013 5:24 AM
    Friday, November 22, 2013 5:23 AM
  • Hi,

    I had the exact same problem yesterday. To "solve" it, I used CloudXplorer and renamed the VHDs. The operation will fail, but it will create a copy of the VHD, which you can then use to create new disks. (You can probably just copy the disk as well.)

    Don't know if I actually recommend this though. My new machines initially booted up OK, but then proceeded rather quickly to go back to the "starting" state again. So now I have duplicate disks and duplicate VHDs in my storage account, and still no running VMs.
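    The copy-based workaround can also be scripted with the Azure storage cmdlets instead of CloudXplorer. A sketch only; the account name, key, container, and blob names are placeholders:

    ```powershell
    # Placeholder storage account credentials.
    $ctx = New-AzureStorageContext -StorageAccountName "myaccount" -StorageAccountKey "<key>"

    # Server-side copy of the leased VHD to a new blob; the copy is not
    # subject to the original blob's lease.
    Start-AzureStorageBlobCopy -Context $ctx -SrcContainer "vhds" -SrcBlob "stuck-vm.vhd" `
        -DestContainer "vhds" -DestBlob "stuck-vm-copy.vhd" -DestContext $ctx

    # Wait for the copy to finish before registering a new disk from it.
    Get-AzureStorageBlobCopyState -Context $ctx -Container "vhds" -Blob "stuck-vm-copy.vhd" -WaitForComplete
    ```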

    Friday, November 22, 2013 10:11 AM
  • @SPGrinder

    Same exact issue as you, resulting in hours spent figuring out the workarounds, only to wind up with:

    - A working VM for a few moments.

    - Tons of redundant storage (I am using terabytes for my task).

    Having years of experience with AWS and only recently putting my toes in the water with Azure, I am left wondering if anyone runs production environments here, because it looks too unstable even for simple dev/test.

    --- Update ---

    I finally got my VM up and running after a shutdown and start from PowerShell.  However, I don't know if it was issues on the MS side clearing up or my PowerShell commands, since I had already tried a restart from PowerShell, stop/start from the dashboard, etc., many times.  It might have been issues (which I never saw reported in their status dashboard) clearing up on their own, since other problems, like being unable to manage endpoints, seem to be resolved.

    • Edited by decostop Friday, November 22, 2013 3:44 PM
    Friday, November 22, 2013 3:40 PM
  • Thanks for reporting this issue. The Azure Dashboard mentions this service interruption, and our Operations Team is working to repair it.

    Regarding the reuse of your VHD, can you please follow Craig's post to unblock you?

    http://social.msdn.microsoft.com/Forums/windowsazure/en-US/7381ea0e-0443-4b33-aa12-ba39df003409/error-deleting-vhd-there-is-currently-a-lease-on-the-blob-and-no-lease-id-was-specified-in-the?forum=WAVirtualMachinesforWindowsVM

    Friday, November 22, 2013 4:13 PM