VM becomes unresponsive or disk becomes read-only

  • General discussion

  • If your Windows VM has become unresponsive or your disk has become read-only, try restarting the VM through the portal or through scripts. In many cases, this resolves the issue and the VM returns to a healthy state.

    However, if restarting your VM gets stuck in "stopping" indefinitely, you can work around the issue by removing the virtual machine (hitting delete in the portal) and re-deploying it using the same VHD disk. You will also need to re-attach any data disks you were using. To the virtual machine, this will look like a hard "reboot." You can re-create the VM by going through the create VM process in the portal, selecting the gallery, and from there selecting "My Disks."

    Additionally, if you want to use the same DNS name, you will also need to delete the respective cloud service in the portal, along with the virtual machine, to release the previously reserved DNS name.

    You can also export and import the VM using PowerShell (which also preserves the added ports and the attached data disks). Details can be found here:

    Download the PowerShell Tools here: https://www.windowsazure.com/en-us/manage/downloads/

    Documentation on configuring PowerShell can be found here: http://msdn.microsoft.com/en-us/library/windowsazure/jj156055.aspx

    Import/Export:

    To export:

    Export-AzureVM -ServiceName 'ServiceName' -Name 'VMName' -Path 'c:\folder\VM.xml'

    Then, remove the VM:

    Remove-AzureVM -ServiceName 'ServiceName' -Name 'VMName'

    To import:

    Import-AzureVM -Path 'c:\folder\VM.xml' |
        New-AzureVM -ServiceName 'ServiceName' -Location 'loc'

    For more details on using the import/export functionality, see this blog post: http://michaelwasham.com/2012/06/18/importing-and-exporting-virtual-machine-settings/

    Corey


    Saturday, July 14, 2012 1:13 AM
    Moderator

All replies

  • Corey,

    Any news on this issue? We had to delete and re-create one of our VMs four times last week because the file system crashed.

    Thanks,

    Mariano

    Wednesday, July 18, 2012 3:54 PM
  • We're experiencing the same issue as well. This is a roadblock for us. Any news/updates would be appreciated.

    Thanks,

    Evan

    Thursday, July 19, 2012 6:23 PM
  • Any update? I'm experiencing the same problem. Another problem I've encountered is that one of the VMs, which I use as a main server, reboots every day at 3 AM. I have already disabled automatic updates. Does anyone have a solution?

    Thanks

    Siqi

    Saturday, July 21, 2012 3:34 PM
  • I've had to reboot my VM every day for the past 4 days. This is getting ridiculous.
    Saturday, July 21, 2012 9:57 PM
  • We sincerely apologize for the difficulties caused by this problem. We are racing to deploy an update to the platform to resolve the issue, hopefully right around the end of the month.

    We will confirm our status as we get closer.

    Corey

    Tuesday, July 24, 2012 12:08 AM
    Moderator
  • Corey, can you share what causes this to happen? How does this relate to data stored on an attached disk? Are there any items that trigger this disk failure? We noticed it happen with shared networking (backend subnet) and attached storage. The problem is that the VMs usually cannot be stopped, let alone resized or deleted.

    Thursday, July 26, 2012 5:07 PM
  • Hey Frank,

    There is nothing special that will cause this to happen, although we have seen it more frequently on instances that are heavily using their disks for lots of reads/writes.

    As an update, we are currently deploying the update that should resolve this issue and it should be reaching most production regions by the end of this week.

    Corey

    Monday, July 30, 2012 5:20 AM
    Moderator
  • After about a week with issues, our VM became unresponsive again. This time I was able to connect by RDP, but after entering my credentials in the Windows security dialog, the RDP window came up with a blank screen. I could see the Windows 2008 R2 Datacenter logo, but the Windows desktop never came up. It was just stuck like that. I ended up having to restart the VM like in prior incidents.
    Thursday, August 2, 2012 2:35 AM
  • Hi Corey,

    Has the update been deployed? Should we expect this to be fixed?

    Thanks,

    Evan

    Monday, August 6, 2012 4:07 PM
  • Just an FYI, I have a trial account and a test server running which is doing absolutely nothing all day (so no high CPU/network usage whatsoever). On August 3rd it was down for 22 hours (I wasn't able to get to a computer to hit the reboot button).

    Also, rebooting takes an extremely long time; if it goes down, it's usually for at least an hour.

    I don't know if and when the update is available but I'll keep monitoring.

    Tuesday, August 7, 2012 1:43 PM
  • Corey,

    After not seeing this problem for a few weeks, I ran into it again yesterday. It happened when testing Linux mail servers.

    Has the fix been deployed on all Azure deployments? Is there a way we can manually check? Do old VMs need to be taken offline (restarted, rebuilt from disk) for this fix to take effect?

    I know Azure VM is in preview, but it's something that we are paying for. We plan on using it for production services very soon, and are worried that a disk getting stuck will cause major issues.

    Last, is there a bash script or solution we can run every few minutes to restart the VM instance in the event that it does become read-only?

    Thursday, August 16, 2012 4:47 AM
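The question above about a script that detects the read-only condition can be approached from inside the guest. The following is a minimal sketch, not something posted in this thread: it assumes a Linux guest that exposes `/proc/mounts`, and the function name `is_read_only` is made up for illustration. It only reports the state; what to do next (alerting, or restarting the VM through your own tooling) is left as a placeholder.

```shell
#!/bin/sh
# Succeed (exit status 0) if MOUNT_POINT is currently mounted read-only.
# Usage: is_read_only MOUNT_POINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts; passing another file makes testing easy.
is_read_only() {
    _mp=$1
    _file=${2:-/proc/mounts}
    [ -r "$_file" ] || return 2   # cannot tell: no mount table available
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
    # Keep the last matching entry, since later mounts shadow earlier ones.
    _opts=$(awk -v mp="$_mp" '$2 == mp { o = $4 } END { print o }' "$_file")
    case ",$_opts," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

if is_read_only /; then
    echo "root filesystem is read-only"
    # Placeholder: send an alert or trigger your own restart procedure here.
else
    echo "root filesystem is not read-only (or could not be checked)"
fi
```

Run from cron every few minutes, the exit status of `is_read_only /` could drive whatever follow-up you choose. Matching `ro` as a whole comma-separated option avoids false positives from options such as `errors=remount-ro`.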
  • Hi Frank,

    Can you send the following to iaasforum@microsoft.com?

    • Subscription ID
    • Name and Deployment ID of VM where disk went read-only.
    • Date/time when problem started/was first discovered.
    • Type of workload being run on VM during that timeframe
    • Link to this forum thread

    Thanks,
    Craig

    Friday, August 17, 2012 6:23 PM
    Moderator
  • Craig, I sent the info you requested. 
    Email is from frankbasti@outlook.com 

    The issue has repeated itself twice now, after not really affecting the machine for a couple of weeks. The only thing that is new is that I access the machine (Ubuntu 12.04 mail server) via a webmail interface. The webmail interface polls IMAP accounts every minute. There is VERY LITTLE data, with only a few test accounts.
    Friday, August 17, 2012 9:07 PM
  • I did not see a response to the posts asking if the fix had been deployed. I assumed it had so I went ahead and recreated one of the VMs I had used previously. This issue occurred again as Frank has outlined. This problem only seems to happen on Linux VMs from what I can tell.
    Wednesday, August 22, 2012 7:09 PM
  • I have the same problem. Happened two times today on a SQL Server 2012 VM.
    Thursday, August 23, 2012 12:05 AM
  • Corey,

    Any news about the issue?

    We are currently experiencing this issue on our Ubuntu production machine. Most of the time a restart doesn't solve the problem, so we have to recreate the VPS and suffer a long downtime.

    Thanks,

    Giannis

    Monday, August 27, 2012 11:40 AM
  • Had high hopes as it hadn't gone down since my last reply. So I moved my site (100 visitors a day, in no way high traffic) over yesterday. Now it has been down for 2 hours already and I dare not touch it. 

    So yeah, I'm eager for some news as well. I am moving back to my old host and won't start paying after the trial period because this (and the non-replies) is unacceptable to pay for, even if it's just a "preview" product.

    Monday, August 27, 2012 11:52 AM
  • Update: My whole virtual machine is now just completely gone from the management portal. Nothing to see there any more. Zip. Nada. "You have no virtual machines. Create one to get started!".

    Links to the disks don't work any more either, so all data also seems gone. Lovely.

    Thanks for nothing!


    Monday, August 27, 2012 11:56 AM
  • Same situation day after day. Any news?

    Aug 17 23:04:38 comdi-conman kernel: [40028.916231] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Aug 17 23:04:38 comdi-conman kernel: [40028.916241] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Aug 17 23:04:38 comdi-conman kernel: [40028.916250] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Aug 17 23:04:38 comdi-conman kernel: [40028.916257] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Aug 17 23:04:38 comdi-conman kernel: [40028.916266] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Aug 17 23:04:48 comdi-conman kernel: [40038.930363] sd 0:0:0:0: [sda] Unhandled error code
    Aug 17 23:04:48 comdi-conman kernel: [40038.930373] sd 0:0:0:0: [sda]  Result: hostbyte=invalid driverbyte=DRIVER_OK
    Aug 17 23:04:48 comdi-conman kernel: [40038.930382] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 51 be d9 00 00 20 00
    Aug 17 23:04:48 comdi-conman kernel: [40038.930402] end_request: critical target error, dev sda, sector 5357273
    Aug 17 23:04:48 comdi-conman kernel: [40038.933113] Buffer I/O error on device sda1, logical block 667651
    Aug 17 23:04:48 comdi-conman kernel: [40038.935230] Buffer I/O error on device sda1, logical block 667652
    Aug 17 23:04:48 comdi-conman kernel: [40038.938447] Buffer I/O error on device sda1, logical block 667653
    Aug 17 23:04:48 comdi-conman kernel: [40038.942489] Buffer I/O error on device sda1, logical block 667654
    Aug 17 23:04:48 comdi-conman kernel: [40038.945001] EXT4-fs warning (device sda1): ext4_end_bio:251: I/O error writing to inode 136792 (offset 0 size 16384 starting block 669663)
    Aug 17 23:04:48 comdi-conman kernel: [40038.945047] sd 0:0:0:0: [sda]  Sense Key : No Sense [current]
    Aug 17 23:04:48 comdi-conman kernel: [40038.945058] sd 0:0:0:0: [sda]  Add. Sense: No additional sense information
    Aug 17 23:04:48 comdi-conman kernel: [40038.945081] sd 0:0:0:0: [sda]  Sense Key : No Sense [current]
    Aug 17 23:04:48 comdi-conman kernel: [40038.945090] sd 0:0:0:0: [sda]  Add. Sense: No additional sense information
    Aug 17 23:04:48 comdi-conman kernel: [40038.945104] sd 0:0:0:0: [sda]  Sense Key : No Sense [current]
    Aug 17 23:04:48 comdi-conman kernel: [40038.945112] sd 0:0:0:0: [sda]  Add. Sense: No additional sense information
    Aug 17 23:04:48 comdi-conman kernel: [40038.945126] sd 0:0:0:0: [sda]  Sense Key : No Sense [current]
    Aug 17 23:04:48 comdi-conman kernel: [40038.945134] sd 0:0:0:0: [sda]  Add. Sense: No additional sense information
    Aug 17 23:05:53 comdi-conman kernel: [40103.952416] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Aug 17 23:05:53 comdi-conman kernel: [40103.952440] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Aug 17 23:05:53 comdi-conman kernel: [40103.952449] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4

    Monday, August 27, 2012 5:50 PM
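For readers trying to interpret the kernel log above, here is a small decoding sketch. It is an illustration added for this write-up, not tooling from the thread, and the function names are hypothetical. It relies on standard SCSI values: opcode 0x2a is WRITE(10), opcode 0x28 is READ(10), and status 0x2 is CHECK CONDITION. The `srb status 0x4` field is not decoded here; it appears to correspond to a generic error status reported by the Hyper-V host's storage stack, but treat that as an assumption. The CDB decoder assumes the standard 10-byte layout.

```shell
#!/bin/sh
# Translate the SCSI opcode from the "cmd 0x.." field into a name.
scsi_cmd_name() {
    case "$1" in
        0x28) echo "READ(10)" ;;
        0x2a) echo "WRITE(10)" ;;
        *)    echo "unknown opcode $1" ;;
    esac
}

# Translate the "scsi status 0x.." field into a name.
scsi_status_name() {
    case "$1" in
        0x0) echo "GOOD" ;;
        0x2) echo "CHECK CONDITION" ;;
        0x8) echo "BUSY" ;;
        *)   echo "unknown status $1" ;;
    esac
}

# Decode a 10-byte READ(10)/WRITE(10) CDB passed as ten hex bytes.
# Bytes 2-5 hold the big-endian logical block address; bytes 7-8 hold the
# big-endian transfer length in blocks.
cdb10_decode() {
    [ "$#" -eq 10 ] || { echo "expected 10 CDB bytes" >&2; return 1; }
    _lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    _len=$(( (0x$8 << 8) | 0x$9 ))
    echo "lba=$_lba blocks=$_len"
}

scsi_cmd_name 0x2a                            # prints: WRITE(10)
scsi_status_name 0x2                          # prints: CHECK CONDITION
cdb10_decode 2a 00 00 51 be d9 00 00 20 00    # prints: lba=5357273 blocks=32
```

The decoded address matches the `sector 5357273` in the `end_request` line, and 32 blocks of 512 bytes (assuming 512-byte logical blocks) is the 16384-byte write named in the EXT4 warning, so the log describes a single failed write to the OS disk that the filesystem then reported.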
  • > Hey Frank,
    >
    > There is nothing special that will cause this to happen, although we have seen it more frequently on instances that are heavily using their disks for lots of reads/writes.
    >
    > As an update, we are currently deploying the update that should resolve this issue and it should be reaching most production regions by the end of this week.
    >
    > Corey

    Is there any update on this? We're seeing a lot of unresponsive VMs: 2 yesterday and 3 so far today, for example.

    Rob.

    Thursday, August 30, 2012 6:43 AM
  • > Hey Frank,
    >
    > There is nothing special that will cause this to happen, although we have seen it more frequently on instances that are heavily using their disks for lots of reads/writes.
    >
    > As an update, we are currently deploying the update that should resolve this issue and it should be reaching most production regions by the end of this week.
    >
    > Corey

    Hi Frank,

    As far as I can see, Azure VMs are still experiencing the same issues... This is quite an annoying situation, especially because there are a lot of people with exactly the same problem (i.e., the read-only filesystem) and no answer! This is definitely not fair, especially if you have a BizSpark or a billing-based Azure subscription.

    Could you please let me know something?

    Kind regards, Andrea 

    Friday, August 31, 2012 10:55 AM
  • For me the issue has not come up since I reported it to Microsoft a couple of weeks ago. I have no idea if it is just a fluke or if something has been modified on my account. It would be nice if Microsoft kept all of us in the loop on exactly what is going on with the issue and on best practices to avoid it.

    The only thing I have done differently since reporting the issue is to not really use the Azure Portal.
    Hopefully this gets fully fixed soon. Not being able to run Linux VMs without worrying about them locking up is a showstopper for many of us.

    Friday, August 31, 2012 9:32 PM

  • Just want to update that the issue still exists on Ubuntu VMs. I've been experiencing it on my 8 VMs, at least once a week for each VM. VMs with a higher read I/O load hit the issue more frequently.

    Moving those 8 production VMs to other solutions is not an option, as upper management decided to go all the way with Azure VMs.

    Can anyone from MS update the status of this issue? What should we expect over the next couple of weeks, and when is it estimated to be solved? Honestly, we need to plan ahead, as our business depends on it.

    Thank you

    Saturday, September 1, 2012 3:41 PM
  • A similar issue with a hanging VM (Windows Server 2012 August image) just started happening to me today. Before that, I used this VM for 2 weeks with 100% uptime. I'm now really concerned about whether I can rely on VMs. Hoping you guys can fix the entire issue with VMs.

    Monday, September 3, 2012 7:43 PM
  • This is happening on my CentOS VM as well. After about 12 hours the disk went read-only and my Node app failed because it could not write logs. Any update on this fix, MS team?
    Monday, September 10, 2012 2:04 PM
  • Any news? This has happened 3 times in the past 4 days to me. I know this is a beta product and does not provide an SLA but this seems like a massive, massive issue. The fact that we're here 2 months later with no fix? What's the deal?

    Travis Bell

    Monday, September 10, 2012 3:55 PM
  • We are hitting this issue also. Any update?
    Monday, September 10, 2012 4:32 PM
  • Sept. 11 @ 9:15 AM

    > travisbell@web1:~$ touch test.txt
    > touch: cannot touch `test.txt': Read-only file system


    Travis Bell

    Tuesday, September 11, 2012 3:24 PM
  • We are also experiencing this issue. We are using a SQL 2012 Virtual Machine. On two occasions (as recent as today - 2012/09/12), the VM has become unresponsive. We are unable to RDP to the VM or access the SQL Server. The only solution we have found is to restart the VM. Once the VM restarts, service returns to normal.

    There do not appear to be any events in the Application or System Event log that relate to the problem.

    Is this issue a known issue? I can send more details if required.

    Thanks,
    Richard

     
    Wednesday, September 12, 2012 11:33 PM
  • We just tried recreating our vm based on the steps described in the first post. Let's hope the issue goes away
    Thursday, September 13, 2012 7:23 PM
  • Hi Corey,

    Greetings from Russia. Unfortunately, we're having this problem too.

    We migrated our production servers for our SaaS from Amazon to Azure, and since this week we have started having these ridiculous problems. Please provide an update on when you'll resolve the issue with the storage...

    The problem seems to go away with a restart from the interface... But that's not a 'production' solution, is it?
    • Edited by Ilya Pan. _ Sunday, September 16, 2012 8:45 AM
    Sunday, September 16, 2012 8:35 AM
  • Same problem here; it has happened at least 4 times on a production web server. We are thinking of moving away from Azure.
    Sunday, September 16, 2012 10:26 PM
  • Yes, we had not been using a demo system for a bit (yozons.cloudapp.net), but when I returned, noted the read-only file system. Fortunately, a reboot through the portal resolved it, but it will have to be resolved before anybody can actually deploy anything of value. It seems unlikely there was much high I/O other than perhaps performing a database backup as is done on all of our production deployments on other VMs.

    Hope a solution is to be found!

    Monday, September 17, 2012 5:57 PM
  • Same thing happened just now on my W2k8 r2 VM running IIS.

    It just hangs, I can login via RDP but gets stuck on the "Welcome..." message after putting in the credentials.

    There's nothing in the logs (event viewer), pingdom reported down since 17-09-2012 21:16:19 GMT +1.

    It's happened twice in the past 4 days, but today only a reboot fixed it! Any idea what the root cause could be?

    Thanks,

    P.

    Monday, September 17, 2012 9:17 PM
  • Now getting this problem every day... Any solution under way?
    Tuesday, September 18, 2012 11:21 AM
  • I have not run into this for several weeks now (since late September).  Was something changed in the backend that would have fixed it?
    Tuesday, October 16, 2012 12:27 AM
  • Ditto. Been good since my last post.

    Travis Bell

    Tuesday, October 16, 2012 12:35 AM
  • Well, we ran into the problem today as well. Read-only filesystem, and what is worse, we can't get it restarted or even shut down from the portal in order to reboot. We just keep receiving messages like 'Failed to shut down virtual machine xxx'
    • Edited by scorporg Friday, November 2, 2012 11:26 AM
    Friday, November 2, 2012 9:22 AM
  • This issue had stopped for a while but now it is happening several times a day on my machine. Any news on this issue?

    Thanks

    Wednesday, December 5, 2012 1:58 PM
  • The exact same problem happened to me twice on the same disk:

    [  145.219250] hv_storvsc vmbus_0_13: cmd 0x28 scsi status 0x2 srb status 0x4
    [  195.295742] hv_storvsc vmbus_0_13: cmd 0x28 scsi status 0x2 srb status 0x4
    [  245.356813] hv_storvsc vmbus_0_13: cmd 0x28 scsi status 0x2 srb status 0x4
    [  295.417777] hv_storvsc vmbus_0_13: cmd 0x28 scsi status 0x2 srb status 0x4

    Tried the delete machine and recreate method, the problem re-appeared right away.

    A problem that was reported in July of last year, with continuous "me too" replies, never got resolved. And the official solution is to remove the machine and recreate it... This must be a F**king joke.

    Monday, May 20, 2013 8:09 AM
  • Hi,

    I had a similar issue (no connection, but billing ;) ).

    After a lot of trial and error, we found that the problem was caused by a bad configuration in /etc/fstab.

    The solution was to create a temporary virtual machine and attach the disk to it as a data disk (you first have to delete the original VM and remove the disk from the disk list so that it can be re-added as a data disk). Then mount it in the temporary VM and correct the configuration. Finally, detach it from the temporary VM, remove it from the disk list, add it back as an OS disk, and re-create the original VM from it :).

    I know this is probably not the cause of all the problems mentioned above, but maybe it can help, or at least give you a hint if you recently made changes to the OS configuration.

    Regards

    Juan

    Friday, September 6, 2013 6:38 PM
  • Hi,

    We have the same issue.

    The VM was running fine for almost a year.

    Suddenly the filesystem started becoming read-only.

    Is there any update on a solution?

    Thursday, January 30, 2014 10:59 PM
  • Hi,

    I just got this exact problem, after struggling with it for 4 hours :(

    There seems to be no update as to how or if this problem was resolved on the MS side.

    Could someone please advise?

    I'm running an Ubuntu Linux VM.

    Thanks, Rob Donovan.


    • Edited by Rob Donovan Thursday, July 31, 2014 8:12 PM
    Thursday, July 31, 2014 8:08 PM
  • > The exact same problem happened to me twice on the same disk:
    >
    > [  145.219250] hv_storvsc vmbus_0_13: cmd 0x28 scsi status 0x2 srb status 0x4
    > [  195.295742] hv_storvsc vmbus_0_13: cmd 0x28 scsi status 0x2 srb status 0x4
    > [  245.356813] hv_storvsc vmbus_0_13: cmd 0x28 scsi status 0x2 srb status 0x4
    > [  295.417777] hv_storvsc vmbus_0_13: cmd 0x28 scsi status 0x2 srb status 0x4
    >
    > Tried the delete machine and recreate method, the problem re-appeared right away.
    >
    > A problem that was reported in July of last year, with continuous "me too" replies, never got resolved. And the official solution is to remove the machine and recreate it... This must be a F**king joke.

    Are you sure? Has anyone found a simple solution to reset the VM more recently?
    Wednesday, August 13, 2014 5:24 PM
  • Hello All,

    I also had this issue this morning. My customer asked me why it happened and how to fix it. Now I know why, but how do I fix it? Does anyone know?

    Thanks

    Thursday, August 14, 2014 12:02 PM
  • Hello All,

    I had the same issue today. It seems Azure maintenance went into effect today. After whatever process they ran, my CentOS server failed. It's running, but I can't SSH in and the web server is down. I've tried resetting it several times and changing the configuration, but to no effect.

    Is there another way to access the CentOS server, through another UI?

    David

    Monday, September 15, 2014 3:37 AM
  • We have had the same issue, but ours occurred before that maintenance start time.

    Same issue with a SLES machine: we cannot SSH in, and it seems unresponsive to everything.

    Also, the downloaded VHD seems to be missing partitions, which happen to be the data ones.

    Same vmbus errors in the logs as others.

    It ran fine for over a year.

    Restarting or rebuilding with the same disk does no good: still no outbound network I/O or disk I/O, and we can't access it to see what is going on.

    Any help from Azure to all of us would be greatly appreciated.

    Thanks.

    Wednesday, September 17, 2014 5:53 PM

  • I encountered the same issue today (2015-02-16).

    I did a stop-and-start of my instance, and the instance status is now "running,"

    but I can't access it over SSH.

    Can I preserve my VHD (OS disk) through the re-create process (export/import)?

    Bob.



    Monday, February 16, 2015 3:21 AM
  • iaasforum@microsoft.com

    Is it still available now?
    Monday, February 16, 2015 4:59 AM
  • I have had the same issue with two Ubuntu 14.04 LTS servers in the last two days!!!
    Does anyone know the cause of this issue?

    Friday, March 11, 2016 2:35 PM
  • A couple of days ago, the root partition of one of our Ubuntu 16.04 LTS servers became read-only. Shutting down, restarting, and deallocating did not help; sda1 was still mounted on / in read-only mode. I reported it to Azure Support, and they were really helpful. We figured out that sda had some errors. For now, I don't know why.

    Here is how we fixed it:

    Repair the file system

    fsck /dev/sda1

    Remount the root partition with the read-write option

    mount -o remount,rw /

    Check whether it succeeded

    mount

    Our console messages

    root@yourserver:~# fsck /dev/sda1

    fsck from util-linux 2.27.1
    e2fsck 1.42.13 (17-May-2015)
    cloudimg-rootfs contains a file system with errors, check forced.
    Pass 1: Checking inodes, blocks, and sizes
    Deleted inode 2178 has zero dtime. Fix<y>? yes
    Inodes that were part of a corrupted orphan linked list found. Fix<y>? yes
    Inode 2179 was part of the orphaned inode list. FIXED.
    Inode 256325 was part of the orphaned inode list. FIXED.
    Inode 256327 was part of the orphaned inode list. FIXED.
    Inode 256329 was part of the orphaned inode list. FIXED.
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Block bitmap differences: -(46192--46873) -531081 -(6055714--6055716) -(6055720--6055726)
    Fix<y>? yes
    Free blocks count wrong for group #1 (3963, counted=4645).
    Fix<y>? yes
    Free blocks count wrong for group #16 (25664, counted=25665).
    Fix<y>? yes
    Free blocks count wrong for group #184 (15998, counted=16008).
    Fix<y>? yes
    Free blocks count wrong (4654888, counted=4655581).
    Fix<y>? yes
    Inode bitmap differences: -(2178--2179) -256325 -256327 -256329
    Fix<y>? yes
    Free inodes count wrong for group #0 (39, counted=41).
    Fix<y>? yes
    Free inodes count wrong for group #16 (3, counted=6).
    Fix<y>? yes
    Directories count wrong for group #16 (2275, counted=2274).
    Fix<y>? yes
    Free inodes count wrong (3060151, counted=3060156).
    Fix<y>? yes

    cloudimg-rootfs: ***** FILE SYSTEM WAS MODIFIED *****
    cloudimg-rootfs: ***** REBOOT LINUX *****
    cloudimg-rootfs: 699844/3760000 files (0.1% non-contiguous), 3024158/7679739 blocks

    root@yourserver:~# mount -o remount,rw /

    root@yourserver:~# mount

    ...
    /dev/sda1 on / type ext4 (rw,relatime,discard,data=ordered)

    Wednesday, February 15, 2017 10:00 AM
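The last step of the fix above (running `mount` and looking for `rw` on the root entry) can be checked mechanically. This is a minimal sketch under stated assumptions: the helper name `mount_line_is_rw` is hypothetical, and it expects a line in the `DEVICE on MOUNTPOINT type FSTYPE (options)` shape shown in the console output above. It does not run `fsck` or remount anything.

```shell
#!/bin/sh
# Succeed if the given `mount` output line lists "rw" among its options.
# Expected line shape: DEVICE on MOUNTPOINT type FSTYPE (opt1,opt2,...)
mount_line_is_rw() {
    _opts=${1##*\(}      # drop everything up to the last "("
    _opts=${_opts%\)*}   # drop the closing ")" and anything after it
    case ",$_opts," in
        *,rw,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Sample line copied from the console output in the post above.
line='/dev/sda1 on / type ext4 (rw,relatime,discard,data=ordered)'
if mount_line_is_rw "$line"; then
    echo "root is mounted read-write"
else
    echo "root is still read-only"
fi
```

On a live system, the line to check could come from something like `mount | grep ' on / '`, which selects the root entry from the full listing.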