Linux VM - root volume changes to read-only after some time; reboot solves the issue

  • Question

  • We have a Linux instance whose root volume automatically switches to read-only after the machine has been running for some time (this is the third time). Rebooting the system solves the issue, but it comes back.

    Test:

    echo "" > /tmp/test

    -bash: /tmp/test: Read-only file system
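
    The write test above can be complemented by reading the mount flags directly. The following is a minimal sketch, assuming a Linux system that exposes /proc/mounts; the function name mount_is_readonly is made up for illustration and is not from the original post.

```shell
#!/bin/sh
# Report whether a mount point is currently mounted read-only.
# Exit status: 0 = read-only, 1 = not read-only, 2 = mount point not listed.
# The optional second argument names a file in /proc/mounts format.
mount_is_readonly() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    # Fields: device, mount point, fs type, options, dump, pass.
    # The last matching entry wins, because later mounts shadow earlier ones.
    awk -v mp="$mountpoint" '
        $2 == mp { opts = $4; found = 1 }
        END {
            if (!found) exit 2
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "$table"
}

if mount_is_readonly /; then
    echo "/ is mounted read-only"
else
    echo "/ is writable (or not listed)"
fi
```

    On Ubuntu the root file system is commonly mounted with errors=remount-ro, which would explain a switch to read-only after I/O errors. That is an assumption about this VM's configuration, not something shown in the post.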

    And in the syslog (Ubuntu 12.04)

    Jul 11 04:15:23 localhost kernel: [7470527.963527] JBD2: Detected IO errors while flushing file data on sda1-8
    Jul 11 04:15:23 localhost kernel: [7470528.011196] JBD2: Detected IO errors while flushing file data on sda1-8
    Jul 11 04:15:23 localhost kernel: [7470528.041984] JBD2: Detected IO errors while flushing file data on sda1-8
    Jul 11 04:15:23 localhost kernel: [7470528.108600] JBD2: Detected IO errors while flushing file data on sda1-8
    Jul 11 04:15:23 localhost kernel: [7470528.143191] JBD2: Detected IO errors while flushing file data on sda1-8
    Jul 11 04:15:23 localhost kernel: [7470528.244366] JBD2: Detected IO errors while flushing file data on sda1-8
    Jul 11 04:15:24 localhost kernel: [7470529.276309] JBD2: Detected IO errors while flushing file data on sda1-8
    Jul 11 04:17:30 localhost kernel: [7470655.671781] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Jul 11 04:17:59 localhost kernel: [7470684.066669] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Jul 11 04:17:59 localhost kernel: [7470684.066679] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Jul 11 04:17:59 localhost kernel: [7470684.066682] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Jul 11 04:17:59 localhost kernel: [7470684.066686] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Jul 11 04:17:59 localhost kernel: [7470684.066689] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Jul 11 04:17:59 localhost kernel: [7470684.066692] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
    Jul 11 04:17:59 localhost kernel: [7470684.066695] hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4
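
    For readers unfamiliar with the hex codes in the hv_storvsc lines, here is a small sketch that decodes two of the fields. The function names are invented for illustration, and the lookup tables cover only a handful of standard SCSI values (0x2a is the WRITE(10) opcode, and status 0x2 is CHECK CONDITION); anything else is reported as "other".

```shell
#!/bin/sh
# Pull one hex field out of an hv_storvsc log line, e.g. field "cmd" -> 0x2a.
storvsc_field() {
    printf '%s\n' "$2" | sed -n "s/.*$1 \(0x[0-9a-fA-F]*\).*/\1/p"
}

# Map a few common SCSI opcodes to their names (deliberately incomplete).
scsi_opcode_name() {
    case $1 in
        0x28) echo 'READ(10)' ;;
        0x2a) echo 'WRITE(10)' ;;
        0x35) echo 'SYNCHRONIZE CACHE(10)' ;;
        *)    echo 'other' ;;
    esac
}

# Map a few SCSI status bytes to their names (deliberately incomplete).
scsi_status_name() {
    case $1 in
        0x0) echo 'GOOD' ;;
        0x2) echo 'CHECK CONDITION' ;;
        0x8) echo 'BUSY' ;;
        *)   echo 'other' ;;
    esac
}

# Print a one-line explanation of the cmd and scsi status fields.
explain_storvsc_line() {
    cmd=$(storvsc_field 'cmd' "$1")
    status=$(storvsc_field 'scsi status' "$1")
    echo "cmd $cmd ($(scsi_opcode_name "$cmd")), scsi status $status ($(scsi_status_name "$status"))"
}

explain_storvsc_line 'hv_storvsc vmbus_0_1: cmd 0x2a scsi status 0x2 srb status 0x4'
# prints: cmd 0x2a (WRITE(10)), scsi status 0x2 (CHECK CONDITION)
```

    In other words, the log shows write commands to the virtual disk failing, which is consistent with the JBD2 flush errors that precede them.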
    
    

    It looks like the hardware is at fault. How can we solve this?

    Also, is there any way to get faster support in a case like this, given that the problem appears to be in your hardware?




    • Edited by STHK Saturday, July 12, 2014 6:13 AM
    Saturday, July 12, 2014 6:11 AM

Answers

  • Hi STHK,

    This issue is caused by unexpected behaviour that has been fixed; the fix will be deployed very soon. For now, the workaround is to restart the VM.

    Sergio

    Tuesday, July 29, 2014 11:39 AM

All replies

  • If you suspect a hardware issue, one easy fix is to change the instance size; this forces an automated migration to a new physical host. If that resolves the issue, you can then change it back.
    Sunday, July 13, 2014 5:42 AM
  • @Neil

    I suspect the problem is in the hardware behind the underlying page blob storage, so I doubt that migrating my instance to another physical machine will help unless I also copy the whole disk to a new disk.

    I am still hoping for an official answer from Azure, as there are already a few similar questions in this forum. I am looking for the cause, not just a workaround.

    Thanks anyway.

    Sunday, July 13, 2014 9:30 AM
  • Hi,

    Thank you for your question.

    I am trying to involve someone familiar with this topic to further look at this issue. There might be some time delay. Appreciate your patience.

    Thank you for your understanding and support.

    Best regards,

    Susie

    Wednesday, July 16, 2014 6:32 AM
  • Good morning STHK,

    As you say, this looks like a hardware problem or an inconsistent file system.

    The fabric controller should resolve hardware errors automatically, but it is possible that the file system still contains inconsistent data.

    What I propose is to delete the virtual machine while KEEPING the disks, attach and mount the disks on a new Linux virtual machine, and run the fsck command to check the integrity of the attached disks.

    After this, you can create a new Linux VM from the disks, and you will have the same VM as at the beginning.
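
    The fsck step could look roughly like the sketch below. It assumes an ext file system and a placeholder device name (/dev/sdc1); the real name has to be looked up on the rescue VM. The function name and the FSCK override are hypothetical conveniences. Because fsck is normally run on an unmounted file system, the sketch refuses devices that appear in the mount table.

```shell
#!/bin/sh
# Check a file system on an attached disk, refusing to touch mounted devices.
# The optional second argument names a file in /proc/mounts format.
check_attached_disk() {
    dev=$1
    mounts=${2:-/proc/mounts}
    # awk exits 0 when the device appears in the first column of the table.
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"; then
        echo "refusing: $dev is mounted; unmount it first" >&2
        return 1
    fi
    # -f is handed to the ext checker (e2fsck) and forces a check even if
    # the file system is marked clean.
    # ${FSCK:-fsck} allows a dry run with another command, e.g. FSCK=echo.
    ${FSCK:-fsck} -f "$dev"
}

# Usage (as root, with the placeholder replaced by the real device):
#   check_attached_disk /dev/sdc1
# Dry run that only prints the arguments fsck would receive:
#   FSCK=echo check_attached_disk /dev/sdc1
```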

    Hope that this information will help you,

    Sergio

    • Proposed as answer by Susie Long Monday, July 28, 2014 2:15 AM
    Wednesday, July 23, 2014 7:46 AM
  • Hi STHK,

    This issue is caused by unexpected behaviour that has been fixed; the fix will be deployed very soon. For now, the workaround is to restart the VM.

    Sergio

    Tuesday, July 29, 2014 11:39 AM
  • Hi, thanks for your reply.

    I hope you will update us when the fix has been deployed, because we are still having this issue today (2 Aug), and the only way to get rid of the problem is to reboot our VM.

    Thanks.

    Saturday, August 2, 2014 4:41 AM
  • I wouldn't bank on them fixing it 'soon'.

    My machine has been like this continuously for the past 3 days, unusable, and no response from them to my post at all.  Rebooting does not help in my case.

    http://social.msdn.microsoft.com/Forums/windowsazure/en-US/51bd9ec5-c6de-4109-b1ec-7280a89bae33/vm-stopped-responding-disk-io-causing-freeze?forum=WAVirtualMachinesforWindows

    I found a post that was started 2 years ago and describes the same problem. The MS guy started communicating about it and then never got back to anyone in the thread, even though people have been asking for a reply and a status update.

    http://social.msdn.microsoft.com/Forums/windowsazure/en-US/cae5d9d5-65a3-41b7-83d6-3cc24c418c18/vm-becomes-unresponsive-or-disk-becomes-readonly?forum=WAVirtualMachinesforWindows

    I don't think they know how, or want, to fix the problem :(

    Rob.



    • Edited by Rob Donovan Saturday, August 2, 2014 7:12 PM
    Saturday, August 2, 2014 6:57 PM
  • One of my machines went down this morning and woke me up (again).

    For those who are reading this thread and experiencing the same issue: even if you can sometimes still ssh into your machine, don't RESTART it. You need to shut it down and start it again from the Azure console in order to migrate to another host; that fixes the problem. Clearly, this issue is Azure's responsibility.

    We have 10+ machines (Linux VMs), and at least one or two of them have this issue every week. Yes, every week. I have no idea what their SLA is.

    Wednesday, August 27, 2014 3:50 AM