AzureStack TP3 20170331.1 deployment completed but VMs are in Saved-Critical state

  • Question

  • Hi,

    I installed MAS TP3 (published in February) a couple of times without any issues.

    A refresh of TP3 was recently announced, and I am installing the new refresh on the same hardware server I used for the previous TP3 build.

    The installation completed successfully (COMPLETE: Action 'Deployment'). However, when I looked at Hyper-V Manager, all TP3 VM appliances were in the "Saved-Critical" state, with the exception of MAS-DC01 (running).

    Failover Cluster Manager displays this error for these VM appliances: an error occurred for Resource Virtual machine <VM name>.

    I am going to try another installation, but I am wondering what could be the reason for this issue?
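
    For reference, the same picture can be pulled from PowerShell on the host instead of the two consoles. This is only a minimal sketch, assuming an elevated session with the Hyper-V and FailoverClusters modules available; it is not part of the deployment scripts, and it only reads state.

```powershell
# List every VM that Hyper-V reports in a critical state (for example SavedCritical).
Get-VM |
    Where-Object { $_.State -like '*Critical' } |
    Select-Object Name, State, Status

# List cluster resources that are not online, with the node that owns them.
Get-ClusterResource |
    Where-Object { $_.State -ne 'Online' } |
    Select-Object Name, State, OwnerNode
```

    The wildcard match on the state name is a convenience for this sketch, so that any of the "...Critical" states is caught rather than just one.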

    Best regards

    Denis

    Tuesday, April 11, 2017 9:39 AM

Answers

  • Hi,

    My platform has 5 disks (1 x 600 GB for the base OS, 4 x 900 GB for MAS data) and 160 GB of RAM.

    I followed the instructions from the FAQ to format the data disks before re-installation, and the TP3 installation failed each time with the same behavior: the VM appliances moved from 'running' to 'Saved-Critical' during the installation process.

    https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-faq

    I eventually re-applied the instructions from the FAQ to delete the storage pool from the bare-metal host. This time, however, after step 6 I also deleted the existing volumes on disks 1, 2, 3 and 4 (using Computer Management --> Disk Management), followed by a "Reset Disk" (Server Manager --> File and Storage Services --> Volumes --> Disks, right-click disk 1 (2, 3, 4) and select Reset Disk).

    The "Reset Disk" action makes the disks "Not Initialized", which shows up as partition style "Raw" on the MAS POC server (in the output of the 'Get-Disk' PowerShell command). I did this before launching the PowerShell script InstallAzurePOC.ps1. It left my server exactly as it was the first time I built it to install TP3 (February build).

    Re-installation of TP3 then succeeded.

    I now have the TP3 Refresh build (version 1.0.170331.1) installed and all VM appliances are in the "running" state.
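
    The disk check and reset described above can also be sketched in PowerShell. This is a hedged example, not the FAQ procedure: it assumes the data disks are numbers 1 to 4 as on this server, that the storage pool has already been deleted, and that Clear-Disk is an acceptable stand-in for the Server Manager "Reset Disk" action. The variable name is only illustrative.

```powershell
# Data disk numbers from this post (assumption: adjust for your own host).
$dataDiskNumbers = 1, 2, 3, 4

# Show the current state of the data disks.
Get-Disk |
    Where-Object { $_.Number -in $dataDiskNumbers } |
    Select-Object Number, FriendlyName, PartitionStyle, OperationalStatus, Size

# Wipe all partition information so the disks return to an uninitialized state.
# -WhatIf only previews the action; remove it to really erase the disks.
Get-Disk |
    Where-Object { $_.Number -in $dataDiskNumbers } |
    Clear-Disk -RemoveData -RemoveOEM -WhatIf

# After a real run, this should return nothing if every data disk is RAW.
Get-Disk |
    Where-Object { $_.Number -in $dataDiskNumbers -and $_.PartitionStyle -ne 'RAW' }
```

    The -WhatIf switch is left in on purpose, since the command destroys all data on the selected disks.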

    Best regards

    Denis

    Thursday, April 13, 2017 5:30 PM

All replies

  • You most likely ran out of disk space or available RAM. Also check your cluster and Server Manager -> File and Storage Services -> Storage Pools for any errors. Check the free space with PowerShell:

    Get-VirtualDisk | Get-Disk | Get-Partition | Get-Volume
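
    The other checks mentioned here (storage pool errors, free space, available RAM) can be gathered in one pass. A rough sketch, assuming an elevated session on the host; the calculated column names (FreeGB, SizeGB, FreeMemoryGB) are made up for readability.

```powershell
# Health of the non-primordial storage pools and their virtual disks.
Get-StoragePool -IsPrimordial $false |
    Select-Object FriendlyName, HealthStatus, OperationalStatus, Size, AllocatedSize
Get-VirtualDisk |
    Select-Object FriendlyName, HealthStatus, OperationalStatus, Size

# Free space per volume, in GB.
Get-Volume |
    Select-Object DriveLetter, FileSystemLabel, HealthStatus,
        @{ Name = 'FreeGB'; Expression = { [math]::Round($_.SizeRemaining / 1GB, 1) } },
        @{ Name = 'SizeGB'; Expression = { [math]::Round($_.Size / 1GB, 1) } }

# Free physical memory on the host (FreePhysicalMemory is reported in KB).
Get-CimInstance Win32_OperatingSystem |
    Select-Object @{ Name = 'FreeMemoryGB'; Expression = { [math]::Round($_.FreePhysicalMemory / 1MB, 1) } }
```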
    


    Cheers,

    Ruud
    Blog: AzureStack.Blog
    Note: Please “Vote As Helpful” if you find my contribution useful or “Mark As Answer” if it does answer your question. That will encourage me - and others - to take time out to help you.

    Tuesday, April 11, 2017 12:12 PM
  • Awesome, detailed feedback, Denis; glad you solved it. This will help a lot in future cases like this. I'm still curious why exactly the VMs were in a Saved-Critical state: a failed virtual disk, the storage pool? After all, the deployment did manage to finish. Be sure to check the RAID controller log for any hard disk faults.

    Cheers,

    Ruud

    Tuesday, April 18, 2017 8:59 AM
  • I faced the same issue, and reinstalling the stack after cleaning up didn't fix it, even though in TP3 the deployment script actually cleans up the disks (https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-redeploy).

    We tried adding all new disks (5 x 300 GB, RAID 0) and the issue appeared again in less than a day. However, we noticed that if we reboot the host, all the VMs and services come up and stay up for a short time before going down again.

    Then we installed onto a new server with larger-capacity disks (5 x 600 GB, RAID 0) in the same environment, following the same set of scripts and config, and the stack is still running without this issue. So we believe the issue comes from the hardware.

    If anyone wants any logs for debugging, I can get them, as I still have the old stack with VMs in the Saved-Critical state. It looks like it's trying to run some kind of restore...


    Friday, May 26, 2017 1:22 AM