AzS-Xrp01 is always shutting down

  • Question

  • Hello, in the Azure Stack Administrator Portal I got an alert that the Infrastructure Management Role is unhealthy and needs to be restarted, so I restarted it through the administrator portal. The first shock was that Azure Stack could no longer show any information after that (only rain-cloud icons). Both Hyper-V Manager and Failover Cluster Manager show the AzS-Xrp01 machine/role as stopped (Off). According to https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-monitor-health, this role must be started manually through Hyper-V Manager on the host system if the Infrastructure Management Role was restarted through the Administrator Portal. So I tried that. At first, the AzS-Xrp01 machine seems to start correctly. After 4-5 minutes, the CPU usage of this VM rises to roughly 6-10%, and shortly afterwards the VM shuts down automatically.

    I also tried rebooting the whole Azure Stack, without success. All other roles/VMs start correctly; only AzS-Xrp01 does not keep running. What can I try to get this role working again?

    Regards.

    Wednesday, July 12, 2017 11:19 AM

Answers

  • Hello Todd,

    There is a known issue with the Compute Controller VM becoming unstable after a Marketplace Syndication fails. It has been fixed in the upcoming release.  The only known mitigation at this point is to redeploy.

      

    Also, while rebooting the XRP VM or its associated infrastructure role is not supported on the ASDK, you can initiate a graceful shutdown, which carries out the required steps in the correct order. Once the ASDK is off, startup is also orchestrated and happens automatically when the host restarts.

     

    The high-level shutdown procedure is as follows:

    1. Connect to the ERCS VM (via Hyper-V, or through a remote PowerShell session to the PrivilegedEndpoint). Log in with the default (generated) account shown when you connect to the VM via Hyper-V.

    2. Run the Stop-AzureStack command to start the shutdown process.
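The two steps above can be scripted from the host roughly as follows. This is a minimal, non-authoritative sketch: only the PrivilegedEndpoint configuration name and the Stop-AzureStack command come from the answer. The ERCS VM name AzS-ERCS01 (the usual ASDK name) and the credential you supply are assumptions, and Stop-AsdkGracefully is a hypothetical wrapper name, not an Azure Stack command.

```powershell
# Hypothetical wrapper around the two-step graceful shutdown described above.
# Assumptions: the ERCS VM is reachable as 'AzS-ERCS01' and $Credential is
# an account the PrivilegedEndpoint accepts.
function Stop-AsdkGracefully {
    param(
        # ERCS VM that hosts the PrivilegedEndpoint (assumed ASDK default name)
        [string]$ErcsComputerName = 'AzS-ERCS01',

        # Account for the PrivilegedEndpoint; prompts only if not supplied
        [System.Management.Automation.PSCredential]$Credential = (Get-Credential)
    )

    # Step 1: open a remote session to the PrivilegedEndpoint configuration.
    # -ErrorAction Stop aborts the function if the session cannot be created.
    $session = New-PSSession -ComputerName $ErcsComputerName `
        -ConfigurationName PrivilegedEndpoint -Credential $Credential -ErrorAction Stop

    try {
        # Step 2: Stop-AzureStack starts the orchestrated shutdown.
        # The session may drop while the ERCS VM itself goes down.
        Invoke-Command -Session $session -ScriptBlock { Stop-AzureStack }
    }
    finally {
        # Remove the local session object; ignore errors if it is already gone
        Remove-PSSession -Session $session -ErrorAction SilentlyContinue
    }
}
```

Usage, from an elevated prompt on the ASDK host: `Stop-AsdkGracefully`. Per the answer, the matching startup is orchestrated automatically when the host is powered back on.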

     

    About rebooting: restarting the XRP VM from the admin portal or admin PowerShell (or from Hyper-V Manager/PowerShell) will leave the environment in an unstable state. These commands and remediation actions (you may see them suggested in generated alerts) are meant for multi-node environments and do not apply to the ASDK.

     

    NOTE: More information on connecting to the PrivilegedEndpoint can be found here: https://youtu.be/msoz1wyPj_0

     

    As always, we will keep the community up to date on the status of this issue.

    We apologize for any inconvenience and appreciate your time and interest in Azure Stack.

     

    If you experience any issues or have any questions about the Azure Stack Development Kit, please feel free to contact us.

       

    Thanks,


    Gary Gallanes

    Wednesday, September 20, 2017 10:24 PM
  • Thanks, Gary, for your email. Just to keep this thread updated:

    I'm sorry, but I had already redeployed Azure Stack, so the problem is fixed for me for now. The strange thing, though, is that right after successfully redeploying Azure Stack, the Infrastructure Management Controller reports an unhealthy state. I haven't restarted the infrastructure roles as I did last time, because that crashed the whole Azure Stack and left the XRP01 VM no longer working. But how do I get those infrastructure roles back to a healthy state? I guess I will open a new thread in the forum and close this old one.


    Wednesday, July 26, 2017 8:40 AM

All replies

  • That is kind of strange. Try launching 'eventvwr.msc' on the host, connect to azs-xrp01.azurestack.local, and see whether you find anything in the System log.
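If you prefer PowerShell to the Event Viewer GUI, a sketch like the following reads the same System log entries from the remote VM. It is an illustrative helper, not part of Azure Stack: the function name Get-RecentSystemProblem and the 24-hour default window are assumptions. It relies on the standard Get-WinEvent cmdlet with -ComputerName and -FilterHashtable, so remote event log access to the VM must be permitted.

```powershell
# Hypothetical helper: list recent Critical, Error and Warning events from the
# System log of a remote machine (the same data eventvwr.msc would show).
function Get-RecentSystemProblem {
    param(
        # Remote machine to query (the VM name used in this thread)
        [string]$ComputerName = 'azs-xrp01.azurestack.local',

        # How far back to look, in hours (assumed default)
        [int]$Hours = 24
    )

    # Event levels: 1 = Critical, 2 = Error, 3 = Warning
    $filter = @{
        LogName   = 'System'
        Level     = 1, 2, 3
        StartTime = (Get-Date).AddHours(-$Hours)
    }

    # Get-WinEvent writes an error if no events match the filter
    Get-WinEvent -ComputerName $ComputerName -FilterHashtable $filter |
        Sort-Object TimeCreated |
        Select-Object TimeCreated, LevelDisplayName, ProviderName, Id, Message
}
```

Usage: `Get-RecentSystemProblem -Hours 2 | Format-List`. Filtering by level on the remote side keeps the output to the kinds of entries listed in the next reply.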


    Cheers,

    Ruud
    Twitter:    Blog: AzureStack.Blog  LinkedIn:    
    Note: Please “Vote As Helpful” if you find my contribution useful or “Mark As Answer” if it does answer your question. That will encourage me - and others - to take time out to help you.

    Wednesday, July 12, 2017 12:18 PM
  • Warning The system failed to register host (A or AAAA) resource records (RRs) for network adapter
    Warning SSL Certificate Settings created by an admin process for endpoint : 0.0.0.0:<port> .
    Warning SSL Certificate Settings deleted for endpoint : 0.0.0.0:<port> .
    Critical The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
    Error The previous system shutdown at 2:11:01 PM on 7/12/2017 was unexpected.
    Warning The Security System has received an authentication request that could not be decoded. The request has failed.

    Those are the only warnings, critical events, and errors I found in the System log while the role/VM was running.

    Wednesday, July 12, 2017 1:04 PM
  • Check that there are enough resources (memory). I presume this is a physical host? Also check whether there are any Hyper-V events on the host itself.
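Both checks suggested here can be run from the host in one pass. A minimal sketch, assuming an elevated session on the ASDK host with the Hyper-V PowerShell module available; the function name Get-XrpHostDiagnostic and the choice of the Microsoft-Windows-Hyper-V-Worker-Admin log are assumptions, not something stated in the thread.

```powershell
# Hypothetical helper: memory picture for one VM plus recent Hyper-V worker events
function Get-XrpHostDiagnostic {
    param(
        # VM to inspect (name as shown in Hyper-V Manager)
        [string]$VMName = 'AzS-XRP01',

        # Number of recent Hyper-V worker events to read
        [int]$MaxEvents = 50
    )

    # VM state and memory as reported by Hyper-V (memory values are in bytes)
    $vm = Get-VM -Name $VMName |
        Select-Object Name, State, Uptime, MemoryStartup, MemoryAssigned, MemoryDemand

    # Free physical memory on the host; Win32_OperatingSystem reports it in KB,
    # so dividing by 1MB (1048576) yields GB
    $os = Get-CimInstance -ClassName Win32_OperatingSystem
    $freeGB = [math]::Round($os.FreePhysicalMemory / 1MB, 1)

    # Recent Hyper-V worker events whose message mentions this VM
    $events = Get-WinEvent -LogName 'Microsoft-Windows-Hyper-V-Worker-Admin' -MaxEvents $MaxEvents |
        Where-Object { $_.Message -like "*$VMName*" } |
        Select-Object TimeCreated, LevelDisplayName, Id, Message

    # Return everything as one object so the three results don't mix in the output
    [pscustomobject]@{
        VM               = $vm
        HostFreeMemoryGB = $freeGB
        HyperVEvents     = $events
    }
}
```

Usage: `$d = Get-XrpHostDiagnostic; $d.VM; $d.HostFreeMemoryGB; $d.HyperVEvents | Format-List`. Comparing MemoryDemand with MemoryAssigned and the host's free memory shows whether the VM is being starved before it shuts down.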

    Cheers,

    Ruud

    Wednesday, July 12, 2017 1:20 PM
  • Yes, it's a physical host. While the AzS-Xrp01 role is running, the event log only shows events from Hyper-V-VmSwitch. The VMs should have enough resources.

    While the role is running, it uses 8 GB of memory.


    • Edited by Lukas Reker Wednesday, July 12, 2017 1:45 PM
    Wednesday, July 12, 2017 1:45 PM
  • We are investigating your issue and need some logs to continue troubleshooting.

    If you could, please email ascustfeedback@microsoft.com to get a workspace set up for uploading your logs.

    Make sure to use a work, organizational, or student address when contacting ascustfeedback@microsoft.com, and include the thread URL in the subject.

     

    https://aka.ms/GetAzureStackLogs :)

     

    We apologize for any inconvenience and appreciate your time and interest in Azure Stack.

     Thanks,


    Gary Gallanes

    Thursday, July 13, 2017 1:19 AM
  • It seems that I'm not allowed to send emails to this address.

    I put the following URL into the subject: https://social.msdn.microsoft.com/Forums/azure/en-US/ef9484ac-8dcc-4653-8f2e-9f003331a0af/azsxrp01-is-always-shutting-down?forum=AzureStack

    I used my work address to send the email.

    -------------------------------------

    Delivery has failed to these recipients or groups:

    AzStackBeta@microsoft.com Your message wasn't delivered because the recipient's email provider rejected it.

    Diagnostic information for administrators:

    Generating server: MWHPR21MB0479.namprd21.prod.outlook.com

    AzStackBeta@microsoft.com #< #5.7.133 smtp;550 5.7.133 RESOLVER.RST.SenderNotAuthenticatedForGroup; authentication required; Delivery restriction check failed because the sender was not authenticated when sending to this group> #SMTP#



    Thursday, July 13, 2017 12:59 PM
  • Hi Peter,

    Still not working :(
    I was able to shut down Azure Stack with your cmdlet via the ERCS VM. After the shutdown, I started Azure Stack again, but the XRP VM is still not working ;( It starts and runs at first, but shuts down after about 5 minutes.

    Friday, July 14, 2017 8:04 AM
  • Hi Lukas,

    We have set up a workspace for you to upload your logs and have sent you an email with a link and detailed instructions for gathering and uploading them.

    Please see the following link for instructions on using the Log Collection Tool.

    https://aka.ms/GetAzureStackLogs

     

    Example: get all logs for the past 2 weeks (336 hours)

    Get-AzureStackLog -OutputPath C:\AzureStackLogs -FromDate (Get-Date).AddHours(-336) -ToDate (Get-Date)

     

    We look forward to continuing our investigation just as soon as we receive your logs.

     Thanks,


    Gary Gallanes

    Friday, July 14, 2017 4:29 PM
  • Hi Lukas,

    Please submit your logs using the guidance Gary provided. Unfortunately, given the state your deployment is in, I would recommend redeploying your ASDK to get past this issue.

    Thanks,

    Charlie

    Friday, July 14, 2017 5:26 PM
  • Hi Gary, Hi Charlie,

    I've just finished uploading the log files from Azure Stack. It would be great if somebody could find out why this problem started on my stack. I also think redeploying would be the fastest solution, but I won't be able to develop anything if I have to redeploy every time something goes wrong :/

    Kind regards

    Lukas

    Monday, July 17, 2017 3:46 PM
  • Lukas,

    We are analyzing your logs now and will reply ASAP with next steps.

     Thanks,

    Gary


    Gary Gallanes

    Monday, July 17, 2017 4:58 PM
  • Hello Lukas,

    Quick update: the investigation is still ongoing. We are currently researching a workaround/fix.

     Thanks,


    Gary Gallanes

    Thursday, July 20, 2017 5:05 PM
  • Hello Lukas,

    We are seeing some issues with the AzS-XRP01 VM.  Could you please collect the event logs from AzS-XRP01 and upload them to your Workspace?

     

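    ### Get all Event logs for AzS-XRP01

    # Make sure the local staging folder exists before copying
    New-Item -ItemType Directory -Path C:\temp -Force | Out-Null

    # Copy the event logs from the VM's administrative share
    Copy-Item -Path '\\AzS-XRP01\C$\Windows\System32\winevt\Logs\*.evtx' -Destination C:\temp\

    # Compress the logs into a single archive for upload
    Compress-Archive -Path C:\temp\*.evtx -CompressionLevel Optimal -DestinationPath C:\temp\AzS-XRP.zip -Force

    # Remove the copied .evtx files once they are archived
    Remove-Item -Path C:\temp\*.evtx -Force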

     

     Thanks,


    Gary Gallanes

    Tuesday, July 25, 2017 5:43 PM
  • Hi guys, I had the same issue, and it was caused by downloading VM images through Azure Marketplace syndication (not sure why). I had to redeploy Azure Stack from scratch.
    Wednesday, July 26, 2017 3:06 AM
  • I just experienced the same issue again.

    Clean install of MAS (Microsoft Azure Stack); downloaded several images from the Marketplace.

    AzS-XRP01 crashed and now continually shuts itself down after 3-4 minutes of run time.

    I'm unable to delete the pending downloads, so redeployment seems like the only resolution.

    Tuesday, September 12, 2017 2:50 PM
  • Same for me: once you download from Azure Marketplace, the VM starts shutting off regularly...
    Thursday, September 14, 2017 10:20 PM
  • I have collected logs as well and sent an email to ascustserv@microsoft.com with the URL and a link to my upload on OneDrive :)
    • Edited by Todd Christ Wednesday, September 20, 2017 6:26 PM
    Wednesday, September 20, 2017 5:45 PM