Web role occasionally dies

  • Question

  • We have a SaaS running on Azure WebRoles. Occasionally, maybe twice a year, all our instances just die for no good reason and go into a Busy/Retry loop. See attached screenshot.

    Is there any good reason for this, and how can I diagnose it? It seems fairly random, and redeploying fixes it.

    Rebooting or restarting the instances doesn't fix it.

    Craig


    • Edited by YoureOnTime Tuesday, April 8, 2014 10:28 AM
    Tuesday, April 8, 2014 10:26 AM

Answers

  • Hi,

    May I ask whether you have altered the system in any way, such as modifying registry settings or writing to the local disk? As far as I know, you could try reimaging all instances (there is a Reimage button on the Instances tab in the portal). This restores every instance to its original state, so remember to save any data you need to external storage first. If that still doesn't help, check whether an external dependency is broken. For instance, suppose your role reads and validates some data from a database during startup, and that critical data has since been modified. The next time the role starts, your code checks the data and it fails validation, which can keep the role from starting.
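
    To make that kind of failure easier to diagnose, the startup check can report exactly what it found wrong instead of failing silently. The following is only a minimal sketch of the idea in Python, not code from the role in question: `validate_startup_settings`, `StartupCheckResult`, the `load_settings` callable, and the setting names are all hypothetical stand-ins for whatever your role reads at startup.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Mapping, Sequence

logger = logging.getLogger("role_startup")


@dataclass
class StartupCheckResult:
    ok: bool
    problems: List[str]


def validate_startup_settings(
    load_settings: Callable[[], Mapping[str, str]],
    required_keys: Sequence[str],
) -> StartupCheckResult:
    """Load data from an external dependency and describe every problem found.

    `load_settings` is a hypothetical callable standing in for the database
    read done during role startup.
    """
    try:
        settings = load_settings()
    except Exception as exc:
        # The dependency itself is unreachable or broken: record why.
        problem = f"could not load settings: {exc}"
        logger.error(problem)
        return StartupCheckResult(ok=False, problems=[problem])

    # Collect all missing or empty values so that one log entry explains
    # the whole failure, rather than stopping at the first bad value.
    problems = [
        f"missing or empty setting: {key}"
        for key in required_keys
        if not settings.get(key)
    ]
    for problem in problems:
        logger.error(problem)
    return StartupCheckResult(ok=not problems, problems=problems)
```

    With a check like this, a role that suddenly stops starting leaves a log entry naming the value that changed, which helps separate a broken dependency from a problem with the instance itself.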

    Best Regards,

    Ming Xu



    Wednesday, April 16, 2014 8:33 AM

All replies

  • hi,

    I suggest you check your logs and startup tasks. You can log in to the Azure instance and find the log files there. I recommend these troubleshooting documents (http://blogs.msdn.com/b/kwill/archive/2013/09/06/troubleshooting-scenario-3-role-stuck-in-busy.aspx and http://blogs.msdn.com/b/kwill/archive/2013/10/03/troubleshooting-scenario-7-role-recycling.aspx ). Hope it helps.

    Regards,

    Will




    Wednesday, April 9, 2014 2:16 AM
  • I have been through the role-stuck-in-busy issues in the past, but those normally happen when a role never starts to begin with, which indicates a problem in the code. In this case the role works fine for weeks and then suddenly dies. Even stopping and starting it doesn't fix it, which makes it sound like a hardware issue.
    Wednesday, April 9, 2014 3:49 AM