Delay in Scheduled Jobs Caused Them to Get Backed Up and Fail

  • Question

  • I have an automation job that runs every 15 minutes to stop the running VM and start the next VM in a resource group.

    I noticed that the scheduled jobs started to get delayed on 3/19 at 10:30 PM UTC. One job took nearly 45 minutes to complete (it usually takes 2-3 minutes), which in turn delayed the jobs scheduled after it. The jobs appear to have finally caught up on 3/20 at 1:00 AM UTC, but by then they were no longer working because of the earlier delays. All VMs are now stopped and display the error "Provisioning failed. Unknown network allocation error.. NetworkingInternalOperationError".

    What may have happened is that a failure left a VM in the "stopped" state rather than "deallocated". My runbook only checked for the "deallocated" status, which is why all of the VMs were off when I checked this morning. I have updated the runbook to also check for the "stopped" status, and the job now seems to work as expected. However, I would still like to find out exactly what happened.
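    The runbook itself is not shown in the thread, so the following is only a minimal Python sketch, assuming the rotation works as described: stop the running VM and start the next one, treating both "stopped" and "deallocated" as off. The function names `power_state` and `plan_rotation` are hypothetical. The `PowerState/...` codes are the ones Azure reports in a VM's instance view (for example, via `virtual_machines.instance_view(...)` in the `azure-mgmt-compute` SDK). Fetching those codes is left out here, so the logic runs without an Azure connection.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Both codes mean the VM is not running. "stopped" keeps the compute
# allocation (and its billing); "deallocated" releases it.
OFF_STATES = {"PowerState/stopped", "PowerState/deallocated"}
RUNNING = "PowerState/running"


def power_state(status_codes: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: pick the PowerState/* code out of a VM's instance-view status codes."""
    for code in status_codes:
        if code.startswith("PowerState/"):
            return code
    return None


def plan_rotation(vm_names: List[str],
                  states: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (vm_to_stop, vm_to_start) for a round-robin rotation.

    vm_names is the fixed rotation order. If nothing is running (for example,
    after a backlog of delayed jobs), the first VM that is off is started so
    the rotation recovers instead of leaving every VM off.
    """
    running = [name for name in vm_names if states.get(name) == RUNNING]
    if running:
        current = running[0]
        idx = vm_names.index(current)
        # Walk forward from the running VM, wrapping around, to the next VM that is off.
        for step in range(1, len(vm_names)):
            candidate = vm_names[(idx + step) % len(vm_names)]
            if states.get(candidate) in OFF_STATES:
                return current, candidate
        # No other VM is available, so leave the running VM alone.
        return None, None
    for name in vm_names:
        if states.get(name) in OFF_STATES:
            return None, name
    return None, None


if __name__ == "__main__":
    states = {
        "vm1": "PowerState/running",
        "vm2": "PowerState/stopped",      # skipped by a deallocated-only check
        "vm3": "PowerState/deallocated",
    }
    print(plan_rotation(["vm1", "vm2", "vm3"], states))
```

    With a check for "deallocated" only, `vm2` in this example would never be considered off. If every VM ended up "stopped", a check like that would find nothing to start. Treating both states as off, and explicitly handling the case where nothing is running, lets the rotation recover on its next run.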
    Wednesday, March 20, 2019 3:44 PM

All replies

  • Hi Jjohnsondev, sorry for the service disruption. I agree that updating the script will prevent this issue from recurring. To determine the root cause, we will need to engage the Azure technical support team using these steps.

    If you do not have a support plan, please send an email to AzCommunity@microsoft.com that includes your subscription ID and a link to this MSDN thread (for context), and we will help you engage Azure support.

    Cheers,

    Friday, March 22, 2019 3:01 AM
    Moderator