Web Apps Australia East - Webjobs reporting "Stopped" when running

  • Question

  • This issue came up 11 months ago.  I thought it was solved but now it has come back again.

    The portals (old & new) report that my Webjob is "Stopped". It isn't, it is still running.

    Kudu console status file contains the json string {"Status":"Stopped"} which is what the console(s) report.
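
    For anyone checking the same thing, here is a minimal sketch of reading that status string and flagging the mismatch described above. It assumes only that the file holds a JSON object with a "Status" key, as quoted; the function names and the process_is_running flag (something you determine yourself, for example from the job's own output) are hypothetical.

```python
import json


def reported_status(status_json):
    """Return the "Status" value from the status file's JSON text, or None."""
    return json.loads(status_json).get("Status")


def looks_stale(status_json, process_is_running):
    """True when the file says "Stopped" but the job is known to be running.

    process_is_running is supplied by the caller; this sketch does not
    try to detect it.
    """
    return reported_status(status_json) == "Stopped" and process_is_running


if __name__ == "__main__":
    print(looks_stale('{"Status":"Stopped"}', process_is_running=True))
```

    A True result only means the reported status disagrees with what you observe; it says nothing about why.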

    Now I need to update my Webjob.  I want to stop the running Webjob, load my new script and re-start the Webjob.

    But, I can't stop the running job..!

    How am I supposed to install my update..???


    Simon


    • Edited by Msgwrx Sunday, February 28, 2016 11:39 PM
    Sunday, February 28, 2016 11:38 PM

Answers

  • I know this will sound confusing, but Stop / Start does not affect the WebJobs (only the site), while the Restart button does. Can you give that a try?

    David

    • Marked as answer by Msgwrx Tuesday, March 29, 2016 6:54 AM
    Tuesday, March 29, 2016 5:31 AM

All replies

  • Hi Simon,

    What is your site name? Usually the reason for website being stopped is that it hit quota - are you on Free/Shared SKU?

    Thanks,
    Petr

    Monday, February 29, 2016 12:07 AM
  • Thanks Petr, however the site is running ok and has plenty of capacity. It is not free or shared, I pay a lot of money for it.

    This problem is a repeat of the issue in my post "Running Webjobs appear STOPPED in Portal" (November 5, 2014), about which AmitApple offered a number of helpful suggestions, but the problem was never resolved... it just went away after a while, which I find most disconcerting.

    It is possibly related to the ongoing issues in the Sydney data center, for which there is an "intermittent timeout" outage advisory that has remained unresolved since 2/26/2016 2:11:15 and is one of a crippling series of outages (at least 11 acknowledged since the beginning of February 2016 not to mention at least 5 virtual machine outages in the same period) in the Sydney data center.

    Question is now... what is going to be done about it...???

    Additional information: The situation in Sydney is now so bad that access to the outage reports has been removed from the portals. In anticipation of such a possible abrogation of responsibility, and in order to gather evidence for the credit that I will inevitably be seeking, I took the precaution of taking a screenshot of the outages a couple of days ago... (there are more, this is just as many as I could fit on the screen).


    Well, just to show that I'm happy to give credit where credit is due, the webjob miraculously fixed itself a little while ago with the following appearing several times in the Kudu log for the job. I'm the first to admit I haven't got the faintest idea what it all means, but my webjob is running... hope yours is too.

    [02/29/2016 01:04:11 > 6eb187: SYS INFO] Status changed to Starting
    [02/29/2016 01:04:11 > 6eb187: SYS ERR ] Job failed due to exit code -1
    [02/29/2016 01:04:11 > 6eb187: SYS INFO] Process went down, waiting for 0 seconds
    [02/29/2016 01:04:11 > 6eb187: SYS INFO] Status changed to PendingRestart
    [02/29/2016 01:04:11 > 6eb187: SYS INFO] Run script 'ffff_012_WJOB.ps1' with script host - 'PowerShellScriptHost'
    [02/29/2016 01:04:11 > 6eb187: SYS INFO] Status changed to Running
    [02/29/2016 01:04:11 > 6eb187: SYS WARN] Failed to diff WebJob directories for changes. Continuing to copy WebJob binaries (this will not affect the WebJob run)
    System.IO.DirectoryNotFoundException: Could not find a part of the path 'D:\local\Temp\jobs\continuous\ffff_012_WJOB\2yo4ogig.5nj'.
       at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
       at System.IO.FileSystemEnumerableIterator`1.CommonInit()
       at System.IO.FileSystemEnumerableIterator`1..ctor(String path, String originalUserPath, String searchPattern, SearchOption searchOption, SearchResultHandler`1 resultHandler, Boolean checkHost)
       at System.IO.DirectoryInfo.InternalGetFiles(String searchPattern, SearchOption searchOption)
       at System.IO.DirectoryInfo.GetFiles(String searchPattern, SearchOption searchOption)
       at System.IO.Abstractions.DirectoryInfoWrapper.GetFiles(String searchPattern, SearchOption searchOption)
       at Kudu.Core.Jobs.BaseJobRunner.GetJobDirectoryFileMap(String sourceDirectory)
       at Kudu.Core.Jobs.BaseJobRunner.CacheJobBinaries(JobBase job, IJobLogger logger)
    [02/29/2016 01:04:13 > 6eb187: SYS INFO] Run script 'Fx64_012_WJOB.ps1' with script host - 'PowerShellScriptHost'
    [02/29/2016 01:04:13 > 6eb187: SYS ERR ] Job failed due to exit code -1
    [02/29/2016 01:04:14 > 6eb187: SYS INFO] Process went down, waiting for 60 seconds
    [02/29/2016 01:04:14 > 6eb187: SYS INFO] Status changed to Running
    [02/29/2016 01:04:14 > 6eb187: SYS INFO] Status changed to PendingRestart
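
    One way to make sense of a log like this is to pull out just the "Status changed to" lines and read them in order. The following is a minimal sketch, not a Kudu tool: the line layout and the month-first date format are inferred from the excerpt above, and the names LOG_LINE, status_changes and SAMPLE are made up for illustration.

```python
import re
from datetime import datetime

# Matches lines shaped like the excerpt above, for example:
# [02/29/2016 01:04:11 > 6eb187: SYS INFO] Status changed to Starting
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) > (?P<run>\w+): "
    r"(?P<level>[A-Z ]+?)\s*\] (?P<msg>.*)$"
)
STATUS_PREFIX = "Status changed to "


def status_changes(log_text):
    """Return (timestamp, status) pairs in the order they appear in the log."""
    changes = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # stack-trace lines and anything else without a header
        message = match.group("msg")
        if message.startswith(STATUS_PREFIX):
            # Assumes MM/DD/YYYY, which "02/29/2016" in the excerpt implies.
            when = datetime.strptime(match.group("ts"), "%m/%d/%Y %H:%M:%S")
            changes.append((when, message[len(STATUS_PREFIX):].strip()))
    return changes


SAMPLE = """\
[02/29/2016 01:04:11 > 6eb187: SYS INFO] Status changed to Starting
[02/29/2016 01:04:11 > 6eb187: SYS ERR ] Job failed due to exit code -1
[02/29/2016 01:04:11 > 6eb187: SYS INFO] Status changed to PendingRestart
[02/29/2016 01:04:11 > 6eb187: SYS INFO] Status changed to Running
"""

if __name__ == "__main__":
    for when, status in status_changes(SAMPLE):
        print(when.isoformat(), status)
```

    Run against the full excerpt, this kind of filter shows the job cycling between Starting, PendingRestart and Running within a few seconds, which is easier to spot than in the raw output.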

    Cheers, Simon



    • Edited by Msgwrx Monday, February 29, 2016 1:46 AM
    Monday, February 29, 2016 1:11 AM
  • Hi Simon,

    I'm really sorry for your experience. If you can post your site name directly or indirectly ( https://github.com/projectkudu/kudu/wiki/Reporting-your-site-name-without-posting-it-publicly ) here, we can try to investigate if there is anything specific for your account. Is the issue still happening?

    As for Sydney, I totally understand the pain related to storage issues. I'm sure you read our blog regarding it https://social.msdn.microsoft.com/Forums/azure/en-US/54fd84f8-95f4-499f-b397-99c86389da59/ongoing-app-service-issues-in-the-australia-east-region?forum=windowsazurewebsitespreview and some possible workarounds which can be applied.

    What you mentioned about the outage reports seems really wrong, I'll try to follow up on that internally.

    Thanks,
    Petr

    Wednesday, March 2, 2016 10:43 PM
  • Sure, thanks Petr

    29north is a dummy site in the Australia East web app pool, and the site of interest starts with "p".

    To be honest, I think the window to investigate has probably closed now.  Sydney seems to be back to normal service now and it probably did have something to do with webjob status not being reported correctly.

    There was just too much going on for us to do much about it at the time.

    Cheers, Simon





    • Edited by Msgwrx Saturday, March 5, 2016 8:22 PM
    Saturday, March 5, 2016 8:22 PM
  • Hello Petr

    Ok, the webjobs have gone totally flaky again this afternoon and I can hardly keep track of what they are doing... some are stuck in "stopping..." status including some I've tried to restart (see previous post for app info).

    Honestly, these jobs have been running fine for 11 months and now suddenly they are playing up again.

    It's absolutely essential that I can start/stop and update these jobs reliably at will. They process real, live data day & night.

    Can someone please sort out what is going on in that Sydney data center..?


    Simon


    • Edited by Msgwrx Monday, March 7, 2016 5:56 AM
    Monday, March 7, 2016 5:55 AM
  • Hi Simon,

    You have two sites that start with a 'p', and they both have WebJobs. Is it the one that has 4 letters or 9 letters in its name?

    thanks,
    David

    Tuesday, March 8, 2016 12:47 AM
  • Hi David

    Oh yes, I do beg your pardon... that would be the 9 letter one!

    As I mentioned to Petr, this may have settled down now Sydney is back to normal service so there may not be much to find now.

    It's still rather disturbing if you can't seem to get control of these jobs any time you want to.

    The other thing that makes it really hard to figure out what is going on with a webjob is the fact the logging fizzles out after a while like this...

    [03/07/2016 06:16:44 > acedde: WARN] Reached maximum allowed output lines for this run, to see all of the job's logs you can enable website application diagnostics
    [03/07/2016 06:45:10 > acedde: SYS INFO] WebJob is still running
    [03/07/2016 18:45:11 > acedde: SYS INFO] WebJob is still running
    [03/08/2016 06:45:10 > acedde: SYS INFO] WebJob is still running
    [03/08/2016 18:45:10 > acedde: SYS INFO] WebJob is still running
    Doesn't tell you much. A "rolling" log like the HTTP log would be good.


    Simon 



    • Edited by Msgwrx Tuesday, March 8, 2016 7:09 PM
    Tuesday, March 8, 2016 7:09 PM
  • Hi Simon,

    Indeed, the max log logic is not ideal. This is being tracked on https://github.com/projectkudu/kudu/issues/1748, and we need to provide an override.

    thanks,
    David

    Wednesday, March 9, 2016 2:35 AM
  • Hi David

    Sorry to bother you but, yet again, web jobs are misbehaving... now I can't stop or start them in either the old or new portal, and they just sit in a status pending state... 

    This is the current situation on all our websites in Australia East.  Some webjobs seem to be continuing to function, while others aren't.

    This is not a good situation for production web sites. The customers rely on data being processed, but we don't have any way of managing these web jobs.

    To be specific, 29north is the dummy site and the sites of interest begin with "fa", "aa", "pi" and "the" 


    Simon


    • Edited by Msgwrx Tuesday, March 29, 2016 4:25 AM
    Tuesday, March 29, 2016 4:16 AM
  • Hi Simon,

    Is it only the WebJobs that are affected, while the site itself is healthy? Have you tried whether restarting the site helps get it back into shape?

    David

    Tuesday, March 29, 2016 4:41 AM
  • Thanks David

    Yes, only the webjobs, the web sites seem fine.

    Tried stopping then restarting the web site, but no change, they just sit there in status pending.

    Other sites with webjobs in same region appear to be working ok.


    Simon 

    • Edited by Msgwrx Tuesday, March 29, 2016 5:00 AM
    Tuesday, March 29, 2016 5:00 AM
  • I know this will sound confusing, but Stop / Start does not affect the WebJobs (only the site), while the Restart button does. Can you give that a try?

    David

    • Marked as answer by Msgwrx Tuesday, March 29, 2016 6:54 AM
    Tuesday, March 29, 2016 5:31 AM
  • Thanks David... Fixed..!

    Cheers, Simon 

    • Edited by Msgwrx Tuesday, March 29, 2016 6:54 AM
    Tuesday, March 29, 2016 6:53 AM