Azure Functions Downtime on 8/9/16?

  • Question

  • Was there downtime today or at least some kind of error with Azure functions? I spent a couple hours trying to diagnose why my functions were failing. Each function was just finishing immediately without running anything, saying "Success", but of course my server was returning an error because they weren't actually running. Then, running the SAME CODE later today it just started working again.

    It's kind of, well, not good! (if it is Azure's issue) I know I can't have 100% guaranteed uptime, but still it would have been nice to know if it was my issue or not, and the status history report for Azure shows nothing. I can't have my server go down for a couple hours like that without knowing what's going on. Anyone else experience this today? (I'm using Node.js for the functions.)

    EDIT: a hint that something was amiss: whenever I pushed updates to the function app to try to fix it, my updates would push to the repo and supposedly deploy successfully, but the message that would normally appear in the functions console saying the code was just updated would not appear.

    Thanks,
    Sam



    Wednesday, August 10, 2016 1:47 AM

All replies

  • I am getting this error message on my screen. Not sure whether it's because of some downtime.

    Error:

    You may be experiencing an error. If you're having issues, please post them here

    Timestamp: 2016-08-10T07:49:48.584Z


    Wednesday, August 10, 2016 7:51 AM
  • @Samuel: Can you share your function app name (that's the function app name, not the function name), either directly or indirectly? Also, the approximate UTC time where you were seeing issues. This will help us investigate. Thanks!

    @Paul: that sounds unrelated to Samuel's issue, as he was (apparently) not getting error popups. Can you start a separate thread to discuss your case? Make sure you paste the full text (not screenshot) of the error popup, as it includes information that can help us look things up.

    thanks,
    David

    Wednesday, August 10, 2016 5:46 PM
  • I actually did see an error like Paul's. I thought it was just from leaving a browser tab on Azure open for too long but maybe it wasn't. It occurred on the Function App slate.

    As requested, I created a dummy site called skdummysite. The Azure function app that had the issue ends in a "t". I'm quite certain 10:00 PM UTC was right in the middle of the time when I was having issues.

    I appreciate the help!

    Sam

    Wednesday, August 10, 2016 7:08 PM
  • Hi Sam,

    The only events I see related to that site are on 8/10 at 1:25 AM and 3:37 AM, and they seem benign. If you see this again, could you make sure you capture the full text of the error bubble? It helps us pinpoint things in our logs.

    thanks,
    David

    Wednesday, August 10, 2016 7:40 PM
  • Will do, thanks David. I guess I'll never know this time!
    Wednesday, August 10, 2016 9:24 PM
  • Still having issues. Azure functions don't seem to be very stable. I have a function used to process image uploads and it sometimes fails with a 502 error (not an error I'm returning in my code). It's slightly worrisome that I can't rely on it to work. :S This was just happening.
    Thursday, August 11, 2016 2:33 AM
  • Hi!

    I also have a similar issue with some of my functions. For example, the "functions5b7497d8" function app.

    When I open https://functions5b7497d8.scm.azurewebsites.net I receive a "The service is unavailable." error.

    Location: East US

    ftp: ftp://waws-prod-blu-049.ftp.azurewebsites.windows.net

    Thursday, August 11, 2016 11:36 AM
  • We're having the same "service unavailable" problems. @Microsoft, any update?
    Thursday, August 11, 2016 4:49 PM
  • Thanks for reporting. We are investigating and will report back with what we find.
    Thursday, August 11, 2016 5:03 PM
  • We found an issue in East US and South Central US regions which was impacting a subset of Azure Functions customers. We've fixed the issue and things should be running again. Sorry for the inconvenience. We're doing work to improve our monitoring to more quickly detect issues like this in the future (we're still in Preview, so there are a few things still missing in our monitoring).

    Sam and Paul, your issue appears to be different from what Igor and jaredmeade1 were running into. Please let us know if you encounter your issues again and provide the information that David had asked for.

    Thanks!

    Thursday, August 11, 2016 6:17 PM
  • Thanks for the fast response. That issue sounds like something that may have been affecting me yesterday, though I'm not sure.

    As for the last issue I mentioned, the failing requests turned out to be an error on my part. That said, I'm still convinced that sometimes something is going on that I can't see. In some cases, I'll do a git push to a function app and it says deployment successful, but when looking at the logs on subsequent requests I can tell it did not actually update my code because there's no way it could possibly continue to log that way after the change.

    I found today that, under the app settings, clicking "diagnose and solve problems" is really helpful for functions (especially the event logs). I think the main issue right now revolves around what happens when you push code that isn't quite right. The logs that end up in the function console seem random in that they don't coincide with the code you just pushed, and in that it works sometimes and other times it doesn't. It's a mystery to me, though as I said ultimately it was my code causing the issue this time. :)

    Thanks again.

    Thursday, August 11, 2016 6:51 PM
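
    One way to check whether a push actually replaced the running code is to log a marker that changes with every deployment. The following is only a minimal sketch, assuming an HTTP-triggered Node.js function; BUILD_MARKER and its value are made up for illustration and would need to be bumped by hand (or by a build step) on each push.

```javascript
// index.js of a hypothetical HTTP-triggered function.
// BUILD_MARKER is an illustrative constant, not something Azure provides:
// change it with every push so the logs reveal which code is running.
const BUILD_MARKER = '2016-08-11-a';

function run(context, req) {
    // If this line still shows an old marker after a "successful" deployment,
    // the host is still executing the previous code.
    context.log('build ' + BUILD_MARKER + ' handling request');

    // Return the marker as well, so it can be checked without opening the logs.
    context.res = { status: 200, body: { build: BUILD_MARKER } };
    context.done();
}

module.exports = run;
```

    If the marker in the log stream does not match the one just pushed, that is concrete evidence to attach when reporting the problem.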
  • Hi Sam,

    If you have a reproducible scenario that leads to bad error reporting, would you be able to open an issue on https://github.com/Azure/azure-webjobs-sdk-script/issues? That would help us understand the scenario and try to improve it.

    thanks!
    David

    Thursday, August 11, 2016 6:55 PM
  • Sure thing. Next time I get into that kind of situation, I'll definitely do that.

    Sam

    Thursday, August 11, 2016 7:03 PM
  • FYI, I just got the error window that says "You may be experiencing an error. If you're having issues, please post them here".

    I have again been experiencing issues that don't make sense. It's not code I can really share, but it was working yesterday and then all of a sudden it stopped working. It's code that opens a child process to run an executable, but it now hangs when running the executable.

    Friday, August 12, 2016 8:42 PM
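
    For a child process that may hang, a timeout at least turns a silent hang into a visible error. The code in question was not shared, so this is only a sketch of the general approach using Node's child_process.execFile with its timeout option; runWithTimeout and the timedOut property are hypothetical names introduced here.

```javascript
const { execFile } = require('child_process');

// Run an executable and reject if it does not finish within timeoutMs.
// runWithTimeout is an illustrative helper, not part of any Azure API.
function runWithTimeout(file, args, timeoutMs) {
    return new Promise(function (resolve, reject) {
        execFile(file, args, { timeout: timeoutMs }, function (error, stdout, stderr) {
            if (error) {
                // error.killed is true when Node itself terminated the child.
                // With these options that normally means the timeout fired
                // (it can also mean the output exceeded maxBuffer).
                error.timedOut = error.killed === true;
                error.stderr = stderr;
                return reject(error);
            }
            resolve({ stdout: stdout, stderr: stderr });
        });
    });
}
```

    Inside a function, the rejection handler could log the error and finish the invocation, so the logs show that the executable was killed instead of the invocation appearing to stall.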
  • Also, just got this error in the function app UI:

    Number of read requests for subscription 'ID_HERE' exceeded the limit of '15000' for time interval '01:00:00'. Please try again after '5' minutes.


    Friday, August 12, 2016 8:53 PM
  • Hi Samuel,

    When you get that pop up message, there should be a session ID in there, which is how we can correlate the issue. It's meaningless to anyone but us, so you can safely post it publicly.

    thanks,
    David

    Friday, August 12, 2016 8:56 PM
  • Gotcha. I happened to grab the first error's session ID:

    5344900cf0e547b1a83d091cb1c4de16

    Friday, August 12, 2016 8:59 PM
  • Thank you, we will investigate.
    Friday, August 12, 2016 9:10 PM
  • My function is miraculously working again. It's slightly worrisome!
    Friday, August 12, 2016 9:32 PM
  • Based on investigation of the logs, we've also found that your Function App is experiencing issues due to invalid/malformed queue messages. The impact of those is compounded by a bug that causes the host process to actually crash, rather than simply moving the bad messages to the poison queue as it should. That bug was fixed recently and the fix will be deployed in a few days.

    Mathew Charles [MSFT]

    Friday, August 12, 2016 11:07 PM
  • Thanks, Mathew. Yes, I realized I was producing malformed queue messages because of a breaking change in the Azure-Storage module. I did fix that yesterday, but it's possible those malformed messages were still affecting the function. Glad to hear you guys are on top of it. :)
    Saturday, August 13, 2016 12:46 AM
  • Yeah, due to the bug I mentioned, our "move to poison queue" logic won't be triggered, which means that unless you've removed those bad queue messages somehow, they'll keep being picked up. If you've cleaned them up, then you should be good.

    Mathew Charles [MSFT]

    Saturday, August 13, 2016 2:05 AM
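
    The poison-queue behavior described above can be pictured roughly as follows. This is a conceptual sketch only, not the Functions host's actual implementation: routeMessage, tryParse, the message shape, and the threshold of 5 are all assumptions made for illustration.

```javascript
// Assumed threshold for this sketch: after this many failed attempts,
// a message is treated as poison instead of being retried again.
const MAX_DEQUEUE_COUNT = 5;

// Parse JSON without throwing, so one bad message cannot crash the caller.
function tryParse(text) {
    try {
        return { ok: true, value: JSON.parse(text) };
    } catch (e) {
        return { ok: false, error: e.message };
    }
}

// Decide what to do with a message of the (hypothetical) shape
// { text: string, dequeueCount: number }.
function routeMessage(message) {
    const parsed = tryParse(message.text);
    if (parsed.ok) {
        return { action: 'process', payload: parsed.value };
    }
    if (message.dequeueCount >= MAX_DEQUEUE_COUNT) {
        // Give up and set the message aside so it stops being picked up.
        return { action: 'poison', reason: parsed.error };
    }
    return { action: 'retry', reason: parsed.error };
}
```

    With the bug Mathew describes, the step that sets a bad message aside never runs, which is why malformed messages keep being picked up until they are removed from the queue by hand.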