WebApps don't seem to be working in Australia East

  • Question

  • ...since 13:00 NZ Local Time

    Simon

    • Edited by Msgwrx Friday, July 3, 2015 1:20 AM
    Friday, July 3, 2015 1:19 AM

Answers

  • I am pleased to report that, after 24 hours of trouble-free operation, I think I can declare with some confidence that The Great Australian Outage concluded at 10:39 NZT on Wednesday, July 8, 2015, after some Azure SQL servers were restarted.

    This was some 5 days after the initial event, which occurred on Friday, July 3, 2015 at about 12:50 NZT.

    Below are my graphs recording this event (images not reproduced here).


    Simon


    • Edited by Msgwrx Wednesday, July 8, 2015 10:25 PM
    • Marked as answer by Msgwrx Wednesday, July 8, 2015 10:25 PM
    Wednesday, July 8, 2015 10:25 PM

All replies

  • Seeing the same here, all sites plus the Azure Management Portal down.

    (And of course Azure Status reports normal for all regions... very helpful)

    Friday, July 3, 2015 1:24 AM
  • Same here. Our East Australia VPS is down too, though our database in the region is fine.
    Friday, July 3, 2015 1:29 AM
  • Well, my portal was/still is running all right. My web sites briefly have pink triangles beside them... then revert to "tick" running... then back to pink triangles...

    I get this really useful message on the site:

    502 - Web server received an invalid response while acting as a gateway or proxy server.

    There is a problem with the page you are looking for, and it cannot be displayed. When the Web server (while acting as a gateway or proxy) contacted the upstream content server, it received an invalid response from the content server.

    I don't have a clue what it means and, like my customers, I'm not really interested; I'm an applications guy and I just want running web sites (if it's not too much to ask).


    Simon


    • Edited by Msgwrx Friday, July 3, 2015 1:32 AM
    Friday, July 3, 2015 1:30 AM
  • We are having the same problem; I cannot even create a support ticket, even though I'm paying for standard support.

    I called Microsoft AU on 132058; they do not have any idea about Azure. They gave me the Sales number for Azure and no one is picking up. This is absolutely ridiculous... we have over 14 Web Apps down with no support.

    Friday, July 3, 2015 1:41 AM
  • same, all my sites are down - they eventually return a 503 


    Friday, July 3, 2015 1:44 AM
  • Azure SQL databases seem ok, but can't access VMs from Auckland

    Oooh... they appear to have come back from lunch now

    Network Infrastructure - Australia East - Advisory
    • 1 min ago

      An alert for Network Infrastructure in Australia East is being investigated. More information will be provided as it is known.


    Simon

    • Edited by Msgwrx Friday, July 3, 2015 1:45 AM
    Friday, July 3, 2015 1:44 AM
  • 13:52 NZT

    Have run up a quick duplicate site in Australia South East, seems to be working ok...


    Simon

    • Edited by Msgwrx Friday, July 3, 2015 1:52 AM
    Friday, July 3, 2015 1:52 AM
  • 45 mins total outage of all our VMs in Sydney before they even acknowledged there was a problem on the Azure status page.

    Luckily, I still had 1 more service to migrate, production hosting.

    Turns ship around and waves goodbye to Azure. 

    DISGRACEFUL Microsoft.

    Friday, July 3, 2015 1:53 AM
  • The Advisory is a warning as well... it should be an error.

    And please remove the tick at "http://azure.microsoft.com/en-us/status/"


    Micatio Software Free IIS Azure Web Log App

    Friday, July 3, 2015 1:59 AM
  • Belated update:

    Network Infrastructure - Australia East - Partial Service Interruption

    12 mins ago Starting at approximately 01:03 on 3 July, 2015 customers with Azure services hosted in Australia East may experience inability to connect to their service resources. Our engineers are engaged and actively investigating this issue. The next update will be provided in 60 minutes or as events warrant.


    Simon

    • Edited by Msgwrx Friday, July 3, 2015 2:06 AM
    Friday, July 3, 2015 2:06 AM
  • Maybe they thought you wouldn't notice.

    Not to worry, they say they'll provide an update... "in 60 minutes or as events warrant"

    It's only "infrastructure" afterall, it's not like anything really important is broken... like promises


    Simon




    • Edited by Msgwrx Friday, July 3, 2015 2:18 AM
    Friday, July 3, 2015 2:10 AM
  • Error message has gone away, now just 'This service is unavailable'.

    Friday, July 3, 2015 2:19 AM
  • It's actually 2 hours now, I noticed straight away because I was busy doing something at the time.

    24 * 365 = 8,760 hours per year, 99.95% availability = 8,755.62 hours per year

    Therefore, Azure is allowed to be "down" 4.38 hours per year...

    and 2 of those hours have just passed us by forever..!

    Here's your problem right here... someone in Seattle had to come back to work to fix Australia

    Network Infrastructure - Australia East - Service Interruption
    11 mins ago

    7/2/2015 19:54:03 - Starting at approximately 01:03 UTC on 3 July, 2015 customers with Azure services hosted in Australia East may experience inability to connect to their service resources. Engineers have identified preliminary root cause and are preparing mitigation steps. The next update will be provided in 60 minutes or as events warrant


    Simon



    • Edited by Msgwrx Friday, July 3, 2015 3:12 AM
    Friday, July 3, 2015 3:04 AM
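Simon's downtime arithmetic above checks out; as a quick sketch (assuming a non-leap, 8,760-hour year):

```python
def allowed_downtime_hours(sla: float, hours_per_year: float = 24 * 365) -> float:
    """Hours per year a service may be unavailable while still meeting the SLA.

    `sla` is the availability fraction, e.g. 0.9995 for 99.95%.
    """
    return hours_per_year * (1 - sla)


# 99.95% availability over a non-leap year leaves about 4.38 hours of
# permitted downtime -- the "budget" the thread is counting against.
budget = allowed_downtime_hours(0.9995)
```

Two hours of outage therefore consumes nearly half of the whole year's SLA allowance, which is the point being made in the post.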
  • It's killing me!
    Friday, July 3, 2015 3:26 AM
  • My client has 100's of people waiting to get event changes made. Apparently things are getting nasty. Thankfully they didn't listen to me about using Azure for the main task, otherwise it would be thousands.

    Very stressed right now.

    Friday, July 3, 2015 3:29 AM
  • Hang in there, mate!

    Any chance of running up something in Melbourne..?

    I did and I'm managing to limp along at the moment


    Simon

    • Edited by Msgwrx Friday, July 3, 2015 3:34 AM
    Friday, July 3, 2015 3:34 AM
  • It's getting to the point where I'm thinking about spinning up in Melbourne instead of waiting
    Friday, July 3, 2015 3:34 AM
  • Melbourne does seem attractive... RED dots all over my portal, 3.5 hours and counting!

    Network Infrastructure - Australia East - Service Interruption

    8 mins ago

    Starting at approximately 01:03 UTC on 3 July, 2015 customers with Azure services hosted in Australia East may experience inability to connect to their service resources. Engineers continue to deploy mitigation steps and we are making progress towards service recovery. The next update will be provided in 60 minutes or as events warrant.


    Simon

    • Edited by Msgwrx Friday, July 3, 2015 4:28 AM
    Friday, July 3, 2015 4:28 AM
  • Yeah, Melbourne sounds like an option, however it's a bit hard when the database is stuck in Sydney.

    Probably restructure everything after this episode is over. I made the initial assumption that a Microsoft data center wouldn't be offline for 3 hours.

    It's probably cheaper to have something sitting in the US, given there is a 26% price increase next month. Even AWS is looking promising, but it would mean removing some of the Azure framework stuff from the C# code.

     



    Micatio Software Free IIS Azure Web Log App

    Friday, July 3, 2015 4:45 AM
  • One of our websites just came back online...
    Friday, July 3, 2015 4:47 AM
  • Still down. Looks like core networking and/or DNS is affecting a large customer base in Australia East. We have 4 servers accessible very sporadically and interconnection between servers in the same virtual network is totally broken.

    I would agree - the support messages are far from useful and should show how long the service has been down. This should not be a warning; it is definitely affecting services and is an error.

    Friday, July 3, 2015 4:51 AM
  • Yes, affirmative... a bit flaky though, 2 of 11 have started up for me

    That's 13:50 to 16:50 NZT

    CORRECTION

    12:50 to 16:50...!


    Simon




    • Edited by Msgwrx Friday, July 3, 2015 5:20 AM
    Friday, July 3, 2015 4:53 AM
  • Finally, after nearly four hours, we're back. Now comes the clean-up and general client pacification.

    Then on Monday, the search begins.

    Friday, July 3, 2015 4:53 AM
  • All my sites back up, finally!

    Has anyone tried a solution like this?  

    http://blogs.msdn.com/b/waws/archive/2015/06/01/create-an-azure-web-app-failover-solution-on-a-budget.aspx

    quote:

    "Web App service interruptions are rare, but they do happen, and for those impacted, having this secondary instance on stand-by can solve the issue for you really fast" .....

    Friday, July 3, 2015 4:59 AM
  • Needed to restart our web servers again, otherwise you may get an error about "The site's home directory could not be located".

    SQL Azure is still stuffed.


    Micatio Software Free IIS Azure Web Log App

    Friday, July 3, 2015 5:03 AM
  • I'll have a close look at the failover idea. At first glance I can't see how the database side of things is handled, but that's probably my lack of knowledge.

    Friday, July 3, 2015 5:18 AM
  • Failover is only for Web Sites. It's basically like round-robin DNS.

    You would need to geo-replicate the database servers, otherwise you'll be stuffed. I'll be looking at this next week.



    Micatio Software Free IIS Azure Web Log App

    Friday, July 3, 2015 5:25 AM
  • That's the bit I don't really understand - in cases like today, where it was a networking infrastructure issue, how does the system know to start using a replica database?

    Some research required obviously.

    Friday, July 3, 2015 5:39 AM
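One way to picture the failover question above: the application tries its primary-region endpoint first and falls back to the secondary region only when the primary fails. This is a hedged sketch of the general pattern, not the mechanism of the linked Azure failover solution; the callables stand in for hypothetical region-specific connect functions:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_failover(primary: Callable[[], T], secondary: Callable[[], T]) -> T:
    """Try the primary region; on any failure, fall back to the secondary.

    Caveat: with asynchronous geo-replication the secondary database may lag
    the primary, so reads can be stale and writes need careful reconciliation.
    """
    try:
        return primary()
    except Exception:
        return secondary()
```

So the "how does it know" answer, in this simple model, is that nothing magically knows: the client (or a DNS/traffic-manager layer doing the same thing) detects the primary's failure and retries against the replica.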
  • Still working on it folks. Stay tuned to http://azure.microsoft.com/en-us/status/ for some live feedback. There is also another thread you can watch. We'll try to reply once we've seen it's fixed. https://social.msdn.microsoft.com/Forums/en-US/8193e4f8-d2a6-4594-acb8-10eaa8c2cf37/azure-web-down-in-australia-eastalso-database-stuck-with-suspend-status?forum=windowsazurewebsitespreview
    Friday, July 3, 2015 6:33 AM
  • Thanks for the update Chris. A couple of thoughts about why episodes like this are so frustrating:

    1. It took nearly an hour for the status page to be updated.

    2. There's not even a guess as to the likely fix time.

    3. There's no mechanism for reporting an error directly. Clicking the link to submit a trouble ticket just takes you to a page that offers the consumer support numbers or the status page.

    4. Although the consumer support guy did talk to me, his first response was "we're busy, can we call you back?" No.

    5. When you've got a client that's face-to-face with 100's (not kidding) of customers, some pretty testy, it would really help to give them as much detail as possible. You probably can't tell us exactly what went wrong, but something helps. 'Network Infrastructure' is just too vague.

    6. I read your point about live updates, but I think that 'live' could be better than hourly.

    It's a shame that this update happened on the worst possible day for my client. I'll be heading over there soon with an expensive bottle of something, but I suspect they've got a very long night ahead entering all the changes they recorded on paper.

    - Simon

    Friday, July 3, 2015 6:49 AM
  • Agree with you there Simon.

    As far as I'm concerned the data center was down.

    Looks nice just to show one warning on the status page when in fact nothing much was working, e.g. you could connect to Azure SQL but that was it.

    Lots of customers have lost revenue today... yes, in hindsight you should replicate to other regions, however there is a cost associated with this.

    We shouldn't have a single point of failure, period.


    Micatio Software Free IIS Azure Web Log App

    Friday, July 3, 2015 7:09 AM
  • Thanks Ron.

    I've obviously been misunderstanding the Azure PR. I thought the point of cloud services was that we didn't need to geo-replicate and that in the case of problems, other resources picked up the load. In my mind, geo-replication was for performance issues when dealing with a global consumer base.

    Having paid $150-$200 a month for this service for the last two years, mostly for the comfort of the cloud, it turns out I'd actually have been just as well off on shared hosting at thirty bucks a month.

    Friday, July 3, 2015 7:24 AM
  • I'm with Simon. I thought I was paying a premium for peace of mind.

    I liked how, instead of putting a red mark next to Sydney on the world map, they just removed Sydney and maintained a world map covered in green ticks.

    I'd post the image, but I can't verify my account (and therefore post images) because my email is inaccessible for some reason...

    [edit] Status page now says "All services are working properly" - can anyone confirm? My VMs don't work.

    [update] All my VM's still say "? Retrieving Status", and all my services are still down, but I've noticed my billing graph has started to climb again, after being flat all day. They restored network connection to the billing systems, at least, so that's a comfort.


    • Edited by Darryn J Friday, July 3, 2015 7:47 AM
    Friday, July 3, 2015 7:30 AM
  • Darryn, I'm on a Standard sites plan, which I understand is like a managed VM and that seems to be fine now.

    Friday, July 3, 2015 7:47 AM
  • My Azure portal still shows Virtual Machines, SQLDatabases, Storage, Media Services, and Recovery Services (that's ominous) as in a bad state with the pink exclamation mark... rather more bizarrely, things do actually seem to be working, just couldn't manage them if I had to.

    Simon

    • Edited by Msgwrx Friday, July 3, 2015 7:51 AM
    Friday, July 3, 2015 7:51 AM
  • It seems like everything is working again, although I had to hard reboot my servers. Websites are up, I'm getting emails, I can SSH in.

    Now I'm able to send all my hosting clients the email telling them they get this month refunded. Yay!

    Friday, July 3, 2015 8:21 AM
  • Are any of you on an Enterprise agreement for Azure?

    I find it hard to believe there weren't more people affected by this, unless Microsoft has two tier levels in the Data Center... Enterprise Customers and Prepaid hourly Customers.


    Micatio Software Free IIS Azure Web Log App

    Friday, July 3, 2015 8:52 AM
  • Not me, I just pay my couple of hundred a month, no special agreement that I'm aware of.

    It is surprising that more people weren't here airing their thoughts.

    Friday, July 3, 2015 9:46 AM
  • Indeed, who knows..? Here we are 22 hours later and the WebApp to Azure SQL Database connection is running like a dog... It works, but there have been query timeouts all night causing my apps to fall over left, right and center (yes, yes, yes... they retry again & again & again but eventually give up).

    To me, it has the feel of some sort of re-build going on in the background, like when you insert a new disk in a RAID array and wait hours and hours for it to fix itself, or maybe a "mitigated" 10Mb pipe between services.

    Interestingly, my understanding is that the Australian data centers are the first non-Microsoft owned centers, which might explain why there hasn't been a peep out of Microsoft about this problem, although that might change if the press get hold of it. Apparently these centers were shipped in containers complete and ready to plug-in. I expect the contractor is getting a wee talking to from Microsoft this morning, given they now hold the world record for an Azure outage.

    Is it unreasonable to ask that someone from Microsoft, with the relevant knowledge and authority, post an explanation for this outage here to put all our minds at ease that this won't happen again..?


    Simon


    • Edited by Msgwrx Friday, July 3, 2015 11:22 PM
    Friday, July 3, 2015 11:22 PM
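The retry-then-give-up behaviour described above is the standard transient-fault pattern for cloud database calls: retry with exponential backoff, then surface the error after a bounded number of attempts. A hedged sketch; the attempt count and delays are illustrative, not anyone's production settings:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(op: Callable[[], T], attempts: int = 5,
                       base_delay: float = 0.5) -> T:
    """Retry `op` with exponential backoff; re-raise after the final attempt."""
    for i in range(attempts):
        try:
            return op()
        except Exception:
            if i == attempts - 1:
                raise  # budget exhausted: "eventually give up"
            time.sleep(base_delay * (2 ** i))  # 0.5s, 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

The catch, as the thread shows, is that backoff only papers over short blips; a five-day degradation exhausts any sane retry budget, which is why the apps kept falling over anyway.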
  • Interesting and unfortunate world record to hold.

    And no, I don't think it's unreasonable to expect an explanation. And I think we need some kind of details about their strategy to prevent it happening again.

    Thanks also for the tip that these centers might not be Microsoft owned. I've been swept along reading about Azure from Hanselman and the like, but now I find I might not be dealing with who I thought, for what I wanted, or for the price I agreed to.

    Saturday, July 4, 2015 6:06 AM
  • Hey Guys,

    Happy Monday morning. 

    Yeah, I reckon it was a SAN issue of some sort (don't forget that Microsoft apparently writes any IO three times).
    Microsoft needs to create an incident report and share this with the community.

    My SQL Azure instance was online for the full outage and was only restarted at the very end. However, any SQL statements that needed IO would get stuck during the outage.

    Also, the outage partially affected some customers, not all; however, I haven't confirmed this yet.


    Micatio Software Free IIS Azure Web Log App

    Sunday, July 5, 2015 10:53 PM
  • Thanks Ron. Yes, I would concur that the Azure SQL databases did remain available on Friday because I accessed mine through the management portal during "The Great Australian Outage" and setup a quick WebApp in Melbourne to access and process waiting records.

    The main problem, and to a lesser (although irritating enough) extent this still persists this morning, was that Sydney WebApps couldn't get consistent connections to Sydney Azure SQL databases.

    There has been a pretty low-key response to this... did anyone actually notice..?


    Simon


    • Edited by Msgwrx Sunday, July 5, 2015 11:28 PM
    Sunday, July 5, 2015 11:28 PM
  • Simon, pretty lucky then.

    Both the WebApps, Database & Virtuals were stuffed.

    Even though the database was online, you couldn't access any tables, etc. Must have been lucky to have all my Azure objects on the same SAN unit :-(

    Got some minor timeouts (MS Portal) this morning, but that could be normal.

    As with the Azure SLA, you need stuff replicated to Melbourne, so I'm busy fixing this up now (actually West US).

    Don't forget that Azure costs increase next month by 26%.


    Micatio Software Free IIS Azure Web Log App

    Monday, July 6, 2015 12:24 AM
  • The Australian Azure data center continues to give us grief, providing very poor performance to us here in New Zealand more than 3 days after the event. Our WebApps are close to unusable due to some problem with Azure SQL connectivity within the data center.

    Please see my other post "Australia East Azure SQL Connections fail" today.

    Would be great if someone could investigate this, it all worked swimmingly before Friday for more than 7 months.

    Cheers


    Simon


    • Edited by Msgwrx Monday, July 6, 2015 4:37 AM
    Monday, July 6, 2015 4:36 AM
  • Well, look what turned up in my portal this morning

    SQL Database - Australia East - Advisory [Australia East]

    Starting at 07 July, 2015 13:55 UTC a subset of customers using SQL Database in Australia East may experience intermittent login failures or timeouts. A retry may allow a successful login. We are currently evaluating options to restore service. The next update will be provided in 60 minutes.

    There have to be ongoing problems in Sydney since Friday; I ran up duplicate apps/databases in Melbourne and they worked absolutely fine.


    Not enjoying being a "subset" Simon



    • Edited by Msgwrx Tuesday, July 7, 2015 8:39 PM
    Tuesday, July 7, 2015 8:15 PM
  • Update (of sorts)

    SQL Database - Australia East - Advisory [Australia East]

    Starting at approximately 07 July, 2015 13:55 UTC a subset of customers using SQL Database in Australia East may experience intermittent login failures or timeouts. A retry may allow a successful login. We have identified a potential root cause, and are continuing work to restore service. The next update will be provided in 2 hours or as events warrant.


    Simon



    • Edited by Msgwrx Tuesday, July 7, 2015 10:04 PM
    Tuesday, July 7, 2015 10:03 PM
  • Hello,

    The current status shows that our investigation of the alert is complete and we have determined the service is healthy. If you continue to have any issues, I request you to create a Support Ticket with us to look into this further.

    Thanks,
    Syed Irfan Hussain

    Wednesday, July 8, 2015 8:15 AM
  • Thanks, but I won't be using my support credits raising a support ticket for something that wasn't my fault.

    Simon

    • Edited by Msgwrx Wednesday, July 8, 2015 10:24 PM
    Wednesday, July 8, 2015 10:24 PM