Website returning 504 following yesterday's West US outage

    Question

  • Following yesterday's outage, one of our websites running on 2 reserved instances is returning 504 errors for a significant number of requests. The site was rescaled yesterday and subsequently set back to 2 reserved instances in an effort to fix the issue, but it is still returning 504 errors. No errors are being logged via Elmah, and diagnostics show only 404 errors for favicon.ico. Could there currently be an issue with Website instances whereby requests are being halted?
    Sunday, November 25, 2012 3:09 PM

All replies

  • I am evaluating the Azure platform as a potential candidate to host a number of low-requirement websites we maintain. I've uploaded a very simple website, a single-page C# MVC4 app, to the URLs below, and after a few minutes of inactivity it goes down. If I refresh a few times, the site comes up again, usually within 30 seconds, and then runs perfectly.

    If I were to guess, this strikes me as an IIS reset / app pool recycle problem but that is just a guess.

    I've been seeing this behavior for three days and have tested across both Chrome and IE on three different networks.

    We are eager to get this issue resolved so that we can move forward with our website migration plans.

    howlongshoulditestmyads (dot) com

    howlongshoulditestmyads (dot) azurewebsites (dot) net

    Here is the Chrome error:

    This webpage is not available

    The connection to howlongshoulditestmyads.com was interrupted.

    Here are some suggestions:

    • Check your Internet connection. Restart any router, modem, or other network devices you may be using.
    • Add Google Chrome as a permitted program in your firewall's or antivirus software's settings. If it is already a permitted program, try deleting it from the list of permitted programs and adding it again.
    • If you use a proxy server, check your proxy settings or contact your network administrator to make sure the proxy server is working. If you don't believe you should be using a proxy server, adjust your proxy settings: Go to the wrench menu >Settings > Show advanced settings... > Change proxy settings... > LAN Settings and deselect the "Use a proxy server for your LAN" checkbox.

    Error 101 (net::ERR_CONNECTION_RESET): The connection was reset.

    Sunday, November 25, 2012 5:16 PM
  • I am experiencing the same issue with my website; it all started shortly after the outage.

    Since then I get a large number of "Connection reset by peer" errors. Retrying the same request eventually returns the requested page.

     


    StefanK


    • Edited by StefanK Monday, November 26, 2012 4:25 PM
    Sunday, November 25, 2012 10:59 PM
  • I see the same thing. I've been testing the waters with a simple ASP.NET MVC website and a WordPress blog. I don't remember seeing this when I had them on Azure virtual machines.

    I suspect the same thing as you. The app pools for the free website hosting must have an idle timeout of 10 minutes or so.

    Monday, November 26, 2012 5:06 AM
  • 504 means the gateway timed out.

    Are you experiencing any slowness on your site?

    What's your site name?

    Do you remember the exact time when this issue happened?

    Monday, November 26, 2012 5:32 AM
  • WAWS is a multi-tenant environment. The runtime dynamically de-provisions your site if it is idle for some time, and when the next request comes in, it dynamically provisions the site again.

    As a result, the first request after an idle period takes longer than it would when hosting locally.
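
    One way to check whether idle de-provisioning explains slow first requests is to time a request made after an idle period against one made immediately afterwards. The following is only a minimal sketch: the function names, the idle interval, and the injectable `fetch`, `sleep`, and `clock` parameters are assumptions made for illustration, not part of any Azure tooling.

```python
import time
from typing import Callable, Dict, Tuple
from urllib.request import urlopen


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """GET the URL and return its HTTP status.

    urlopen raises on network errors and on 4xx/5xx responses,
    so a returned value means the request succeeded.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.status


def time_request(
    url: str,
    fetch: Callable[[str], int] = fetch_status,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, float]:
    """Return (status, elapsed seconds) for a single request."""
    start = clock()
    status = fetch(url)
    return status, clock() - start


def compare_cold_and_warm(
    url: str,
    idle_seconds: float,
    fetch: Callable[[str], int] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, float]:
    """Wait idle_seconds, then time two back-to-back requests.

    The first request is the hypothetical "cold" one (the site may have
    been de-provisioned); the second is the "warm" one.
    """
    sleep(idle_seconds)
    cold_status, cold_seconds = time_request(url, fetch, clock)
    warm_status, warm_seconds = time_request(url, fetch, clock)
    return {
        "cold_status": cold_status,
        "cold_seconds": cold_seconds,
        "warm_status": warm_status,
        "warm_seconds": warm_seconds,
    }
```

    Calling `compare_cold_and_warm(url, 600)` against your own site would wait ten minutes and then report both timings. A large gap between the two would be consistent with the re-provisioning described above, but it would not explain requests that fail outright.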

    Monday, November 26, 2012 5:38 AM
  • For the last 3 days, we have also been getting frequent (but random) disconnects on our website (3-5% of the time), even though nothing changed on our end. It's happening across browsers and on sites other than my own, so this is definitely a larger issue.

    Microsoft, please look into this ASAP!



    • Edited by Agile IT Monday, November 26, 2012 7:25 AM
    Monday, November 26, 2012 7:21 AM
  • The site referred to in the original posting is tetleycatchupapp.azurewebsites.net. The 504 errors affect both the custom hostname and this azurewebsites.net hostname. The 504 errors, which affect a high proportion of requests, started shortly after the West US outage was resolved, so they have been occurring since at least 11/24/2012 15:31 UTC through to now, 11/26/2012 07:19 UTC, and the site continues to return errors. Pingdom, Fiddler, and other remote ping locations have all verified these failed requests.
    • Edited by Clusta Monday, November 26, 2012 7:24 AM Added word "continues"
    Monday, November 26, 2012 7:22 AM
  • This appears to happen every time I close all my browser sessions...I get the error every time I first connect.

    I've provided a NetMon capture of 2 requests.  The first one failed with a connection reset, the second request (using the refresh button) succeeded.

    https://skydrive.live.com/redir?resid=293F86B0EEF92FD8!1579&authkey=!AIRUXpIxcx0LZzQ


    I have a RESERVED (not shared) medium instance. 
    • Edited by Agile IT Monday, November 26, 2012 7:52 AM
    Monday, November 26, 2012 7:51 AM
  • Can you please share the logs with me, such as a Fiddler trace?
    Monday, November 26, 2012 8:24 AM
  • Sorry, I am having a problem downloading the NetMon capture. Can you please double-check the link?
    Monday, November 26, 2012 8:52 AM
  • Fiddler

    GET: http://tetleycatchupapp.azurewebsites.net

    HTTP/1.1 504 Fiddler - Receive Failure

    [Fiddler] ReadResponse() failed: The server did not return a response for this request.         

    Google Chrome

    No data received
    Unable to load the webpage because the server sent no data.
    Here are some suggestions:
    • Reload this webpage later.

    Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data.

    Re-issuing the request more often than not returns the page correctly with a 200 Success. Pingdom global monitoring also verifies that this occurs from other locations.
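
    The "fails first, succeeds on retry" pattern can be quantified with a small probe that retries only when the first attempt fails and tallies the outcomes. This is a rough sketch rather than a monitoring tool: the function names and the tally labels are made up for illustration, and the probe uses only Python's standard library.

```python
from collections import Counter
from http.client import HTTPException
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen


def probe(url: str, timeout: float = 10.0) -> str:
    """Return the HTTP status as a string, or the exception name on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return str(response.status)
    except HTTPError as err:
        # The server (or a gateway) answered with 4xx/5xx, e.g. 504.
        return str(err.code)
    except (OSError, HTTPException) as err:
        # Connection reset, empty response, timeout, DNS failure, etc.
        return type(err).__name__


def first_vs_retry(
    url: str, rounds: int, probe_fn: Callable[[str], str] = probe
) -> Counter:
    """For each round, probe once and retry a single time only on failure."""
    tally: Counter = Counter()
    for _ in range(rounds):
        if probe_fn(url) == "200":
            tally["first_ok"] += 1
        elif probe_fn(url) == "200":
            tally["retry_ok"] += 1
        else:
            tally["both_failed"] += 1
    return tally


def first_attempt_failure_rate(tally: Counter) -> float:
    """Fraction of rounds whose first attempt did not return 200."""
    total = sum(tally.values())
    if total == 0:
        return 0.0
    return (tally["retry_ok"] + tally["both_failed"]) / total
```

    Running `first_vs_retry(url, 50)` from several networks and comparing the resulting rates would give concrete numbers to include in a report, alongside the timestamps of the failures.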

    Monday, November 26, 2012 10:20 AM
  • If you would like to contact me directly, please use andy dot booth at clusta dot com. We are very keen for any possible assistance.
    Monday, November 26, 2012 10:24 AM
  • In my case the website is www.beyondpod.mobi. The error received when running an HTTP GET request is the TCP error "connection reset by peer". Running a host-tracker.com check against www.beyondpod.mobi (and beyondpodwest.azurewebsites.net) returns all errors - you can see the result here: http://host-tracker.com/check_res_ajx/11673047-0/

    I know that Azure WebSites are still in "Preview", but it has been 2 days since these issues started (following the outage). Please let us know what, if anything, we can try on our side to resolve this issue, as it is generating many user complaints.

    I really hope that this can be resolved soon as issues like that do not help the "Rock-solid platform for your blue-sky thinking" image Azure is presenting.

    StefanK


    • Edited by StefanK Monday, November 26, 2012 4:24 PM
    Monday, November 26, 2012 1:45 PM
  • This continues to be a serious and real issue for us. First requests to our West US hosted WebSites are being returned as 504 without a body to the response. Second requests often succeed. The same site deployed to North Europe is not experiencing this issue. We are in the process of migrating the app from West US to North Europe to correct the availability of our site.
    Monday, November 26, 2012 2:17 PM
  • Here is another data point. It seems the issue is also dependent on the network you are making requests from. For example, every request I tried to www.beyondpod.mobi from the Cincinnati Bell network worked correctly on the first try. Requests made over the T-Mobile wireless network fail the first time and (often) succeed the second time. Below are the tracert outputs for both, in case that helps diagnose the problem:

    Cincinnati Bell - Requests working

    Tracing route to www.beyondpod.mobi [168.62.20.37]

    over a maximum of 30 hops:

      1    <1 ms    <1 ms    <1 ms  unknown [192.168.1.1]
      2     2 ms     1 ms     1 ms  CN1-DSL-208-102-224-1.fuse.net [208.102.224.1]
      3     1 ms     2 ms     2 ms  CIN1.WS-ZT-1.core.fuse.net [216.68.14.53]
      4     1 ms     2 ms     2 ms  WS-ZT-1.CIN1.core.fuse.net [216.68.14.52]
      5    17 ms    17 ms    17 ms  CIN1.ASH1.core.fuse.net [216.68.14.51]
      6    18 ms    18 ms    18 ms  8057.microsoft.com [206.223.115.17]
      7    14 ms    14 ms    16 ms  ge-7-3-0-55.ash-64cb-1b.ntwk.msn.net [207.46.47.93]
      8    75 ms    75 ms    75 ms  xe-1-1-0-0.by2-96c-1b.ntwk.msn.net [207.46.40.74]
      9    74 ms    74 ms    74 ms  xe-2-0-0-0.bay-16c-1a.ntwk.msn.net [207.46.43.52]
     10    74 ms    76 ms    75 ms  10.22.40.159
     11     *        *        *     Request timed out.
     12     *        *        *     Request timed out.

    ....

    T-Mobile  (first request failing, subsequent often working)

    Trace date: 09:24 26/11/2012
    IP Address: 168.62.20.37
    State: UP
    Hostname: www.beyondpod.mobi

    Destination unreachable
    1. 10.170.206.48
    2. 10.170.206.137
    3. 10.168.187.50
    4. 10.168.187.49
    5. 10.170.206.5
    6. 10.170.206.11
    7. 10.161.31.101
    8. 10.177.25.126
    9. 10.176.188.198
    10. 4.53.73.145 te-3-4.car1.Detroit1.Level3.net
    11. 4.69.133.242 ae-8-8.ebr2.Chicago1.Level3.net
    12. 4.69.140.194 ae-5-5.ebr2.Chicago2.Level3.net
    13. 4.69.138.166 ae-2-52.edge4.Chicago3.Level3.net
    14. 4.53.98.10 MICROSOFT-C.edge4.Chicago3.Level3.net
    15. 207.46.40.217 ge-1-0-0-0.ch1-16c-1b.ntwk.msn.net
    16. 204.152.140.58
    17. 207.46.46.168 xe-7-1-1-0.blu-96c-1a.ntwk.msn.net


    StefanK

    Monday, November 26, 2012 2:33 PM
  • WZhao, this is ridiculous - I am able to download and view the NetMon capture from the link given above!

    We have the same problem with a website hosted in the West US location as a Web Site (in RESERVED mode). It loads normally for some time and then (after 2-7 minutes) returns a "(net::ERR_CONNECTION_RESET): The connection was reset" error (Fiddler shows a 502 or 504 error code).

    This pattern has repeated constantly since Nov 24th, when the West US data center was restored after an outage of 9 hours (https://www.windowsazure.com/en-us/support/service-dashboard/ see Historical Status)!

    Microsoft, please look at this ASAP.

    Monday, November 26, 2012 2:55 PM
  • http://social.msdn.microsoft.com/Forums/en-US/windowsazurewebsitespreview/thread/8e90cb2f-6ba8-4ba2-a6d4-1ddf26ab9afb

    This is exactly the issue we are having with Reserved instances in West US at the moment, from Saturday afternoon onwards.

    Monday, November 26, 2012 3:08 PM
  • WZhao, I've double-checked the link, and I can download it with no problem on another PC.

    Can you confirm that you can download the NetMon Capture?

    Monday, November 26, 2012 3:15 PM
  • I believe this is related to the other thread here:
    http://social.msdn.microsoft.com/Forums/en-US/windowsazurewebsitespreview/thread/6956f84f-9828-4873-a990-72a24686db5d

    I have a network trace (NetMon) of the error.

    Microsoft, please confirm you've duplicated the issue and are now investigating the root cause.

    Monday, November 26, 2012 4:10 PM
  • http://social.msdn.microsoft.com/Forums/en-US/windowsazurewebsitespreview/thread/8e90cb2f-6ba8-4ba2-a6d4-1ddf26ab9afb

    This is the same issue we have on our site (it is currently using Shared mode running on 2 instances) - we got the first reports of this Friday morning.

    It seems to affect West US only, as we have a site in US East that works correctly.



    StefanK


    • Edited by StefanK Monday, November 26, 2012 4:36 PM
    Monday, November 26, 2012 4:29 PM
  • Nothing constructive to add here other than "me too". The first request after a short (less than 10 minute) period of inactivity gets me an ERR_CONNECTION_RESET in Chrome.

    Monday, November 26, 2012 4:38 PM
  • More telemetry on the connection reset outage affecting our West US-based Azure Website. On 11/24, Google's Connection TimeOut errors went from 0% previously (only crawl errors due to 404s) to ~9%.

    https://skydrive.live.com/redir?resid=293F86B0EEF92FD8!1580&authkey=!AA9GXayrw2PPH6k
    Monday, November 26, 2012 5:28 PM
  • I'm investigating this now. I will let you know what I find out.


    Jim Cheshire | Microsoft

    Monday, November 26, 2012 9:45 PM
  • I'm going to merge this with the other thread so that I can track it.


    Jim Cheshire | Microsoft

    Monday, November 26, 2012 9:46 PM
  • We believe we have corrected this issue. If you continue to have problems, please post back to this thread with the time you had the issue in GMT and we will investigate further.

    Thanks.


    Jim Cheshire | Microsoft

    Monday, November 26, 2012 10:19 PM
  • Thanks Jim,

    I ran some quick tests and our site appears to be back to normal.

    Thanks for resolving that.


    StefanK


    • Edited by StefanK Monday, November 26, 2012 10:40 PM
    Monday, November 26, 2012 10:38 PM
  • Ditto here, all green.
    Monday, November 26, 2012 11:27 PM
  • Yes, I can confirm that the last reported 504 or empty response was at 11/26/2012 22:09:14 GMT, with the site correctly serving requests after that time. Thanks for the resolution.

    It would be great to have a brief summary of the causes of the West US outage on Saturday and West US Azure WebSites connectivity issues through Saturday to Monday from Microsoft. We need to relay this information to our clients.

    Tuesday, November 27, 2012 10:47 AM
  • You can refer to the information available in the Service Dashboard. This contains all the information we have made available regarding the status of Azure services.


    Jim Cheshire | Microsoft

    Tuesday, November 27, 2012 7:25 PM
  • With all due respect, this issue, which lasted 3 days, did not show up in the service availability during the incident, or even now after it.

    During the incident, ~9% of web requests failed, which seems at least degraded IMHO.


    Tuesday, November 27, 2012 8:10 PM
  • Hi, John. If you need further details, please email me at jamesche at Microsoft (please reference this thread) and I will get you in touch with our operations folks who may have more information for you.


    Jim Cheshire | Microsoft

    Tuesday, November 27, 2012 8:27 PM
  • Hi,

    I have a similar problem. I am able to access my site via 'carigawe.azurewebsites.net' without any problem.

    But when trying to access it via 'www.carigawe.com', I get errors half of the time.

    The error was

    Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data.

    Please help.

    Wednesday, January 16, 2013 4:27 AM