Following yesterday's outage, one of our websites running on 2 reserved instances is returning 504 errors for a significant number of requests. The site was rescaled yesterday and subsequently set back to 2 reserved instances in an effort to fix the issue. However, the site is still returning 504 errors. No errors are being logged via Elmah, and diagnostics are only showing 404 errors for favicon.ico. Could there currently be an issue with Website instances whereby requests are being halted?
I am evaluating the Azure platform as a potential candidate to host a number of low-requirement websites we maintain. I've uploaded a very simple website consisting of a single page C# MVC4 app to the URL and after a few minutes of inactivity, it goes down. If I refresh a few times, the site comes up again usually within 30 seconds and then runs perfectly.
If I were to guess, this strikes me as an IIS reset / app pool recycle problem but that is just a guess.
I've been seeing this behavior for three days and have tested across both Chrome and IE on three different networks.
We are eager to get this issue resolved so that we can move forward with our website migration plans.
howlongshoulditestmyads (dot) com
howlongshoulditestmyads (dot) azurewebsites (dot) net
Here is the Chrome error: This webpage is not available
The connection to howlongshoulditestmyads.com was interrupted.
Here are some suggestions:
- Check your Internet connection. Restart any router, modem, or other network devices you may be using.
- Add Google Chrome as a permitted program in your firewall's or antivirus software's settings. If it is already a permitted program, try deleting it from the list of permitted programs and adding it again.
- If you use a proxy server, check your proxy settings or contact your network administrator to make sure the proxy server is working. If you don't believe you should be using a proxy server, adjust your proxy settings: Go to the wrench menu >Settings > Show advanced settings... > Change proxy settings... > LAN Settings and deselect the "Use a proxy server for your LAN" checkbox.
Error 101 (net::ERR_CONNECTION_RESET): The connection was reset.
I am also experiencing the same issue with my web site. It all started shortly after the outage.
Since then I have been getting a large number of "Connection reset by peer" errors. Retrying the same request eventually gets the requested page.
- Edited by StefanK Monday, November 26, 2012 4:25 PM
I see the same thing. I've been testing the waters with a simple ASP.NET MVC website and a WordPress blog. I don't remember seeing this when I had them on Azure virtual machines.
I suspect the same thing as you. The App pools for the free website hosting must have an idle timeout of 10 min or so.
WAWS is a multi-tenant environment. The runtime dynamically de-provisions your site if it is idle for some time, and dynamically provisions it again when the next request comes in.
As a result, the first request after an idle period takes longer than it would if you were hosting locally.
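To check whether what you are seeing is this cold-start delay rather than a failure, it may help to time a few back-to-back requests after leaving the site idle. Below is a minimal Python sketch, assuming the standard library's urllib is acceptable; the helper names (`fetch_status`, `time_requests`) and the URL in the comment are made up for illustration and are not part of any Azure tooling.

```python
import time
import urllib.request


def fetch_status(url, timeout=30):
    """Send a GET request to `url` and return the HTTP status code.

    Note: urllib raises an exception for 4xx/5xx responses and for
    connection resets, so a failed request will surface as an error.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def time_requests(fetch, count=3, clock=time.perf_counter):
    """Call `fetch()` `count` times in a row; return each call's duration in seconds."""
    durations = []
    for _ in range(count):
        start = clock()
        fetch()
        durations.append(clock() - start)
    return durations


# Example usage (hypothetical URL, run after the site has been idle for a while):
#   timings = time_requests(lambda: fetch_status("http://yoursite.azurewebsites.net/"))
#   print(["%.2fs" % t for t in timings])
```

If only the first timing is large and the rest are small, that would be consistent with the site being provisioned again on demand. A reset or empty response on the first request is a different symptom.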
For the last 3 days, we have also been getting frequent (but random) disconnects on our website (3-5% of the time), even though nothing changed on our end. It's happening across browsers and on other sites besides my own, so this is definitely a larger issue.
Microsoft, please look into this ASAP!
- Edited by Agile IT Monday, November 26, 2012 7:25 AM
The site referred to in the original posting is tetleycatchupapp.azurewebsites.net. The 504 errors affect both the custom hostname and this azurewebsites.net hostname. The 504 errors, which affect a high proportion of requests, started shortly after the West US outage was resolved. They have been occurring since at least 24/11/2012 15:31 UTC through to now (11/26/2012 07:19 UTC), and the site continues to return errors. Pingdom, Fiddler, and other remote ping locations have all verified these failed requests.
- Edited by Clusta Monday, November 26, 2012 7:24 AM Added word "continues"
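For anyone who wants to collect similar evidence without Pingdom, here is a rough Python sketch of a probe that records a UTC timestamp and outcome for each request and then reports the failure percentage. It assumes you supply your own `fetch` callable that returns an HTTP status code or raises on a network error; the function names (`probe`, `failure_rate`) are hypothetical.

```python
from datetime import datetime, timezone


def probe(fetch, count, now=lambda: datetime.now(timezone.utc)):
    """Call `fetch()` `count` times; return a list of (utc_time, ok, detail) tuples.

    A request counts as OK when `fetch()` returns a status in the 2xx/3xx range.
    Network-level failures (connection reset, timeout, urllib errors) are
    subclasses of OSError and are recorded as failures.
    """
    results = []
    for _ in range(count):
        stamp = now()
        try:
            status = fetch()
            results.append((stamp, 200 <= status < 400, str(status)))
        except OSError as exc:
            results.append((stamp, False, type(exc).__name__))
    return results


def failure_rate(results):
    """Return the percentage of failed probes (0.0 for an empty list)."""
    if not results:
        return 0.0
    failed = sum(1 for _stamp, ok, _detail in results if not ok)
    return 100.0 * failed / len(results)
```

The timestamps are taken in UTC so they can be matched against the times reported elsewhere in this thread.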
This appears to happen every time I close all my browser sessions...I get the error every time I first connect.
I've provided a NetMon capture of 2 requests. The first one failed with a connection reset, the second request (using the refresh button) succeeded.
I have a RESERVED (not shared) medium instance.
- Edited by Agile IT Monday, November 26, 2012 7:52 AM
HTTP/1.1 504 Fiddler - Receive Failure
[Fiddler] ReadResponse() failed: The server did not return a response for this request.
Google Chrome: No data received
Unable to load the webpage because the server sent no data.
Here are some suggestions:
- Reload this webpage later.
Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data.
Re-issuing the request more often than not returns the page correctly with a 200 Success. Pingdom global monitoring also verifies that this occurs from other locations.
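Since a second attempt usually succeeds, a client-side retry is one possible stopgap for scripted clients until the underlying problem is fixed. This is only a sketch in Python, not a fix for the platform issue: `get_with_retry` is a made-up helper, and `fetch` is assumed to be any callable that performs the request and raises an OSError subclass (connection reset, timeout, urllib error) on failure.

```python
import time


def get_with_retry(fetch, attempts=3, delay=1.0, sleep=time.sleep):
    """Call `fetch()` until it succeeds or `attempts` tries have been made.

    Returns a tuple (result, attempts_used). If every attempt fails,
    the last error is re-raised so the caller still sees the failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(), attempt
        except OSError as exc:  # includes ConnectionResetError and timeouts
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # brief pause before trying again
    raise last_error
```

Logging `attempts_used` for each call would also show how often the first request fails, which is the pattern described above.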
In my case the website is www.beyondpod.mobi. The error received when running an http GET request is a TCP error "connection reset by peer". Running a host-tracker.com against www.beyondpod.mobi (and beyondpodwest.azurewebsites.net) returns all errors - you can see the result here: http://host-tracker.com/check_res_ajx/11673047-0/
I know that Azure WebSites are still in "Preview", but it has been 2 days since these issues started (following the outage). Please let us know what, if anything, we can try on our side to resolve this issue, as it is generating many user complaints.
I really hope that this can be resolved soon as issues like that do not help the "Rock-solid platform for your blue-sky thinking" image Azure is presenting.
- Edited by StefanK Monday, November 26, 2012 4:24 PM
This continues to be a serious and real issue for us. First requests to our West US hosted WebSites are being returned as 504 without a body to the response. Second requests often succeed. The same site deployed to North Europe is not experiencing this issue. We are in the process of migrating the app from West US to North Europe to correct the availability of our site.
Here is another data point. It seems the issue also depends on the network you are making requests from. For example, every request I made to www.beyondpod.mobi from the Cincinnati Bell network worked correctly on the first try. Requests made over the T-Mobile wireless network fail the first time and (often) succeed the second time. Below are the tracert results for both, in case that helps diagnose the problem:
Cincinnati Bell - Requests working
Tracing route to www.beyondpod.mobi [126.96.36.199] over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms unknown [192.168.1.1]
2 2 ms 1 ms 1 ms CN1-DSL-208-102-224-1.fuse.net [188.8.131.52]
3 1 ms 2 ms 2 ms CIN1.WS-ZT-1.core.fuse.net [184.108.40.206]
4 1 ms 2 ms 2 ms WS-ZT-1.CIN1.core.fuse.net [220.127.116.11]
5 17 ms 17 ms 17 ms CIN1.ASH1.core.fuse.net [18.104.22.168]
6 18 ms 18 ms 18 ms 8057.microsoft.com [22.214.171.124]
7 14 ms 14 ms 16 ms ge-7-3-0-55.ash-64cb-1b.ntwk.msn.net [126.96.36.199]
8 75 ms 75 ms 75 ms xe-1-1-0-0.by2-96c-1b.ntwk.msn.net [188.8.131.52]
9 74 ms 74 ms 74 ms xe-2-0-0-0.bay-16c-1a.ntwk.msn.net [184.108.40.206]
10 74 ms 76 ms 75 ms 10.22.40.159
11 * * * Request timed out.
12 * * * Request timed out.
T-Mobile (first request failing, subsequent often working)
Trace date: 09:24 26/11/2012
IP Address: 220.127.116.11
10. 18.104.22.168 te-3-4.car1.Detroit1.Level3.
11. 22.214.171.124 ae-8-8.ebr2.Chicago1.Level3.
12. 126.96.36.199 ae-5-5.ebr2.Chicago2.Level3.
13. 188.8.131.52 ae-2-52.edge4.Chicago3.Level3.
14. 184.108.40.206 MICROSOFT-C.edge4.Chicago3.
15. 220.127.116.11 ge-1-0-0-0.ch1-16c-1b.ntwk.
17. 18.104.22.168 xe-7-1-1-0.blu-96c-1a.ntwk.
I believe this is related to the other thread here:
I have a network trace (NetMon) of the error.
Microsoft, please confirm you've duplicated the issue and are now investigating the root cause.
This is the same issue we have on our site (it is currently using Shared mode running on 2 instances) - we got the first reports of this Friday morning.
It seems to affect West US only, as we have a site on East US that works correctly.
- Edited by StefanK Monday, November 26, 2012 4:36 PM
More telemetry on our West US based Azure Website connection reset outage. On 11/24, Google's connection timeout errors went from 0% previously (the only crawl errors were due to 404s) to ~9%. https://skydrive.live.com/redir?resid=293F86B0EEF92FD8!1580&authkey=!AA9GXayrw2PPH6k
We believe we have corrected this issue. If you continue to have problems, please post back to this thread with the time you had the issue in GMT and we will investigate further.
Jim Cheshire | Microsoft
- Proposed as answer by Jim CheshireMicrosoft employee, Moderator Monday, November 26, 2012 10:20 PM
Yes, I can confirm that the last reported 504 or empty response was at 11/26/2012 22:09:14 GMT, with the site correctly serving requests after that time. Thanks for the resolution.
It would be great to have a brief summary of the causes of the West US outage on Saturday and West US Azure WebSites connectivity issues through Saturday to Monday from Microsoft. We need to relay this information to our clients.
With all due respect, this issue, which lasted 3 days, did not show up in the service availability status during the incident, or even now after it.
During the incident, ~9% of web requests failed, which seems at least degraded IMHO.
- Edited by Agile IT - John Tuesday, November 27, 2012 8:12 PM grammar
I have a similar problem. I am able to access my site via 'carigawe.azurewebsites.net' without any problem.
But when I try to access it via 'www.carigawe.com', I get errors about half of the time.
The error is:
Error 324 (net::ERR_EMPTY_RESPONSE): The server closed the connection without sending any data.