Timeouts on some websites after they stopped for any reason, while others worked just fine (iisreset was the only solution)

Question
-
User-50014173 posted
Hello,
I have a very weird situation that happens from time to time. Scenario: IIS 8.5 with about 1.5k websites. App pools are terminated after 20-30 minutes of inactivity. They all run as 32-bit. From time to time (maybe once a month?) I experience the following situation:
- clients trying to access their websites get a permanent loading experience (timeout)
- not all websites are affected; some of them keep working just fine
- the only solution to this problem is performing iisreset
- restarting the website or app pool does not help
Observations:
- the websites that stopped responding (permanently loading) stopped at different times (the IIS logs show very different times when each one just stopped responding -> no new entries in the log)
- it looks like they stopped responding at the moment they were shut down by WAS because of inactivity and never started again until iisreset, or they were stopped by a website application error (one client exceeded his DB limit, which caused his app to crash) -> so basically it doesn't matter what caused the website to stop, but after that the app couldn't start again
As far as I can see, there are no errors in the System or Application event logs except the entries mentioned above. There is a good chance that most other websites kept working simply because they were never restarted: they were actively used and did not crash because of application errors. The IIS logs just have gaps between the moment an application stopped and the iisreset command.
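To pin down when each site went silent, its IIS log can be scanned for the longest gap between consecutive entries. This is only a rough sketch in Python, assuming W3C-format logs where every data line starts with a date and a time and directive lines start with "#". The function name largest_gap and the sample lines are made up for illustration and are not taken from the real server.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def largest_gap(lines: Iterable[str]) -> Optional[Tuple[datetime, datetime, timedelta]]:
    """Return (gap_start, gap_end, duration) for the longest silence in a log.

    Assumes each data line begins with 'YYYY-MM-DD HH:MM:SS' and that
    directive lines start with '#'. Returns None with fewer than two entries.
    """
    previous = None
    best = None
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            current = datetime.strptime(f"{parts[0]} {parts[1]}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None:
            gap = current - previous
            if best is None or gap > best[2]:
                best = (previous, current, gap)
        previous = current
    return best


# Hypothetical sample data for illustration only.
sample = [
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2017-07-03 10:00:00 GET / 200",
    "2017-07-03 10:00:05 GET /a 200",
    "2017-07-03 14:30:05 GET / 200",
]
result = largest_gap(sample)
if result is not None:
    print(f"silent from {result[0]} to {result[1]} ({result[2]})")
```

Running this per site and comparing the start of each gap with the WAS idle shutdown entries in the event log would show whether the silence really begins at an idle shutdown or a crash.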
Has anyone experienced a similar issue, or does anyone have an idea what could cause this problem?
Thanks
Thursday, May 18, 2017 10:54 AM
All replies
-
User-2064283741 posted
Do you have separate application pools per site or are they shared?
Do you have enough resources to open new sites? (If you are running 1000 sites with 1000 worker processes you might have memory issues, etc.)
Can you open these sites locally on the server itself? (A useful test to rule out network issues.)
Is traffic actually getting to the machine? (Confirm with Wireshark/Netmon.)
What do the http.sys logs show when this is happening?
Thursday, May 18, 2017 5:17 PM -
User-50014173 posted
It looks like the httperr log contained entries with the "Client_Reset" status. Out of 10466 lines, 1269 entries had this status. In the previous httperr file, 835 of 10467 lines were Client_Reset entries, and the file before that had 210 such entries.
EDIT: Yes, each website runs in its own dedicated application pool. Resources should be sufficient for handling all websites.
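The counts above work out to roughly 12% of lines in the newest file and about 8% in the one before it. A tally like that can be scripted rather than done by hand. The following is a minimal sketch, assuming the httperr file carries a "#Fields:" header that names an s-reason column and that no value contains spaces; count_reasons and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_reasons(lines: Iterable[str]) -> Counter:
    """Count s-reason values in an HTTPERR log, using its #Fields header."""
    reason_index = None
    counts: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            # Column position of the reason phrase, if the header declares it.
            reason_index = fields.index("s-reason") if "s-reason" in fields else None
            continue
        if not line or line.startswith("#") or reason_index is None:
            continue
        columns = line.split()
        if len(columns) > reason_index:
            counts[columns[reason_index]] += 1
    return counts


# Hypothetical sample lines; the header layout is an assumption.
sample = [
    "#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri sc-status s-siteid s-reason s-queuename",
    "2017-05-18 09:00:00 203.0.113.5 50000 192.0.2.10 80 - - - - - Timer_ConnectionIdle -",
    "2017-05-18 09:00:01 203.0.113.6 50001 192.0.2.10 80 HTTP/1.1 GET / - 1 Client_Reset PoolA",
    "2017-05-18 09:00:02 203.0.113.7 50002 192.0.2.10 80 HTTP/1.1 GET / - 1 Client_Reset PoolA",
    "2017-05-18 09:00:03 203.0.113.8 50003 192.0.2.10 80 - - - - - Timer_ConnectionIdle -",
]
counts = count_reasons(sample)
share = counts["Client_Reset"] / sum(counts.values())
print(counts.most_common(), f"Client_Reset share: {share:.1%}")
```

Comparing the share across consecutive files shows whether Client_Reset is actually growing or only tracking overall traffic.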
Friday, May 19, 2017 8:40 AM -
User-460007017 posted
Hi Webio,
Have you monitored memory and CPU usage? Check whether a CPU throttling action has been enabled in the application pool's advanced settings. Also, please make sure there is no memory leak and that there are no sessions that never get closed.
This link also explains the difference between recycling and IISRESET:
https://fullsocrates.wordpress.com/2012/07/25/iisreset-vs-recycling-application-pools/
Best Regards,
Yuk Ding
Friday, May 19, 2017 9:16 AM -
User-50014173 posted
The issue occurred again. Take a look at these example httperr log entries:
2017-07-03 14:10:25 REMOTE IP 29697 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME 2017-07-03 14:10:25 REMOTE IP 10047 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME 2017-07-03 14:10:25 REMOTE IP 38351 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME 2017-07-03 14:10:25 REMOTE IP 38361 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME 2017-07-03 14:10:25 REMOTE IP 8612 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME 2017-07-03 14:10:25 REMOTE IP 5083 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME 2017-07-03 14:10:25 REMOTE IP 10890 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME 2017-07-03 14:10:25 REMOTE IP 11012 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME 2017-07-03 14:10:25 REMOTE IP 11318 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME 2017-07-03 14:10:25 REMOTE IP 11991 LOCAL IP 80 HTTP/1.1 GET / - 503 1419 Disabled LOCAL APP POOL NAME
This is just a fraction of the httperr log, which contained Disabled errors for multiple websites. Not all websites were down, only some of them. Interestingly, even though the log reports a 503 Disabled status, the website was running and its app pool was also running (according to IIS Manager). When I tried to access a "Disabled" website, the browser kept loading the page indefinitely, but no w3wp process for this website was running on the Windows Server side (again, the website and app pool both had running status). Stopping and starting or recycling the website and app pool did not help. The only fix is to perform iisreset, which resolves the issue right away but restarts all websites, which for obvious reasons is nowhere close to a real solution.
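To see which app pools were hit and when each one first reported Disabled, the httperr entries can be grouped by queue name. This is a sketch under assumptions: it expects an unredacted log (the excerpt above has its IPs and pool name replaced by placeholders containing spaces, so it would not parse as-is) with a "#Fields:" header, and the header layout in the sample, the pool names, and disabled_by_pool itself are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

NEEDED = ("date", "time", "sc-status", "s-reason", "s-queuename")


def disabled_by_pool(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map each s-queuename to (first timestamp, count) of 503 Disabled entries."""
    index: Dict[str, int] = {}
    result: Dict[str, Tuple[str, int]] = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            names = line[len("#Fields:"):].split()
            index = {name: pos for pos, name in enumerate(names)}
            continue
        if not line or line.startswith("#"):
            continue
        if any(name not in index for name in NEEDED):
            continue  # no usable header seen yet
        cols = line.split()
        if len(cols) <= max(index[name] for name in NEEDED):
            continue  # truncated line
        if cols[index["sc-status"]] != "503" or cols[index["s-reason"]] != "Disabled":
            continue
        pool = cols[index["s-queuename"]]
        stamp = f'{cols[index["date"]]} {cols[index["time"]]}'
        # Logs are chronological, so the first stamp seen is the earliest.
        first, count = result.get(pool, (stamp, 0))
        result[pool] = (first, count + 1)
    return result


# Hypothetical sample; the field layout is an assumption.
sample = [
    "#Fields: date time c-ip c-port s-ip s-port cs-version cs-method cs-uri streamid sc-status s-siteid s-reason s-queuename",
    "2017-07-03 14:10:25 203.0.113.5 29697 192.0.2.10 80 HTTP/1.1 GET / - 503 1419 Disabled PoolA",
    "2017-07-03 14:10:26 203.0.113.6 10047 192.0.2.10 80 HTTP/1.1 GET / - 503 1419 Disabled PoolA",
    "2017-07-03 14:12:00 203.0.113.7 38351 192.0.2.10 80 HTTP/1.1 GET / - 503 1500 Disabled PoolB",
    "2017-07-03 14:12:01 203.0.113.8 38361 192.0.2.10 80 HTTP/1.1 GET / - 200 1501 - PoolC",
]
pools = disabled_by_pool(sample)
for name, (first_seen, hits) in sorted(pools.items()):
    print(name, first_seen, hits)
```

The first-seen time per pool can then be compared with the idle shutdown or crash time of that pool's worker process.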
The system hosting the websites was recently upgraded to Windows Server 2016 (Standard), but this issue also occurred on Windows Server 2012.
Any suggestions?
All app pools are set to recycle at 0:30, and they are not suspended but terminated after the idle time-out. They are also set to ThrottleUnderLoad.
P.S. The website that was first to show multiple Disabled errors in httperr was using almost no CPU, and its memory usage was about 100-150 MB.
Monday, July 3, 2017 4:21 PM -
User-460007017 posted
Hi Webio,
If a specific IIS website returns 503, it could be caused by concurrent connections. It depends on how you set the concurrent connection limit and the queue length in IIS. If the number of concurrent connections is larger than the queue length, IIS can return a 503 error. Also, if connections in the worker process don't get disconnected, it could return 503 as well. You may need to check the Application event log in Event Viewer; it could provide the error message. You may also need to use the Debug Diagnostic tool to analyze a dump file, just to ensure there is no memory leak. A 503 error could also occur if the thread pool reaches its maximum thread count and the threads get stuck.
Best Regards,
Yuk Ding
Friday, July 7, 2017 8:46 AM -
User-50014173 posted
When you mention concurrent connections, are you talking about these parameters:
<applicationPool maxConcurrentRequestsPerCPU="5000" maxConcurrentThreadsPerCPU="0" requestQueueLimit="5000"/>
from the aspnet.config file?
Thursday, July 13, 2017 8:27 AM -
User-2064283741 posted
The reason phrase in the http.sys logs doesn't indicate any load issue:
Disabled: A service unavailable error occurred (an HTTP error 503). The service is not available because an administrator has taken the application offline.
Your site/app is not running at that time. That is what the http.sys logs say, and it matches your experience.
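To make the distinction concrete, the 503 reason phrases can be split into load-related ones and the rest. The sketch below is a partial map only, written from the HTTP.sys error logging documentation as an assumption worth verifying against the current docs; the classify helper and the grouping into two dictionaries are illustrative, not an official classification.

```python
# Partial, assumed map of HTTP.sys 503 reason phrases; verify against the docs.
LOAD_RELATED = {
    "QueueFull": "the application request queue is full",
    "ConnLimit": "the site-level connection limit has been reached",
}
NOT_LOAD_RELATED = {
    "Disabled": "an administrator has taken the application offline",
    "AppOffline": "application errors caused the application to be taken offline",
}


def classify(reason: str) -> str:
    """Return 'load', 'not-load', or 'unknown' for an s-reason value."""
    if reason in LOAD_RELATED:
        return "load"
    if reason in NOT_LOAD_RELATED:
        return "not-load"
    return "unknown"


for phrase in ("Disabled", "QueueFull", "Client_Reset"):
    print(phrase, "->", classify(phrase))
```

With a map like this, a queue or connection limit problem would be expected to surface as QueueFull or ConnLimit rather than Disabled.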
Thursday, July 13, 2017 8:45 AM -
User-50014173 posted
OK, but why is it not starting? The website and its dedicated application pool both show as running, but when I try to reach the site the browser loads indefinitely on the client side, and on the server side no w3wp process is started for this website.
Thursday, July 13, 2017 8:56 AM -
User-2064283741 posted
I'm not sure why (I could only reply briefly before and knew that question would come up; I just didn't think it was worth following the load path per app pool). It might also be tricky to troubleshoot: as the worker process doesn't seem to spin up, there is nothing to capture on the worker process with the normal debug methods.
You have a lot of sites and a lot of worker processes running. Maybe it is something deeper around that. But we are into Microsoft support territory here.
Thursday, July 13, 2017 11:24 AM -
User-50014173 posted
For now the only thing that comes to mind is some kind of connection limit at some level. When I check current connections in Perfmon, the value is between 550 and 830, so maybe the problem occurs when it reaches some higher value. The issue cannot be reproduced with any steps and occurs at random times, so I will have to wait for another occurrence.
Thursday, July 13, 2017 11:34 AM -
User-2064283741 posted
Those connection counts are small overall (I regularly see 10,000 connections on a low-overhead site that is an API and 5,000 on complex sites).
Maybe it is the spread of the connections over the number of different sites/SSLs/etc.
Thursday, July 13, 2017 11:42 AM -
User-50014173 posted
This is a shared hosting server. There are some higher-usage sites, but there are also sites with almost no users. The Perfmon counter Web Service\Current Connections (_Total) shows IMHO plausible values (about 750 current connections across 1,500 sites is roughly one connection for every two sites at the moment of reading, which is realistic). The system has 2 quad-core CPUs and 96 GB of memory. Task Manager shows memory and CPU usage mostly at 50-65%, and the average disk queue length is 0.1-0.3, so basically everything looks as normal as it should.
Thursday, July 13, 2017 11:55 AM -
User2107662878 posted
Did this issue ever get resolved? We are having exactly the same issue and nothing seems to fix it.
Sunday, March 21, 2021 8:07 PM