Worker process crashes when hosting WCF services inside IIS 6

  • Question

  • Hi,

    I am trying to figure out why IIS 6 crashes when hosting some WCF services. Maybe someone can point me in the right direction?

    The main thing that I would like to know is: is it okay to call methods asynchronously from one WCF service to another, where both are contained in the same app pool, hosted within IIS 6?

    The design of the system is as follows: we have a frontend (app pool #1), which calls asynchronous WCF services (app pool #2), which in turn call further asynchronous WCF services (app pool #3). This "service to service" model is based on this article on building a WCF router, as we require the initial service operations to be processed and proxied according to various business rules. We need to make a lot of async calls because we have very high client loads with many IO-bound requests (complicated database searches, calls to external web services, etc). In our live environment, the two WCF app pools are hosted on dedicated, separate boxes; in dev, all the services are hosted within a single instance of IIS. The issue reported in this posting applies to both environments.

    Under load, the w3wp.exe process crashes, writing the typical three errors to the event log ("fatal communication error", "faulting application w3wp.exe: faulting module mscorwks.dll", and "fatal execution error"). The HTTP error logs report "connection abandoned by app pool". No tracing information related to the crashes was written out (I would have expected something like null pointer exceptions, stack overflows, etc). We use a lot of application tracing, and by all accounts the services appear to be performing their tasks correctly, but upon completion the w3wp.exe process sometimes just terminates. On occasion, we also get invalid responses back from the WCF services: our web app reports that no results were found, which I suspect is a side effect of the underlying services crashing and restarting while http.sys stays alive.

    We have all the WCF service classes (about ten in total) spread across two app pools, and we have limited the physical memory per worker process to 200 MB in this case. The web garden process limit is currently set to 1, but we have experienced the same problems with a garden of 4 processes as well.

    On rare occasions, the w3wp process will crash under light load, but it is normally under heavy load that the applications crash. The main trigger is when IIS attempts to recycle the worker process because the memory limit has been exceeded. This will normally (but not always) result in a crash. I would expect (just at a guess) ASP.NET to ask the w3wp process whether it is ready to shut down, and once it replies that it is, the process is cleanly recycled. However, what I think might be going on is that the worker process replies that it is ready to shut down (i.e. it has finished serving requests) while in reality there are still asynchronous calls yet to return (from other services within the app pool). Subsequently, a depended-upon hosted service is shut down, which "breaks" WCF/ASP.NET when EndAsyncMethod() is called by the web client or other consuming service.
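The suspected race can be illustrated with a small, language-neutral sketch. This is Python, not WCF or IIS code, and every name in it (InFlightTracker, call_downstream, drain) is hypothetical. It only shows the idea of counting outstanding asynchronous calls and refusing to report "ready to shut down" until they have all returned. Whether IIS 6 or WCF behaves this way internally is an assumption, not something confirmed here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlightTracker:
    """Counts outstanding async calls so a host can wait before shutting down."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def begin(self):
        with self._cond:
            self._pending += 1

    def end(self):
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def pending(self):
        with self._cond:
            return self._pending

    def drain(self, timeout):
        """Block until no calls are pending; return False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pending == 0, timeout=timeout)


def call_downstream(tracker, pool, work, *args):
    """Submit `work` (a stand-in for a downstream service call) and track it."""
    tracker.begin()
    try:
        future = pool.submit(work, *args)
    except Exception:
        tracker.end()  # the call never started, so it is not in flight
        raise
    # The callback runs once the call has completed (successfully or not).
    future.add_done_callback(lambda _f: tracker.end())
    return future


if __name__ == "__main__":
    tracker = InFlightTracker()
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(8):
            call_downstream(tracker, pool, time.sleep, 0.05)
        # A "recycle" should only proceed once this returns True.
        ready = tracker.drain(timeout=5.0)
    print("safe to shut down:", ready)
```

In this model, a host that shuts down without calling drain() first would tear down services while callers are still waiting on results, which is the failure mode described above.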

    Even if we set no upper limit on memory usage per process, I imagine that IIS will still recycle the w3wp.exe process from time to time, especially if it is consuming a large amount of memory for a long period of time. Does anyone know if this is a safe assumption?

    I will shortly put each of the WCF services in its own app pool, just to see if this fixes the problem, and I will post back to this newsgroup to let you know how I am getting on.

    I have also tried enabling ASP.NET compatibility mode via the AspNetCompatibilityRequirements attribute (as these WCF services will always be hosted within IIS), which changed the error messages a bit but did not fix the crash.

    I have also traced the crashes with IISState and followed up with WinDbg, but this did not tell me much. The stacks of all the managed threads were in system libraries (not user code) at the time of the crash. Tomorrow I will install Debug Diagnostics on the production server (I was trying to avoid messing with the live server until now), and maybe this will give me a better idea of where the crash is actually happening.

    You never know... maybe there is an unhandled exception somewhere in a destructor, for example, that is causing the worker process to crash. IISState certainly didn't indicate this, but hopefully Debug Diagnostics will give me some more useful information. However, the services worked fine before they were encapsulated within WCF, which is why I doubt the problem is in user-level code.

    If anyone knows for sure that calling WCF service operations asynchronously from (and/or to) IIS-hosted WCF services, within the same app pool/worker process or otherwise, is a bad thing to do, please do let me know, as this pattern has been applied extensively throughout the solution that I am currently working on. It would be a big task to do a redesign, but if that's what I have to do, I had better do it now rather than later! Never put off till runtime what you can do at compile time. One other thought: maybe we should only be hosting an application like this within WAS/Windows Server 2008 rather than IIS 6/Windows Server 2003?

    The hardware/software details are:
    All machines are Windows Server 2003 R2, Service Pack 2, 2.0.50727, .NET 3.0 (not 3.5)
    Fairly well patched (but not auto-updated)
    There are no Jet/Access calls within the services (so I think we can rule out KB838306)

    If I can't find out anything more from running diagnostics tomorrow, and putting each service class in its own app pool also doesn't fix the problem, I will raise a case with Microsoft Product Support. But before then, if anyone has any ideas, please do feel free to reply. This might be quite a common problem (I have read a lot of similar unresolved posts recently).

    Many thanks!
    Carl Cook,
    Contract Developer,
    PerfICT Solutions NZ Limited.
    Tuesday, April 22, 2008 10:26 AM

All replies

  • In case anyone is interested in the fix for this problem: we simply applied a service pack to the Windows 2003 server in question. The service pack fixed this known issue.
    Saturday, January 1, 2011 11:25 PM