none
Windows service hangs on starting after reboot when starting IIS application pools

    Question

  • Hi,

    I have encountered a strange situation where starting an application pool from a windows service (written in C#, set to "Automatic" startup) using WMI or ADSI immediately after the server reboots hangs.

    I'll describe the issue:

    We are developing a large application (Windows 2003 Server SP2, IIS 6.0) which contains the following main processes (these processes are invoked & initialized using a windows service startup procedure when the application is started):

    1) XServer1.exe, XServer2.exe - These processes are native COM-Exe servers, contains some logics, but mainly supplies COM objects to other processes via DCOM (mainly .NET2COM interOp calls & pure COM calls). For example, some of the classic ASP "Application Scope static objects" (w3wp.exe) are COM objects which "live" inside these processes.

    2) dllhost.exe - this is a COM+ application. Some of our DLLs are loaded into this process which acts as a "state server" (the same idea as the ASP.NET out-of-proc sessions server, but for classic ASP pages).

    3) 3 different IIS application pools (we'll call them appPool1\2\3) - containers of our ASP pages, ASP.NET pages, WCF services etc. Code (native C++ COM dlls & C#) in these application pools (w3wp.exe's) usually makes DCOM calls to the processes described in (1) & (2). Only appPool1 can be configured as a Web Garden.

    In order to Start\Stop our application we have written a windows Service (C#) which controls these procedures. Our service process is called XWinService.exe. The service depends on the following windows services (the list began with the first 4 services, ongoing tries made the list like this...):

    W3SVC
    aspnet_state
    COMSysApp
    DcomLaunch
    winmgmt
    lanmanserver
    lanmanworkstation
    seclogon
    Browser
    TermService

    The summary of the Stop procedure of the application (implemented by the service):

    1) Stop all 3 IIS application pools (appPool1\2\3) - This is done to prevent w3wp.exe processes to jump alive when the application is shut-down. This is implemented with WMI from C# (system.Management.dll)
    2) Stop XServer1\2.exe
    3) Stop the COM+ application (dllhost.exe).

    The summary of the Start procedure of the application (implemented by the service):

    1) Execute the Stop procedure - This ensures that no HTTP hits will wake a w3wp.exe process before it's time.

    2) Invokes & Initializes the XServer1\2.exe COM-Exe servers - Initialization is required prior to any w3wp.exe invocation. Only after some object had been initialized, w3wp.exe's can access these servers. This is implemented by .NET2COM InterOp (eventually DCOM).

    3) Invokes & initialized the dllhost.exe (COM+ application) process - This is implemented by the ComAdmin Catalog API (C#).

    4) Starts our 3 application pools - This allows incoming HTTP hits to wake w3wp.exe processes and start serving requests.

    This is the C# code which is responsible to start\stop application pools (WMI). This code runs in our service processes (XWinService.exe):
    Now the issue:

    ConnectionOptions co = new ConnectionOptions();
    ManagementScope scope = new ManagementScope(
        @"\\localhost\root\MicrosoftIISV2", co);
    foreach (string appPool in AppPools)
    {
      string objPath = string.Format(
        "IISApplicationPool.Name='W3SVC/AppPools/{0}'", appPool);
      using (ManagementObject mc = new ManagementObject(objPath))
      {
        mc.Scope = scope;
        if (Operation.ToLower() == "start")
        {
          mc.InvokeMethod("Start", null, null);
            // ### The problematic line of code ###
        }
        else if (Operation.ToLower() == "stop")
        {
          mc.InvokeMethod("Stop", null, null);
        }
        else if (Operation.ToLower() == "recycle")
        {
          mc.InvokeMethod("Recycle", null, null);
        }
      }
    }
    

     

    Prior to rebooting the server, starting the service manually (from the services.msc tool) succeeds without any problems. also, stopping it is OK. We have set the service to start "Automatic", that is, will start when the server (Win2K3 SP2) starts and rebooted the server. When the server started (the login screen appeared), our service was "stuck" (status = "Starting") and will NEVER (it hang for 2 days!) start.

    Analyzing the processes reveled the following:

    1) The XWinService.exe process was stuck on the problematic line of code (### above ###). This hanged for 2 days until we killed the process. Please note: Shutting down the application pools (the Start procedure begins with a Stop procedure) did not hang! No w3wp.exe process was invoked.

    2) From a DUMP file taken (with DebugDiag tool) from XWinService.exe during this "hang" we can see the thread which is waiting. This is the (native) stack trace of it:

     

    Thread 6 - System ID 2784

    Entry point

    mscorwks!Thread::intermediateThreadProc

    Create time

    11/19/2009 1:40:05 PM

    Time spent in user mode

    0 Days 00:00:00.078

    Time spent in kernel mode

    0 Days 00:00:00.781



    This thread is making a COM call to multi-threaded apartment (MTA) in process 884

    Function

    Source

    ntdll!KiFastSystemCallRet

     

    ntdll!NtRequestWaitReplyPort+c

     

    rpcrt4!LRPC_CCALL::SendReceive+230

     

    rpcrt4!I_RpcSendReceive+24

     

    ole32!ThreadSendReceive+138

     

    ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+112

     

    ole32!CRpcChannelBuffer::SendReceive2+d3

     

    ole32!CAptRpcChnl::SendReceive+ab

     

    ole32!CCtxComChnl::SendReceive+1a9

     

    rpcrt4!NdrProxySendReceive+43

     

    rpcrt4!NdrClientCall2+206

     

    rpcrt4!ObjectStublessClient+8b

     

    rpcrt4!ObjectStubless+f

     

    ….

     

    This thread is calling (via DCOM) a component in process 884, which is svchost.exe, running the following services: AeLookupSvc, AudioSrv, Browser, CryptSvc, dmserver, EventSystem, helpsvc, lanmanserver, lanmanworkstation, Schedule, seclogon, SENS, ShellHWDetection, TrkWks, winmgmt, wuauserv, WZCSVC.

    As you can see the "winmgmt" service (responsible for WMI) is running in this process and our service depends on it, so our service will start after winmgmt is started (the same for IIS W3SVC service)..

    The svchost.exe process (884) was dumped and we can see a thread (waiting for a DCOM call to end) accessing process 2880 which is - wmiprvse.exe (I guess this is the WMI server. Don't know if it's relevent, but there were 2 instances of this process). This is the native call stack of the thread (in svchost.exe):

     Thread 48 - System ID 3816

    Entry point

    wbemcore!CCoreQueue::_ThreadEntry

    Create time

    11/19/2009 1:40:56 PM

    Time spent in user mode

    0 Days 00:00:00.00

    Time spent in kernel mode

    0 Days 00:00:00.00



    This thread is making a COM call to multi-threaded apartment (MTA) in process 2880

    Function

    Source

    ntdll!KiFastSystemCallRet

     

    ntdll!NtRequestWaitReplyPort+c

     

    rpcrt4!LRPC_CCALL::SendReceive+230

     

    rpcrt4!I_RpcSendReceive+24

     

    ole32!ThreadSendReceive+138

     

    ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+112

     

    ole32!CRpcChannelBuffer::SendReceive2+d3

     

    ole32!CAptRpcChnl::SendReceive+ab

     

    ole32!CCtxComChnl::SendReceive+1a9

     

     

    3) Setting our service to "Manual" and starting it (manually - after logging into the server or starting it remotely from a different server immediately after reboot) is OK - nothing hangs.

    4) We deleted our service (from the registry!) and placed a batch file in the windows "startup" folder. This batch files calls the service's code, but runs it as a normal C# executable. After server reboot, it also hang on the same problematic line of code (again... for 2 days until we killed it).

    5) Using ADSI (System.DirectoryServices) instead of WMI had the same results (starting the application pools hanged!).
    We have been digging into this for the past 2 weeks...

     


    My questions:

    ==========
    1) Did anyone encounter the same issue?
    2) Does anyone know why it hangs? Is there any additional service dependency we should take in mind?
    3) Does anyone have a solution for this issue?
    4) Why is this happening after a reboot only when the service to set to "Automatic" startup? If we do it manually - everything is Ok!

     

    Thanks in advanced,
    Omri


    *** Small update:
    We have noticed that on VMs (VMware stations) the service hangs after reboot for an average of ~40min, until it starts (note: it never fails to start, but 40min is way too much). An event log message is recorded in the system event log stating that our service hanged for more than 16min (source: Service Control Manager, Event ID: 7044).
    On "regular" machines (real metals) the average time until the service starts is ~55 hours!!! Again, an event log entry is recorded as described above.
    The avergae values were calculated from 10 differens VMs & 8 different "real" servers.

    Friday, November 20, 2009 8:28 PM

Answers

  • 1) It is quite a lot of information ... I guess that not many people will actually finish reading it.
    2) It looks like a very specific problem, so I guess that most probably no one have seen it before.

    Did you try to contact IIS support? (I guess that it is IIS WMI provider which is causing the trouble ...)
    Also I think that IIS forums is more appropriate place for your question: http://forums.iis.net/

    -Karel
    • Marked as answer by eryang Friday, November 27, 2009 6:00 AM
    Tuesday, November 24, 2009 7:07 PM
    Moderator

All replies

  • Hi again,

    I see no one had responded, but I'll post some news anyway...

     We have found out, that prior to starting the application pools, setting the service status to "Started" and opening a new thread (new Thread(... )) which runs the code above (starting the app pools with WMI) solves the issue.

    This is the pseudo code of the OnStart method of the service:

    OnStart{

         StopProcedure();

         InvokeInitXServer1And2(); //COM-Exe servers

         InvokeInitCOMPlusApplication(); //dllhost.exe

         SetServiceStatus(SERVICE_STARTED);

         Thread worker = new Thread(new threadStart(IISAppPoolStartWMI); //Calls the code

    }

     

    This is the only way the service starts in reasonable time (Max of 3 min, Avg of ~1.5 min of real machines and VMs both!) and a w3wp.exe processes is started.

     

    If anyone has an explnation for it (MTA\STA issues?!?!?) I'll be happy to read it.

    Thanks.

     

    Omri

    Monday, November 23, 2009 1:33 AM
  • 1) It is quite a lot of information ... I guess that not many people will actually finish reading it.
    2) It looks like a very specific problem, so I guess that most probably no one have seen it before.

    Did you try to contact IIS support? (I guess that it is IIS WMI provider which is causing the trouble ...)
    Also I think that IIS forums is more appropriate place for your question: http://forums.iis.net/

    -Karel
    • Marked as answer by eryang Friday, November 27, 2009 6:00 AM
    Tuesday, November 24, 2009 7:07 PM
    Moderator
  • Hi Karel,

    1) It is a lot of info - This is all the info we had collected during our investigation of the issue. I tried to be as detialed as possible for a reason - So that when asking for additional info, you will already have it.

    2) Could be a specific problem. Although it is very simple to reproduce. Just write a small C# service which starts an IIS application pool, install it and set it to "Automatic".

    I have tried to contact IIS forums & Microsoft's formal support center. I did not get a reply yet (and from my experience... I will probebly never will...).

    BTW - This also happens with IIS 7.0 on Win2K8 machines.

    Omri
    Saturday, December 12, 2009 1:17 AM
  • 1) Hmmm, that is reasonable, however AFAIK many people will give up reading posts with too much information. (see #2)
    2) If there is simple repro, why do you mention "large application" and complicated architecture? Those are just motivations then (and mostly not relevant for the first wave of investigation). Just create the small application (keep it as small as possible - 10s of lines of code is best) and post steps how to reproduce your problem with it. That will have IMO much better chance to get a response.

    If you contacted official (paid) MS support, then I think you should get an answer soon.
    I bet that IIS forums would welcome a simple application which reproduces your problem with minimal sufficient information too ...

    -Karel
    Saturday, December 12, 2009 11:00 PM
    Moderator