Hang in _CRT_INIT of mixed mode dll when unloading

  • Question

  • We have a C# app compiled in VS2010 against .NET 4.0 that loads many assemblies, some of them mixed mode, in a new app domain. Occasionally our clients notice "zombie" processes of our app. Each zombie process corresponds to an ASYNC_NETWORK_IO wait in the client's SQL Server. A process dump shows a hang in one of our mixed mode assemblies (the specific assembly varies).

    Based on the call stack and logs, it appears the process is being shut down (presumably because the client canceled the login). It's worth noting that I haven't managed to reproduce the problem myself, and the client only sees the issue in their production environment, not in their development and test environments.

    Call stack is as follows:

    ntdll.dll!_NtDelayExecution@8()  + 0x15 bytes
    ntdll.dll!_NtDelayExecution@8()  + 0x15 bytes
    KERNELBASE.dll!_Sleep@4()  + 0xf bytes
    > OurLib.Context.dll!_CRT_INIT(void * hDllHandle=0x05dc0000, unsigned long dwReason=0, void * lpreserved=0x00000001)  Line 357 C
    OurLib.Context.dll!__DllMainCRTStartup(void * hDllHandle=0x05dc0000, unsigned long dwReason=0, void * lpreserved=0x00000000)  Line 526 + 0x8 bytes C
    OurLib.Context.dll!_DllMainCRTStartup(void * hDllHandle=0x05dc0000, unsigned long dwReason=0, void * lpreserved=0x00000001)  Line 476 + 0xe bytes C
    mscoreei.dll!__CorDllMain@12()  + 0xde bytes
    mscoree.dll!_ShellShim__CorDllMain@12()  + 0xad bytes
    ntdll.dll!_LdrpCallInitRoutine@16()  + 0x14 bytes
    ntdll.dll!_LdrShutdownProcess@0()  + 0x141 bytes
    ntdll.dll!_RtlExitUserProcess@4()  + 0x74 bytes
    mscoreei.dll!RuntimeDesc::ShutdownAllActiveRuntimes()  + 0xc8 bytes
    mscoreei.dll!CLRRuntimeHostInternalImpl::ShutdownAllRuntimesThenExit()  + 0x15 bytes
    clr.dll!EEPolicy::ExitProcessViaShim()  + 0x66 bytes
    clr.dll!SafeExitProcess()  + 0x99 bytes
    clr.dll!DisableRuntime()  - 0x160f1b bytes
    clr.dll!EEPolicy::HandleExitProcess()  + 0x57 bytes
    clr.dll!__CorExeMainInternal@0()  + 0x11c bytes
    clr.dll!__CorExeMain@0()  + 0x1c bytes
    mscoreei.dll!__CorExeMain@0()  + 0x38 bytes
    mscoree.dll!_ShellShim__CorExeMain@0()  + 0x227 bytes
    mscoree.dll!__CorExeMain_Exported@0()  + 0x8 bytes
    kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
    ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
    ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes

    _CRT_INIT is in crtdll.c, and the hang appears to be at the Sleep in the code below:

            else if ( dwReason == DLL_PROCESS_DETACH )
                /*
                 * Any basic clean-up code that goes here must be
                 * duplicated below in _DllMainCRTStartup for the
                 * case where the user's DllMain() routine fails on a
                 * Process Attach notification. This does not include
                 * calling user C++ destructors, etc.
                 */

                /*
                 * do _onexit/atexit() terminators
                 * (if there are any)
                 *
                 * These terminators MUST be executed in
                 * reverse order (LIFO)!
                 *
                 * NOTE:
                 *  This code assumes that __onexitbegin
                 *  points to the first valid onexit()
                 *  entry and that __onexitend points
                 *  past the last valid entry. If
                 *  __onexitbegin == __onexitend, the
                 *  table is empty and there are no
                 *  routines to call.
                 */
                void *lock_free=0;
                void *fiberid=((PNT_TIB)NtCurrentTeb())->StackBase;
                int nested=FALSE;
                while((lock_free=InterlockedCompareExchangePointer((volatile PVOID *)&__native_startup_lock, fiberid, 0))!=0)
                {
                    /* some other thread is running native startup/shutdown during a cctor/domain unload.
                        Should only happen if this DLL was built using the Everett-compat loader lock fix in vcclrit.h
                    */
                    /* wait for the other thread to complete init before we return */
                    Sleep(1000);
                }

    The comment suggests some other thread holds a lock. However, based on the dump, the only thread remaining is the main thread. Since the thread that holds the lock no longer exists, the code ends up stuck in this loop. I'm really hoping someone can provide some insight into what the underlying issue could be. This looks like a form of loader lock to me, but it doesn't appear to involve DllMain or other code that I directly control, so I'm not sure what I can do to resolve it.

    Tuesday, February 17, 2015 6:21 PM
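
The hang described above can be reproduced in miniature. The following is a minimal sketch, not the CRT's code: `demo_lock`, `acquire_bounded` and `simulate_orphaned_lock` are hypothetical names, and the retry limit is added only so the demonstration terminates. It uses the same compare-exchange pattern as the quoted loop and shows that once an owner disappears without storing 0 back into the lock, no later caller can ever acquire it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the startup lock: 0 means free,
   any other value identifies the current owner. */
typedef atomic_uintptr_t demo_lock;

/* Tries to swap 0 -> self, like the loop in the question, but gives up
   after max_attempts. The quoted CRT loop has no such limit. */
bool acquire_bounded(demo_lock *lock, uintptr_t self, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        uintptr_t expected = 0;
        if (atomic_compare_exchange_strong(lock, &expected, self))
            return true;  /* lock was free and is now ours */
        /* Someone else holds it: the real code sleeps and retries here. */
    }
    return false;
}

/* Simulates an owner that takes the lock and then vanishes without
   releasing it, as when a thread is terminated mid-operation. */
bool simulate_orphaned_lock(void)
{
    demo_lock lock;
    atomic_init(&lock, 0);

    const uintptr_t vanished_owner = 0x1000;  /* arbitrary id */
    const uintptr_t main_thread    = 0x2000;  /* arbitrary id */

    bool first = acquire_bounded(&lock, vanished_owner, 1);
    assert(first);
    (void)first;
    /* vanished_owner never stores 0 back into the lock. */

    /* The surviving thread now retries and can never succeed. */
    return acquire_bounded(&lock, main_thread, 5);
}
```

Calling `simulate_orphaned_lock()` returns `false`: every retry by the surviving thread fails because nothing is left to release the lock. Without the retry limit, that is an endless loop.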

All replies

  • This is why you don't wait or block in a DllMain. For an overview of how processes exit, and why it is quite possible for other threads to be terminated while holding locks, see here and here.
    Tuesday, February 17, 2015 8:39 PM
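
The arguments in the `_CRT_INIT` frame of the posted stack (`dwReason=0`, `lpreserved=0x00000001`) can be read against the documented DllMain contract: reason 0 is `DLL_PROCESS_DETACH`, and for that reason a non-NULL reserved pointer means the process is terminating rather than the DLL being unloaded. The sketch below only decodes those two values. The constant is redefined under a hypothetical name so the code compiles without `<windows.h>`, and the cleanup policy in the second function is an assumption for illustration, not something the CRT does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value of DLL_PROCESS_DETACH from the Windows SDK, redefined under a
   hypothetical name so the sketch compiles without <windows.h>. */
#define DEMO_DLL_PROCESS_DETACH 0ul

/* For DLL_PROCESS_DETACH, a non-NULL reserved pointer means the process
   is terminating; NULL means the DLL is being unloaded on its own. */
bool is_process_terminating(unsigned long reason, const void *reserved)
{
    return reason == DEMO_DLL_PROCESS_DETACH && reserved != NULL;
}

/* Hypothetical policy: only run cleanup that may wait on a lock during a
   plain unload, when other threads still exist to release that lock. */
bool should_run_blocking_cleanup(unsigned long reason, const void *reserved)
{
    return reason == DEMO_DLL_PROCESS_DETACH && reserved == NULL;
}
```

With the values from the stack, `is_process_terminating(0, (const void *)0x1)` is true, which is consistent with the `LdrShutdownProcess` frame further down: by this point the other threads have already been terminated.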
  • The stack does not include a call to DllMain. In fact the call stack appears to be purely Microsoft code as is the snippet I posted. Given that it's not code that we've written but rather Microsoft library code, how can I avoid the problem?
    Friday, February 20, 2015 7:26 PM
  • The code is waiting on another thread to finish with the lock. If there are no other threads at this point in the dump, look for one dying while holding the lock. I'd start with procdump to monitor first chance exceptions during shutdown, then take full dumps for any 'interesting' ones.

    Saturday, February 21, 2015 1:06 AM
  • Thanks, I guess we need to figure out how to reproduce the problem then. It happens sporadically at client sites, and until now we've only observed the hung processes on their Citrix servers long after the fact.
    Monday, February 23, 2015 5:17 PM
  • Yes, to find the exact cause you'll likely need a repro.

    I should say that this comment is a concern: "Should only happen if this DLL was built using the Everett-compat loader lock fix in vcclrit.h". If that is the case and you're still using vcclrit.h you should rebuild to get rid of it following the note here -

    Tuesday, February 24, 2015 12:30 AM
  • Let me also toss out there that the last 'weird' issue I came across with mixed-mode CRT startup/shutdown was due to using #pragma managed/unmanaged between headers (do not use these pragmas before include statements...).

    Your stack is in _CRT_INIT, which runs the cleanup code for globals and static initializers. The issue I vaguely recall had to do with wrapping one of those header globals in a pragma, which changed the shutdown behavior between CRT version compilations. So it's not obviously the same issue, but it's something to look for as a source of problems.

    Tuesday, February 24, 2015 12:59 AM
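
For readers unfamiliar with the cleanup that `_CRT_INIT` performs on detach, the comment quoted in the question describes a table of terminators bounded by `__onexitbegin` and `__onexitend` that must be run in reverse (LIFO) order. The following is a rough sketch of that walk under simplified assumptions. All `demo_` names are hypothetical, and the real CRT table holds encoded pointers and is not a plain array like this one.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_terminator)(void);

/* Records the order in which terminators run, for inspection. */
int demo_call_log[8];
size_t demo_call_count = 0;

void demo_term_a(void) { demo_call_log[demo_call_count++] = 1; }
void demo_term_b(void) { demo_call_log[demo_call_count++] = 2; }
void demo_term_c(void) { demo_call_log[demo_call_count++] = 3; }

/* Walks [begin, end) backwards so the last registered entry runs first.
   If begin == end the table is empty and nothing is called. */
void demo_run_terminators(demo_terminator *begin, demo_terminator *end)
{
    while (end != begin) {
        --end;
        if (*end != NULL)
            (*end)();
    }
}

/* Registers a, b, c in that order, runs them, and returns how many ran. */
size_t demo_run_example(void)
{
    demo_terminator table[] = { demo_term_a, demo_term_b, demo_term_c };
    demo_call_count = 0;
    demo_run_terminators(table, table + 3);
    return demo_call_count;
}
```

After `demo_run_example()`, `demo_call_log` holds 3, 2, 1: the terminator registered last runs first. Destructors for globals registered this way run in the same reverse order, which is why a change in how a header's globals are compiled can change shutdown behavior.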