locked
Access violation in mscorwks.dll when profiling an application that creates large number of threads

    質問

  • Hello,

    I have a simple .NET 3.5 application that creates a lot of worker threads for 3D rendering. If rendering is cancelled by the user, all worker threads are destroyed by calling Thread.Abort. When I profile it using the profiler we are developing it sometimes crashes inside mscorwks!ProfileEnter with the following call stack. It happens even when the profiler logic is bypassed by instrumenting all callback functions to immediately return.

    The crash can be reproduced only when the application is compiled in x64 mode.

    Can you help me identify the cause of the crash and is there something that can be done to prevent it from happening?

    Here's the output from WinDbg when the crash happened:

    First chance exceptions are reported before any exception handling.
    This exception may be expected and handled.
    mscorwks! ?? ::FNODOBFM::`string'+0x3af5d:
    000007fe`e94df31d 488b00          mov     rax,qword ptr [rax] ds:bfe5b6d9`bc7bf047=????????????????


    0:027> r
    rax=bfe5b6d9bc7bf047 rbx=000000002952d3c0 rcx=000000002952e9e0
    rdx=000007ff006cd2b8 rsi=000000002952d3c0 rdi=000007ff00753128
    rip=000007fee94df31d rsp=000000002952c480 rbp=000000000303ede0
     r8=000007fee9081ec0  r9=0000000000000000 r10=000007ff007441f0
    r11=0000000000000000 r12=000000002952e9c0 r13=000000002952d3c0
    r14=0000000000000000 r15=0000000000368cd0
    iopl=0         nv up ei ng nz na po nc
    cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010286
    mscorwks! ?? ::FNODOBFM::`string'+0x3af5d:
    000007fe`e94df31d 488b00          mov     rax,qword ptr [rax] ds:bfe5b6d9`bc7bf047=????????????????


    0:027> !eeversion
    2.0.50727.5448 free
    Workstation mode
    SOS Version: 2.0.50727.5448 retail build

    0:027> kb
    RetAddr           : Args to Child                                                           : Call Site
    000007fe`e920e4ca : 00000000`00000001 00000000`00000001 00000000`2952c658 000007fe`e918983c : mscorwks! ?? ::FNODOBFM::`string'+0x3af5d
    000007fe`e9176a21 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000001 : mscorwks!StackTraceInfo::AppendElement+0x4e
    000007fe`e9187bbd : 00000000`00368cd0 00000000`2952d3c0 00000000`2952e9c0 00000000`1e2f0de0 : mscorwks!ExceptionTracker::ProcessManagedCallFrame+0x1f1
    000007fe`e91876d1 : 00000000`2952d5e0 00000000`00000000 00000000`2952e250 000007ff`006cd2b8 : mscorwks!ExceptionTracker::ProcessOSExceptionNotification+0x39d
    00000000`77059d0d : 000007ff`006cd2f9 00000000`2952e9c0 00000000`2952d708 00000000`2952d710 : mscorwks!ProcessCLRException+0x1b1
    00000000`770491af : 00000000`29530000 000007fe`e996b498 00000000`0009c2e8 00000000`003500a0 : ntdll!RtlpExecuteHandlerForException+0xd
    00000000`77081278 : 00000000`2952e250 00000000`2952dd60 00000000`00000001 0000fbcb`00000000 : ntdll!RtlDispatchException+0x45a
    000007fe`fd75cacd : 00000000`04347d50 00000000`2952e440 00000000`1e2f0de0 000007fe`e91dcb31 : ntdll!KiUserExceptionDispatcher+0x2e
    000007fe`e921923f : 00000000`1e2f0de0 00000000`04347d50 00000000`2952e440 00000000`1e2f0de0 : KERNELBASE!RaiseException+0x39
    000007fe`e9169877 : 00000000`04347d50 ffffffff`00000000 00000000`00000000 000007fe`00000001 : mscorwks!RaiseTheExceptionInternalOnly+0x2ff
    000007fe`e96a4b0d : 000007ff`006ccad9 000007fe`00000001 00000000`04347d18 00000000`00000000 : mscorwks!Thread::HandleThreadAbort+0x19b
    000007fe`e930dd0b : 00000000`00000002 00000000`00001200 00000000`00000000 00000000`00000001 : mscorwks!ProfileEnter+0x9d
    000007ff`006cd2f9 : 00000000`00000000 00000000`00000000 00000000`00000000 bfe76646`42ee68d8 : mscorwks!ProfileEnterNaked+0x6b
    000007ff`006cd26f : 00000000`04347d18 00000000`04347cd8 00000000`042f8b68 3fb1ce4a`00000002 : 0x7ff`006cd2f9
    000007ff`006ccf45 : 00000000`04347cd8 00000000`042f8b68 00000000`00000000 00000000`04347c98 : 0x7ff`006cd26f
    000007ff`006cc5d3 : 000007ff`006aacc8 00000000`04347cd8 00000000`042f8b68 bfe76646`42ee68d8 : 0x7ff`006ccf45
    000007ff`006cc35b : 00000000`04318370 00000000`04347b98 00000000`042f6f70 00000000`042f7338 : 0x7ff`006cc5d3
    000007ff`006cac1f : 00000000`04318370 00000000`04347b98 00000000`042f6f70 00000000`00000000 : 0x7ff`006cc35b
    000007ff`006c9258 : 00000000`04318370 00000000`0439beb0 00000000`e47839b4 00000000`00000001 : 0x7ff`006cac1f
    000007ff`006c9130 : 00000000`042d3fd8 00000000`0430e380 00000000`0439c0c8 00000000`1e2f0de0 : 0x7ff`006c9258
    000007ff`006c8bf5 : 00000000`0430e3e0 00000000`0430e380 00000000`77145410 00000000`00000000 : 0x7ff`006c9130
    000007ff`006c8aa6 : 00000000`0439c038 00000000`042f7638 00000000`0439bfd0 00000000`2952eea0 : 0x7ff`006c8bf5
    000007fe`e9310542 : 00000000`0439bfd0 00000000`0439beb0 00000000`1e2f0de0 000007fe`e47839b4 : 0x7ff`006c8aa6
    000007fe`e91f4143 : cccccccc`cccccccc cccccccc`cccccccc 00000000`2952ef50 00000000`00000000 : mscorwks!CallDescrWorker+0x82
    000007fe`e96eb861 : 00000000`2952f018 00000000`00000000 00000000`2952f250 00000000`00000008 : mscorwks!CallDescrWorkerWithHandler+0xd3
    000007fe`e952958a : 00000000`2952f670 00000000`2952fab0 00000000`2952f710 000007ff`007206d0 : mscorwks!MethodDesc::CallDescr+0x2b1
    000007fe`e913dae0 : cccccccc`00000000 ffffffff`fffffffe 00000000`2952f540 000007fe`e91d6c7e : mscorwks! ?? ::FNODOBFM::`string'+0x851ca
    000007fe`e9224025 : 00000000`1e2f0de0 00000642`ff8d5e7f 00000000`00ec7738 00000000`2952f670 : mscorwks!ManagedThreadCallState::IsAppDomainEqual+0x4c
    000007fe`e925f0b9 : 00000000`00000000 00000000`1e2f0de0 00000000`2952f5b0 00000000`2952f610 : mscorwks!SVR::gc_heap::make_heap_segment+0x155
    000007fe`e92d0769 : 00000000`2952f670 ffffffff`ffffffff 00000000`1e2f0de0 00000000`00000000 : mscorwks!DoOpenIAssemblyStress::DoOpenIAssemblyStress+0x99
    000007fe`e9132d3f : 00000000`1e2f0de0 00000000`00000000 00000200`002b0000 00000000`00000008 : mscorwks!CNgenEntry::IsSigEqual+0xd5
    000007fe`e913b0dc : 00000002`00000000 0000ffff`00001f00 00000000`1e2f0de0 00000000`2952f728 : mscorwks!ThreadNative::KickOffThread+0xd3
    00000000`76f2652d : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : mscorwks!Thread::intermediateThreadProc+0x78
    00000000`7705c521 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0xd
    00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

    0:027> !dumpstack
    OS Thread Id: 0x1059c (27)
    Child-SP         RetAddr          Call Site
    000000002952c480 000007fee920e4ca mscorwks! ?? ::FNODOBFM::`string'+0x3af5d
    000000002952c4c0 000007fee9176a21 mscorwks!StackTraceInfo::AppendElement+0x4e
    000000002952c510 000007fee9187bbd mscorwks!ExceptionTracker::ProcessManagedCallFrame+0x1f1
    000000002952c720 000007fee91876d1 mscorwks!ExceptionTracker::ProcessOSExceptionNotification+0x39d
    000000002952d5b0 0000000077059d0d mscorwks!ProcessCLRException+0x1b1
    000000002952d650 00000000770491af ntdll!RtlpExecuteHandlerForException+0xd
    000000002952d680 0000000077081278 ntdll!RtlDispatchException+0x45a
    000000002952dd60 000007fefd75cacd ntdll!KiUserExceptionDispatcher+0x2e
    000000002952e320 000007fee921923f KERNELBASE!RaiseException+0x39
    000000002952e3f0 000007fee9169877 mscorwks!RaiseTheExceptionInternalOnly+0x2ff
    000000002952e4e0 000007fee96a4b0d mscorwks!Thread::HandleThreadAbort+0x19b
    000000002952e580 000007fee930dd0b mscorwks!ProfileEnter+0x9d
    000000002952e8f0 000007ff006cd2f9 mscorwks!ProfileEnterNaked+0x6b
    000000002952e9c0 000007ff006cd26f System_Core!System.Linq.Enumerable+WhereEnumerableIterator`1[[System.__Canon, mscorlib]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Func`2<System.__Canon,Boolean>)+0x39
    000000002952ea00 000007ff006ccf45 System_Core!System.Linq.Enumerable+WhereSelectArrayIterator`2[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].Where(System.Func`2<System.__Canon,Boolean>)+0x9f
    000000002952ea50 000007ff006cc5d3 System_Core!System.Linq.Enumerable.Where[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1<System.__Canon>, System.Func`2<System.__Canon,Boolean>)+0x105
    000000002952eab0 000007ff006cc35b Demo_RayTracer!RayTracing.RayTracer.Intersections(RayTracing.Ray, RayTracing.Scene)+0x183
    000000002952eb40 000007ff006cac1f Demo_RayTracer!RayTracing.RayTracer.TraceRay(RayTracing.Ray, RayTracing.Scene, Int32)+0x8b
    000000002952ebb0 000007ff006c9258 Demo_RayTracer!RayTracing.RayTracer.Render()+0x2af
    000000002952ed60 000007ff006c9130 Demo_RayTracer!Demo.RayTracer.RayTracer.<StartRender>b__2(RayTracing.RenderParameters)+0xd8
    000000002952edd0 000007ff006c8bf5 Demo_RayTracer!Demo.RayTracer.ThreadData+<>c__DisplayClass3.<.ctor>b__1(System.Object)+0x80
    000000002952ee10 000007ff006c8aa6 mscorlib!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0xc5
    000000002952ee50 000007fee9310542 mscorlib!System.Threading.ThreadHelper.ThreadStart(System.Object)+0x96
    000000002952ee90 000007fee91f4143 mscorwks!CallDescrWorker+0x82
    000000002952eee0 000007fee96eb861 mscorwks!CallDescrWorkerWithHandler+0xd3
    000000002952ef80 000007fee952958a mscorwks!MethodDesc::CallDescr+0x2b1
    000000002952f1d0 000007fee913dae0 mscorwks! ?? ::FNODOBFM::`string'+0x851ca
    000000002952f4f0 000007fee9224025 mscorwks!ManagedThreadCallState::IsAppDomainEqual+0x4c
    000000002952f540 000007fee925f0b9 mscorwks!SVR::gc_heap::make_heap_segment+0x155
    000000002952f610 000007fee92d0769 mscorwks!DoOpenIAssemblyStress::DoOpenIAssemblyStress+0x99
    000000002952f650 000007fee9132d3f mscorwks!CNgenEntry::IsSigEqual+0xd5
    000000002952f6b0 000007fee913b0dc mscorwks!ThreadNative::KickOffThread+0xd3
    000000002952f790 0000000076f2652d mscorwks!Thread::intermediateThreadProc+0x78
    000000002952fae0 000000007705c521 KERNEL32!BaseThreadInitThunk+0xd
    000000002952fb10 0000000000000000 ntdll!RtlUserThreadStart+0x1d


    0:027> !threads
    ThreadCount: 54
    UnstartedThread: 0
    BackgroundThread: 1
    PendingThread: 0
    DeadThread: 44
    Hosted Runtime: no
                                                  PreEmptive                                                Lock
           ID OSID        ThreadOBJ     State   GC     GC Alloc Context                  Domain           Count APT Exception
       0    1 10368 0000000000302340   2006020 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 STA
       7    2 10394 00000000030248a0      b220 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 MTA (Finalizer)
    XXXX   1d    0 000000001e331fa0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   1e    0 000000001e332570      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   1f    0 000000001e332b40      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   20    0 000000001e333110      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   21    0 000000001e3336e0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   22    0 000000001e333cb0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   23    0 000000001e334280      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   24    0 000000001e334850      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   25    0 000000001e334e20      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   26    0 000000000309e5e0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   27    0 000000000309ebb0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   28    0 000000000309f180      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   29    0 000000000309f750      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   2a    0 000000000309fd20      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   2b    0 00000000030a02f0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   2c    0 00000000030a08c0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   2d    0 00000000030a0e90      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   2e    0 00000000030a1460      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   2f    0 00000000030a1a30      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 MTA
    XXXX   30    0 000000001e2f70f0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 MTA
    XXXX   31    0 000000001e2f76c0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 MTA
    XXXX   32    0 000000001e2f7c90      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 MTA
    XXXX   33    0 000000001e2f8260      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 MTA
    XXXX   34    0 000000001e2f8830      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 MTA
    XXXX   35    0 000000001e358040      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 MTA
    XXXX   36    0 000000001e358610      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 MTA
    XXXX   1c    0 000000001e3319d0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   1b    0 000000001e358be0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   1a    0 000000001e3591b0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   19    0 000000001e359780      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   18    0 000000001e359d50      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   17    0 000000001e35a320      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   16    0 000000001e35a8f0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   15    0 000000001e35aec0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   14    0 000000001e35b490      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   13    0 000000001e2ed990      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   12    0 000000001e2edf60      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   11    0 000000001e2ee530      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX   10    0 000000001e2eeb00      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX    f    0 000000001e2ef0d0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX    e    0 000000001e2ef6a0      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX    d    0 000000001e2efc70      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX    c    0 000000001e2f0240      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
    XXXX    b    0 000000001e2f0810      9820 Enabled  0000000000000000:0000000000000000 000000000035a2f0     0 Ukn
      27    a 1059c 000000001e2f0de0  1000b021 Disabled 0000000004347e28:0000000004348d90 000000000035a2f0     0 MTA System.Threading.ThreadAbortException (0000000004347d50)
      28    9 105a0 000000001e31b750      b020 Disabled 00000000043368d8:0000000004336d90 000000000035a2f0     0 MTA
      29    8 105a4 000000001e31bd20      b020 Disabled 00000000043406d0:0000000004340d90 000000000035a2f0     0 MTA
      30    7 105a8 000000001e31c2f0      b020 Disabled 0000000004344000:0000000004344d90 000000000035a2f0     0 MTA
      31    6 105ac 000000001e31c8c0      b020 Disabled 00000000043421c0:0000000004342d90 000000000035a2f0     0 MTA
      32    5 105b0 000000001e31ce90      b020 Disabled 0000000004349b30:000000000434ad90 000000000035a2f0     0 MTA
      33    4 105b4 000000001e31d460      b020 Disabled 000000000434af40:000000000434cd90 000000000035a2f0     0 MTA
      34    3 105b8 000000001e31da30      b020 Disabled 00000000043461c0:0000000004346d90 000000000035a2f0     0 MTA

    Thank you in advance for your help and suggestions!

    Kind regards,
    Martin Bektchiev

    2012年3月5日 13:10

回答

  • Hi, Martin.

    By "crash", can I infer that the entire application is torn down as a result?  Or do you just a first-chance exception, and then it gets handled somehow later on?

    If the former, this could be a bug in the CLR, though it's not something I've heard of before.  I doubt we'd be able to service 3.5 for this, but it would be very interesting to know if you can can still get this issue to reproduce with CLR 4, in which case, we would look at providing a fix for an upcoming release (though again, probably not a servicing release for CLR 4).

    Your best bet is to enter a bug on the Microsoft Connect site (http://connect.microsoft.com/VisualStudio), and mention my name (David Broman) to help it get routed to me quickly.  Best if you can provide a reduced reproducing case of the app code that does all the thread aborts.  If the profiler needs to do nothing other than enable enter/leave/tailcall we can use one of our own for that (though let us know if the profiler needs to do more than that).

    A quick workaround idea: To reduce the statistical likelihood of running into this, you could try enabling fast-path enter/leave/tailcall (your stack implied you're using slow-path) if fast-path is sufficient for your needs.  At least then you are less likely of a thread being inside an ELT handler at the time the thread aborts are done.

    Thanks,
    Dave

    2012年3月5日 22:05
    所有者

すべての返信

  • Hi, Martin.

    By "crash", can I infer that the entire application is torn down as a result?  Or do you just a first-chance exception, and then it gets handled somehow later on?

    If the former, this could be a bug in the CLR, though it's not something I've heard of before.  I doubt we'd be able to service 3.5 for this, but it would be very interesting to know if you can can still get this issue to reproduce with CLR 4, in which case, we would look at providing a fix for an upcoming release (though again, probably not a servicing release for CLR 4).

    Your best bet is to enter a bug on the Microsoft Connect site (http://connect.microsoft.com/VisualStudio), and mention my name (David Broman) to help it get routed to me quickly.  Best if you can provide a reduced reproducing case of the app code that does all the thread aborts.  If the profiler needs to do nothing other than enable enter/leave/tailcall we can use one of our own for that (though let us know if the profiler needs to do more than that).

    A quick workaround idea: To reduce the statistical likelihood of running into this, you could try enabling fast-path enter/leave/tailcall (your stack implied you're using slow-path) if fast-path is sufficient for your needs.  At least then you are less likely of a thread being inside an ELT handler at the time the thread aborts are done.

    Thanks,
    Dave

    2012年3月5日 22:05
    所有者
  • Hi Dave,

    Yes, by crash I mean that the exception is unhandled and the process gets terminated. As far as I can tell the profiler doesn't need to do anything except set its ELT hooks, which could be empty. On my machine this reproduces quite easily, but I have failed to make it happen on any of my colleagues' machines.

    I tried running the same application under CLR 4 and the crash doesn't occur, so most probably it has been fixed there. We use slow-path ELT because we rely on calling ICorProfiler2::DoStackSnapshot to seed our shadow stack after re-enabling profiling. We set the COR_PRF_ENABLE_STACK_SNAPSHOT flag when calling SetEventMask in Initialize. I ran a test by unsetting this flag in the profiler and as you supposed the exception stopped occurring.

    Thank you very much for your suggestions.

    Kind regards,
    Martin

    2012年3月6日 16:08