Heap Corruption in managed application? System.ExecutionEngineException

    Question

  • I have an application that crashes at seemingly random times with a System.ExecutionEngineException.  I have several dumps from different machines and different setups that all show heap corruption. The application is entirely C#. There is no unsafe code, and no native libraries are used. I didn't think heap corruption or access violations were supposed to be possible in entirely managed code.

    Sorry if this is irrelevant information, but this issue has survived an entire rewrite, so I'm assuming something about the nature of the application is causing the problem.  The app is a server for a game that runs a virtual environment and consumes a substantial amount of memory. Clients connect to the environment via sockets and can interact with each other and with the environment. The environment is constantly changing, so there is a lot of memory churn all the time. The crash appears to occur randomly, sometimes within hours and sometimes after weeks.

    Below is what I see from WinDBG (although I could use some help).

    WinXP 64 machine - pageheap enabled - Concurrent GC

      14    e   b30 0000000023a0a8f0     8b228 Enabled  0000000000000000:0000000000000000 0000000003f003a0     0 MTA (GC) System.ExecutionEngineException (0000000004821228)

    (Error on GC thread)

    !validateheap

    object 000000000af8f4d0: bad member 000000003e4a6810 at 000000000af8f4f0
    curr_object:      000000000af8f4d0
    Last good object: 000000000af8f490

    Win7 machine - Concurrent GC

      14    e   9b8 000000001d1c6d70      b220 Enabled  0000000000000000:0000000000000000 00000000000cff00     0 MTA System.ExecutionEngineException (0000000002791228)

    !validateheap

    object 0000000008e839e8: does not have valid MT
    curr_object:      0000000008e839e8
    Last good object: 0000000008e839c8

    Win7 machine - Server GC

    (No thread exceptions)?

    !validateheap       (4 heaps due to Server GC and quad core machine)

    ------------------------------
    Heap 0
    total 0 objects
    ------------------------------
    Heap 1
    total 0 objects
    ------------------------------
    Heap 2
    total 0 objects
    ------------------------------
    Heap 3
    object 000000014322d978: bad member 000000014322d8b8 at 000000014322d9a8
    curr_object:      000000014322d978
    Last good object: 000000014322d950

    Can anyone enlighten me as to what could possibly be causing this?  Thanks.

    Saturday, March 10, 2012 23:48

Answers

  • If you really need a solution and are willing to pay for it, you can try paid help (e.g. Microsoft support; I think they will not charge you if it turns out to be a Microsoft bug).

    If you are on a budget, I would suggest trying GCStress. IMO it has the best chance of catching the corruption earlier.

    You can also review which DLLs are loaded into your process and check whether it fails on different hardware.
    Here's a thread about a similar error caused by a third-party driver, as an example of what could corrupt the memory (see S. McGuire's reply on November 10, 2008).
    Other suspects are malware (check the origin of all DLLs), antivirus software, and invasive tools (like profilers) loaded into your process.
    The problem might be a bug in the CLR, the .NET Framework, or the OS (trying the latest QFE for the CLR/.NET Framework or different OS versions might help you narrow down where the problem is).
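
    One way to review the loaded DLLs from inside the process is sketched below. It lists every module together with the company name from its version resource, so unexpected third-party modules stand out. The class name LoadedModules is only illustrative, and in a real server this would more likely be called from a diagnostic command or at startup than from Main.

```csharp
using System;
using System.Diagnostics;

static class LoadedModules
{
    static void Main()
    {
        // Enumerate every native and managed module mapped into the current process.
        foreach (ProcessModule module in Process.GetCurrentProcess().Modules)
        {
            // CompanyName comes from the file's version resource and may be empty.
            FileVersionInfo info = module.FileVersionInfo;
            Console.WriteLine("{0,-40} {1,-30} {2}",
                module.ModuleName, info.CompanyName, module.FileName);
        }
    }
}
```

    Anything that does not come from Microsoft or from your own build output is worth a closer look.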

    I hope this helps a bit. Please let us know if you find anything; it will be useful for future reference.
    -Karel

    Tuesday, March 13, 2012 04:39
    Moderator

All replies

  • Hi,

    Please try the following steps.

    Step 1. Check your code.

    Check the code for unsafe or native code usage:
     a) Review the code for unsafe blocks and DllImport declarations.
     b) Download .NET Reflector from http://www.reflector.net/ and use it to analyze the application assemblies for P/Invoke calls. Analyze any 3rd party assemblies used by the application in the same way.

    If you find unsafe or native code usage, pay extra attention to it. The most common causes of heap corruption in such cases are a buffer overflow or an argument type mismatch. Ensure that any buffer supplied to native code to fill is big enough and that all arguments passed to native code are of the expected type.
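
    As an alternative to a decompiler, the P/Invoke check can be sketched with plain reflection. The class name PInvokeScanner is made up for this example, and the assembly paths are assumed to be passed on the command line. Note that this only finds P/Invoke declarations; it does not detect unsafe blocks, Marshal calls, or mixed-mode (C++/CLI) assemblies.

```csharp
using System;
using System.Linq;
using System.Reflection;

static class PInvokeScanner
{
    // Prints every method in the assembly that is implemented via P/Invoke.
    static void Report(Assembly assembly)
    {
        const BindingFlags All = BindingFlags.Public | BindingFlags.NonPublic |
                                 BindingFlags.Static | BindingFlags.Instance |
                                 BindingFlags.DeclaredOnly;

        Type[] types;
        try
        {
            types = assembly.GetTypes();
        }
        catch (ReflectionTypeLoadException ex)
        {
            // Some types may fail to load; inspect the ones that did.
            types = ex.Types.Where(t => t != null).ToArray();
        }

        foreach (Type type in types)
            foreach (MethodInfo method in type.GetMethods(All))
                if ((method.Attributes & MethodAttributes.PinvokeImpl) != 0)
                    Console.WriteLine("{0}: {1}.{2}",
                        assembly.GetName().Name, type.FullName, method.Name);
    }

    static void Main(string[] args)
    {
        // Each argument is the path to an assembly to inspect.
        foreach (string path in args)
            Report(Assembly.LoadFrom(path));
    }
}
```

    If this prints nothing for the application's own assemblies, any native code involved has to come from the framework or from something else loaded into the process.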

    Step 2. Check if this Corrupted State Exception can be caught.

    To handle such exceptions, decorate the method that contains the catch (Exception) statement with the [HandleProcessCorruptedStateExceptions] attribute, or add the following to the "app.config" file:
    <configuration>
        <runtime>
            <legacyCorruptedStateExceptionsPolicy enabled="true" />
        </runtime>
    </configuration>

     If the exception is caught successfully, you can log and examine it. In that case this is not a corrupted heap issue.

    Corrupted heap exceptions cannot be handled at all: http://social.msdn.microsoft.com/Forums/en-US/clr/thread/18fb14cb-33be-4af7-9bf1-dc651cdf1239/.

    More information on Corrupted State Exceptions: http://dotnetslackers.com/articles/net/All-about-Corrupted-State-Exceptions-in-NET4.aspx.
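
    For the attribute route, a minimal sketch could look like the following. RunServer is a hypothetical stand-in for the application's real entry point. The handler only logs and then terminates, since the process state cannot be trusted after a corrupted state exception.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Security;

static class Program
{
    // Hypothetical placeholder for the real server loop.
    static void RunServer() { }

    [HandleProcessCorruptedStateExceptions]
    [SecurityCritical]
    static void Main()
    {
        try
        {
            RunServer();
        }
        catch (Exception ex)
        {
            // Record what we can, then end the process immediately.
            Console.Error.WriteLine(ex);
            Environment.FailFast("Corrupted state exception caught", ex);
        }
    }
}
```

    If the catch block never runs and the process still dies, that is consistent with a corrupted heap rather than an ordinary corrupted state exception.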

    Step 3. Live debugging

    Step 4. Enable MDAs

    Try to use the Managed Debugging Assistants.

    MDAs are most useful when a debugger such as WinDbg is attached.
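
    As a rough sketch, MDAs can be switched on by setting the environment variable COMPLUS_MDA=1 before starting the process and placing a configuration file next to the executable. The file name GameServer.exe.mda.config below assumes the executable is called GameServer.exe, which is only a placeholder. The selection of assistants is a guess at what is relevant to a socket-heavy server, not a definitive list.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- GameServer.exe.mda.config (hypothetical file name) -->
<mdaConfig>
  <assistants>
    <!-- Force a GC on every managed/unmanaged transition to surface corruption sooner -->
    <gcManagedToUnmanaged />
    <gcUnmanagedToManaged />
    <!-- Overlapped I/O misuse, relevant to asynchronous socket code -->
    <invalidOverlappedToPinvoke />
    <overlappedFreeError />
    <pInvokeStackImbalance />
    <fatalExecutionEngineError />
  </assistants>
</mdaConfig>
```

    The two gc assistants slow the process down noticeably, so they are better suited to a test machine than to production.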

    Step 5. Enable GCStress.

    This is an extreme option because the application becomes almost unusably slow, but it is still worth trying.

    Step 6. Compile for x86.

    If your application is currently compiled for the "Any CPU" or "x64" platform, try compiling it for "x86", provided the choice of platform makes no difference to you.

    Step 7. Disable concurrent GC

    There is a known issue in .NET 4, reported in this thread: http://social.msdn.microsoft.com/Forums/en-US/clr/thread/33920b39-690c-42c8-b04a-0f1f7176835a/. The problem can be solved by disabling concurrent GC in the app.config file:
    <?xml version="1.0"?>
    <configuration>
        <runtime>
            <gcConcurrent enabled="false" />
        </runtime>
    </configuration>


    Best wishes,


    Robin [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Monday, March 12, 2012 02:07
    Moderator
  • #1) is not applicable. There is no unsafe code and there are no DllImport statements. No 3rd party libraries are used. If native code is causing the problem, then it has to be inside a .NET class such as Socket or SqlConnection, and I highly doubt that. Also, I doubt I would have any luck finding a bug in the underlying MS code.  ;-)

    #2) the exception cannot be caught.

    #3) does not produce anything more useful than a crash dump. With MDAs turned on, the application seems to stop at a random place in the code on an access violation (FatalExecutionEngineError).

    #4) The only MDA that throws is FatalExecutionEngineError

    #5) I have not tried this, and it is not really an option because the machines are already running under load when the exception occurs.

    #6) This is not an acceptable solution.

    #7) I still experience this issue running Server GC.  My understanding is that this should also resolve the issue caused by Concurrent GC.

    I have several dumps that are all slightly different.  I'm not a pro with WinDBG.  I don't see anything leaking and I don't see any useful information when I run !analyze -v.

    Here is some output from !analyze -v for the first dump listed above:

    FAULTING_IP:
    ntdll!ZwWaitForSingleObject+a
    00000000`77ef047a c3              ret

    EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
    ExceptionAddress: 0000000077ef047a (ntdll!ZwWaitForSingleObject+0x000000000000000a)
       ExceptionCode: e0434f4d (CLR exception)
      ExceptionFlags: 00000000
    NumberParameters: 0

    --Top of the managed stack ends in String.Concat ??

    MANAGED_STACK:
    (TransitionMU)
    0000000024E8BC70 00000644783B2080 mscorlib_ni!System.String.Concat(System.String, System.String, System.String)+0x80

    EXCEPTION_OBJECT: !pe 4821228
    Exception object: 0000000004821228
    Exception type:   System.ExecutionEngineException
    Message:          <none>
    InnerException:   <none>
    StackTrace (generated):
    <none>
    StackTraceString: <none>
    HResult: 80131506

    MANAGED_OBJECT_NAME:  System.ExecutionEngineException

    MANAGED_STACK_COMMAND:  _EFN_StackTrace

    CONTEXT:  0000000024e8ac70 -- (.cxr 0x24e8ac70)
    rax=000000003e4a3800 rbx=0000000004820000 rcx=0000000000000000
    rdx=0000000039c83800 rsi=0000000000073908 rdi=00000000481f7bc0
    rip=000006447f28c97d rsp=0000000024e8b208 rbp=000000003e49eff0
     r8=00000000000738fd  r9=000000003e4a3fc0 r10=00000000007c948c
    r11=0000000049bfaef8 r12=0000064480235d08 r13=000000003e4a4580
    r14=0000000000000028 r15=00000000007c948b
    iopl=0         nv up ei pl zr na po nc
    cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
    clr!WKS::gc_heap::find_first_object+0x92:
    00000644`7f28c97d f70100000080    test    dword ptr [rcx],80000000h ds:00000000`00000000=????????
    Resetting default scope

    LAST_CONTROL_TRANSFER:  from 000006447f28f530 to 000006447f28c97d

    I really don't know where to go from here. If anyone has any ideas, or if I can supply any more information that may be useful, I'm happy to provide it.

    Thanks!

    Monday, March 12, 2012 23:09
  • Thanks for your update! I am involving more experts to investigate this issue. Please be patient while waiting for a reply.

    Best wishes,


    Robin [MSFT]

    Tuesday, March 13, 2012 01:16
    Moderator
  • Hi,

    Was there ever a solution to this?

    Thanks!

    Wednesday, June 26, 2013 14:56