locked
GC hangs with 100% CPU utilization ?

    Question

  • Dear .NET gurus,

    I am trying to debug some .NET 1.1 application and and experience a strange behavior - from time to time the process hangs with 100% CPU and has to be killed. The threads in the application that burn the processor time are actually unmanaged threads busy doing the following:

    WARNING: Stack unwind information not available. Following frames may be wrong.
    09f8ebc8 00aab14e mscorwks!GetCompileInfo+0x1840b
    09f8ebcc 09f8ebd4 0xaab14e
    09f8ebd0 00000000 0x9f8ebd4

    (this one burns 98% of CPU)

    Also there are three additional theads doing this:

    0660f408 79299fec ntdll!KiFastSystemCallRet
    0660f438 7929a54d mscorwks!CorMarkThreadInThreadPool+0x297e
    0660f48c 793394be mscorwks!CorMarkThreadInThreadPool+0x2edf
    0660f5a0 791daddf mscorwks!ReleaseFusionInterfaces+0x913fb
    0660f5b4 791d94bc mscorwks!GetCompileInfo+0x1104
    0660f5bc 791ed194 mscorwks!Ordinal17+0x4997
    0660f6cc 791ed54b mscorwks!GetCompileInfo+0x134b9
    0660f788 791ed5b9 mscorwks!GetCompileInfo+0x13870
    0660f7b0 792e87b7 mscorwks!GetCompileInfo+0x138de
    0660f7fc 792e8886 mscorwks!ReleaseFusionInterfaces+0x406f4
    0660f8a0 791cf03c mscorwks!ReleaseFusionInterfaces+0x407c3
    0660ffb4 7c80b50b mscorwks!CoInitializeCor+0x71b8
    0660ffec 00000000 KERNEL32!GetModuleFileNameA+0x1b4

    (those three burn about 0.5% CPU each one)

    I used Performance Monitor to check which threads in the process are actually running. Then I used WinDbg to break all threads and examine the call stack of the interesting ones.

    My questions are :

    - What may be going on here?
    - Is there a better way to diagnose such situations.

    I have to mention that the debugged application was running 70 threads at that time (including the four described above) - I know that it is not the best design practice, but it is my karma to debug that thing.

    Two more notes:

    1. The debugged application uses sockets - most theads are waiting in blocking calls.

    2. While regularly running, the application produces large number of first-chance exceptions in  kernel32.dll - exception 0xE0434F4D. The exceptions are handled by CLR. Typical exception stacks are:

     kernel32.dll!_RaiseException@16()  + 0x52 
      mscorwks.dll!RaiseTheException()  + 0x65 
      mscorwks.dll!JIT_Throw()  + 0x4d 
      System.dll!7b26046e()  
      mscorwks.dll!COMString::FillStringChecked()  + 0x27 
      mscorwks.dll!InvokeUtil::CheckArg()  + 0x4b 
      0000000c() 

      kernel32.dll!_RaiseException@16()  + 0x52 
      mscorwks.dll!RaiseTheException()  + 0x65 
      mscorwks.dll!JIT_Throw()  + 0x4d 
      mscorlib.dll!79a94ab4()  
      mscorlib.dll!799a4419()  
      mscorwks.dll!COMString::FillSubstring()  + 0x1c 
      mscorlib.dll!79986a08()  
      mscorlib.dll!79a9b67d()  

    Thanks in advance to anyone who can help.

    Ygor


    Thursday, December 01, 2005 6:37 PM

Answers

  • If you are still seeing this behaviour with your improved code, I can try to help you debug it, but if your new code seems to fix it, we should let this thread die and start up a new one for any new issues.

    -Chris

    Tuesday, January 03, 2006 11:39 PM

All replies

  • Few comments:
    a) The thread that you say is burning 90% CPU is probably managed thread.
    > 09f8ebc8 00aab14e mscorwks!GetCompileInfo+0x1840b
    The offset as large as 0x1840b tells you that you are missing debug symbols or the code is the managed code.
    .loadby sos mscrowks
    !clrstack
    To see the managed call stacks.

    b) Exception 0xE0434F4D is how CLR throws a "regular" managed exception.
    So, this is your regular .NET exception throw statement.
    Once again you can/should use
    !clrstack
    !cen

    Thursday, December 01, 2005 9:40 PM
  • Please help!!

    I've learned to use SOS.dll and found after all (on another machine) that the thread that is burning the CPU is unmanaged thread (!threads and !eestack didn't recognized it).

    This is the stack of the 100% CPU thread:

    0f25fb0c 79237ac1 mscorwks!gc_heap::mark_object_simple1+0x2c3
    0f25fb38 79237e82 mscorwks!gc_heap::mark_object_simple+0x1ab
    0f25fb50 7923c666 mscorwks!GCHeap::Promote+0x11f
    0f25fb64 7923c654 mscorwks!PromoteObject+0x10
    0f25fb88 7923cb7d mscorwks!ScanConsecutiveHandlesWithoutUserData+0x26
    0f25fba8 79340cc4 mscorwks!BlockScanBlocksWithoutUserData+0x26
    0f25fbcc 79340d61 mscorwks!ProcessScanQNode+0x2a
    0f25fbf0 79340db7 mscorwks!xxxTableScanQueuedBlocksAsync+0x61
    0f25fc08 79239402 mscorwks!xxxAsyncSegmentIterator+0x1a
    0f25fc48 79340e22 mscorwks!TableScanHandles+0x6c
    0f25fe6c 79239394 mscorwks!xxxTableScanHandlesAsync+0x5c
    0f25feb8 792399d7 mscorwks!HndScanHandlesForGC+0xe2
    0f25fefc 79239ac0 mscorwks!Ref_TraceNormalRoots+0x54
    0f25ff14 792cdb8f mscorwks!CNameSpace::GcScanHandles+0x51
    0f25ff4c 7925a380 mscorwks!gc_heap::c_mark_phase+0xee
    0f25ffa0 792cde63 mscorwks!gc_heap::gc1+0x8c
    0f25ffb0 792cde97 mscorwks!gc_heap::gc_thread_function+0xa6
    0f25ffb4 7c80b50b mscorwks!gc_heap::gc_thread_stub+0x5
    0f25ffec 00000000 KERNEL32!BaseThreadStart+0x37

    Is there anything I can do ? 

    Thanks,

    Ygor
    Sunday, December 04, 2005 7:56 PM
  • If you don't mind me asking how do you know this is thread that's eating CPU?

    This particular thread is doing GC, you can ensure yourself that this is guilty party by checking the performance counter (in perfmon or procexp)
    .NET CLR Memory\% Time in GC
    Thursday, December 08, 2005 1:28 AM
  • Hi.

    I have a similar problem.

    I have a thread that eats all CPU. Examining the call stack of that thread with WinDBG shows that this thread running GC. The "% Time in GC" shows constantly values around 4%. When the CPU is back to normal, the memory consumed by the process decreased by more than 100 MB.

    Normal run shows most of the time ~0 for
    "% Time in GC" and peaks with 15,20,40,60 "% Time in GC".

    Next time that I have 100% CPU for the same process, it's on a different thread, the previos thread with 100% CPU is gone, and again stack shows that it's GC running.

    Any ideas?

    Thanks,
    Ethan.
    Thursday, December 08, 2005 9:03 AM
  • GC could hijack any thread, therefore it doesn't have to be the same one.
    GC Counters are updated once after GC ran and therefore are not always up-to-date.
    15-60 %Time in GC in bad and you should try either ClrProfiler or http://netmemprofiler.com/ 
    Thursday, December 08, 2005 8:56 PM
  • The counters were updated each second, and it's strange that GC ran for 30 minutes and the counters showed only about 4 %. Since when GC runs it stops all other threads, I would expect values close to 100% and not to 0%.
    I have already used memory profiler, and found no answer to the problem
    Sunday, December 11, 2005 8:09 AM
  • Is your app running on server GC or Workstation GC? Have you looked at the stack trace of the threads using SOS? What was the outcome? How many threads are active when the CPU utilization is 100 %?

    Wednesday, December 28, 2005 4:12 PM
  • My app is running with workstation GC.

    The only running thread is doing GC. The total number of threads is 80.

    If I close app that uses substantial amount of memory (like VS), the CPU utilization drops. More free memory seems to help GC (GC needs at least 50 MB, otherwise it throws an exception).

    Recently I changed the architecture and I don't have so many threads, never more than 20 (I'm using thread pool), and the problem of 100% CPU is gone for now.

    Wednesday, December 28, 2005 6:11 PM
  • If you are still seeing this behaviour with your improved code, I can try to help you debug it, but if your new code seems to fix it, we should let this thread die and start up a new one for any new issues.

    -Chris

    Tuesday, January 03, 2006 11:39 PM