Finding memory leaks using windbg

  • Question

  • Hello everybody,

    I'm trying to find a memory leak in a .NET app using WinDbg and the SOS extension. Under some rare circumstances, the app allocates ~1.4 GB and then dies with an OutOfMemoryException. Because this happens so rarely, the bug is hard to track down.

    I can see that the heap is littered with byte arrays when the OutOfMemoryException is thrown. Using !gcroot I can see that those byte arrays are referenced by (and only by) OracleDataReader objects.

    However, when I search for the roots of those OracleDataReaders I get the output:

    DOMAIN(00166420):HANDLE(WeakSh):aa3400:Root:3a0b0e60(Devart.Data.Oracle.OracleDataReader)->
    5a6df040(System.Byte[])

    which seems to only tell me what I already know: that the DataReaders are using the bytearrays. But I can't see who's referencing the datareaders. Why aren't those objects gc'ed? Is there any way to find out?

    And yes, I *am* calling Dispose on all the datareaders ;)

    Any help with this would be highly appreciated!

    Cheers
    Stefan



    Friday, July 31, 2009 3:45 PM

All replies

  • I think you are going far too low-level for memory leak detection, but in any case you should ask this person, maybe he knows: http://blogs.msdn.com/alejacma/default.aspx
    Monday, August 3, 2009 3:26 PM
  • Is this an app written in VB.NET?  Did you ship the Debug build?

    Hans Passant.
    Monday, August 3, 2009 6:59 PM
    Moderator
  • Hello again,

    thanks for your replies. As for your question nobugz: The dump is from a debug build, the app is written in C#. When I ran into the OOM-Exception I attached WinDbg and saved the dump.

    I looked a little bit further and found the following: For some of the OracleDataReader-objects !gcroot results in the output

    Finalizer queue:Root:39bcd118(Devart.Data.Oracle.OracleDataReader)

    Looking at the output of !finalizequeue I discovered the following:

    SyncBlocks to be cleaned up: 6
    MTA Interfaces to be released: 0
    STA Interfaces to be released: 0
    ----------------------------------
    generation 0 has 2069 finalizable objects (0898b350->0898d3a4)
    generation 1 has 42 finalizable objects (0898b2a8->0898b350)
    generation 2 has 17118 finalizable objects (0897a730->0898b2a8)
    Ready for finalization 10115 objects (0898d3a4->089971b0)


    Although I'm not sure how to interpret this, I'm quite sure it's not a healthy condition. I already found this article on the web, which seems to be related to my problem: http://blogs.msdn.com/tess/archive/2007/10/19/net-finalizer-memory-leak-debugging-with-sos-dll-in-visual-studio.aspx

    Though most of the symptoms in this article do not apply in my case :( If anybody has an idea where to look next I'd be happy to hear it since I'm (obviously) pretty new to this.
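
One way to read the !finalizequeue output above is to add up the counts and check them against the address ranges. The sketch below is only an illustration in Python. It assumes a 32-bit process (4-byte pointers, which the 8-digit addresses suggest) and that each range is an array of object pointers; the names `summarize` and `POINTER_SIZE` are made up for this example.

```python
import re

# Output of !finalizequeue copied from the post above.
FINALIZEQUEUE_OUTPUT = """\
generation 0 has 2069 finalizable objects (0898b350->0898d3a4)
generation 1 has 42 finalizable objects (0898b2a8->0898b350)
generation 2 has 17118 finalizable objects (0897a730->0898b2a8)
Ready for finalization 10115 objects (0898d3a4->089971b0)
"""

POINTER_SIZE = 4  # assumption: 32-bit process, one pointer per queued object

GEN_RE = re.compile(
    r"generation \d+ has (\d+) finalizable objects \(([0-9a-f]+)->([0-9a-f]+)\)"
)
READY_RE = re.compile(
    r"Ready for finalization (\d+) objects \(([0-9a-f]+)->([0-9a-f]+)\)"
)


def summarize(text):
    """Total the per-generation counts and cross-check them against the ranges."""
    finalizable = 0
    ready = 0
    ranges_match = True
    for line in text.splitlines():
        gen = GEN_RE.search(line)
        rdy = READY_RE.search(line)
        match = gen or rdy
        if match is None:
            continue
        count = int(match.group(1))
        start = int(match.group(2), 16)
        end = int(match.group(3), 16)
        if gen:
            finalizable += count   # still reachable, finalizer not yet due
        else:
            ready += count         # unreachable, waiting for the finalizer thread
        if (end - start) // POINTER_SIZE != count:
            ranges_match = False
    return {
        "finalizable": finalizable,
        "ready_for_finalization": ready,
        "ranges_match_counts": ranges_match,
    }


summary = summarize(FINALIZEQUEUE_OUTPUT)
print(summary)
```

For the numbers in the post this gives 19229 finalizable objects plus 10115 objects that are already ready for finalization, and every range length matches its count. A "ready" backlog of that size is at least consistent with a finalizer thread that is not making progress, though the output alone does not prove it.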

    Cheers
    Stefan





    Tuesday, August 4, 2009 1:02 PM
  • I'm still looking into this. The finalizer thread seems to be stuck in a WaitForSingleObject() call, which seems to be the root of the problem. After googling some more I found this

    http://mcfunley.com/355/some-twists-on-blocked-finalizers

    Looks a lot like the situation I got here.

    The solution suggested on the blog (decorating Main() with [MTAThread]) doesn't work for me, since it causes an exception when calling Show() on a form that's sitting on another form ("Drag and Drop registration failed; current thread must be STA", something like that) :(

    Any ideas?

    Wednesday, August 5, 2009 12:34 PM
  • Sounds to me like a flaw in the Oracle provider.  Blocking in the finalizer is about as evil as it gets.  There's a 2 second timeout on the finalizer thread, that could explain the mass of unfinalized objects you've got.  Religiously using the Dispose() method should help to relieve the pressure on the finalizers.  Making the UI thread MTA is illegal in programs that create windows.  To get support for Oracle provider problems you probably need an Oracle support forum.
    Hans Passant.
    Wednesday, August 5, 2009 2:10 PM
    Moderator
  • What makes you think it's the OracleDataReader that's blocking the finalizer thread? The ODRs (and the byte arrays) use most of the memory, but other objects aren't gc'ed either.

    I've tried to isolate the problem by creating a little test app that does the same database-stuff as the "real" app. The problem does not happen here, so it seems that something else is blocking the finalizer.

    What's the best way to find out which object could be responsible for the blocking? As far as I understand, the finalizer thread can only be blocked if some finalizer method is blocking for some reason, like

    ~MyClass()
    {
         // some blocking stuff happening here...
    }

    right? I can't find something like that in the code.
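
To make the worry concrete, here is a toy model of why one blocking finalizer matters. It is not CLR code: it is a plain Python sketch of a single consumer thread draining a queue, standing in for the one finalizer thread, and every name in it (`run_model`, `blocking`, `quick`) is invented for illustration.

```python
import queue
import threading


def run_model():
    """Model one 'finalizer thread' working through a queue of finalizers."""
    work = queue.Queue()
    finalized = []
    blocker_started = threading.Event()
    release = threading.Event()

    def finalizer_thread():
        # A single thread runs every queued finalizer, strictly in order.
        while True:
            item = work.get()
            if item is None:          # sentinel: stop the model
                return
            name, finalize = item
            finalize()
            finalized.append(name)

    def quick():
        pass                          # a well-behaved finalizer returns at once

    def blocking():
        blocker_started.set()
        release.wait()                # stands in for a wait that never returns

    for item in [("a", quick), ("b", quick), ("blocker", blocking),
                 ("c", quick), ("d", quick)]:
        work.put(item)

    worker = threading.Thread(target=finalizer_thread, daemon=True)
    worker.start()

    # Once the blocking finalizer is running, nothing behind it makes progress.
    blocker_started.wait(timeout=5)
    finalized_while_stalled = list(finalized)
    still_queued = work.qsize()

    # Unblock it so the model can finish cleanly.
    release.set()
    work.put(None)
    worker.join(timeout=5)
    return finalized_while_stalled, still_queued, list(finalized)


stalled, queued, final = run_model()
print("finalized while stalled:", stalled)
print("still queued behind the blocker:", queued)
print("finalized after release:", final)
```

In the model, "c" and "d" stay queued for as long as "blocker" waits, even though their own finalizers are harmless. By analogy, the objects piling up on the heap need not be the ones whose cleanup is blocking.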



    Wednesday, August 5, 2009 3:20 PM
  • What makes you think it is not the OracleDataReader that causes this problem?  Once the time-out is up, nothing else gets finalized.  You can't find it in the code because you don't have the source code for the provider.  Presumably.

    Hans Passant.
    Wednesday, August 5, 2009 3:40 PM
    Moderator
  • As I stated in my previous post, I isolated the database-related stuff and the blocking is *not* occurring in that isolated environment. So I'm pretty positive it's not the OracleDataReader that causes the problem. That's why I believe some other object is blocking the finalizer thread. I blamed the ODR in my initial post since it's taking up most of the space on the heap, but the blocking seems to happen somewhere else. But how do I find out where?!


    Thursday, August 6, 2009 6:45 AM
  • Stefan,
    You can try to find out whether there's a deadlock using the SOSEX extension for WinDbg. It has a very useful !dlk command that might help. Google (or Bing!) "Sosex" to download the extension.

    -Amit
    Thursday, August 6, 2009 11:17 AM
  • Hi Amit,

    thanks for the hint. !dlk yields the output "No deadlocks detected" :( !Waitlist from SIEExtPub does not produce any output.

    I tried removing all the finalizers from the code. This wouldn't be a permanent solution, I just wanted to see if the problem would persist. It does. Can I conclude from this that there are blocking finalizers in the third-party components we use?

    Is there any way to determine which object's finalize is being called from the finalizerthread? Is there any way to see what's in the freachable queue?
    Thursday, August 6, 2009 12:19 PM
  • I keep posting my "findings", maybe somebody can make sense of this:

    The callstack of the finalizer thread looks like this when it's dead:

    >    ntdll.dll!KiFastSystemCallRet()    
         [Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]   
         ntdll.dll!NtWaitForSingleObject()  + 0xc bytes   
         kernel32.dll!WaitForSingleObject()  + 0x12 bytes   
         ole32.dll!CoQueryProxyBlanket()  + 0x5c5 bytes   
         ole32.dll!775d1e50()    
         ole32.dll!ReleaseStgMedium()  + 0x12e7 bytes   
         ole32.dll!CoGetObject()  + 0xd26 bytes   
         rpcrt4.dll!NdrProxySendReceive()  + 0x40 bytes   
         rpcrt4.dll!NdrProxySendReceive()  + 0x138 bytes   
         rpcrt4.dll!NdrProxySendReceive()  + 0xcd bytes   
         rpcrt4.dll!RpcBindingSetObject()  + 0x4d bytes   
         ole32.dll!CoGetObject()  + 0x901 bytes   
         ole32.dll!CoCreateObjectInContext()  + 0xd1c bytes   
         mscorwks.dll!CtxEntry::EnterContextOle32BugAware()  + 0x2b bytes   
         mscorwks.dll!CtxEntry::EnterContext()  + 0x168 bytes   
         mscorwks.dll!RCWCleanupList::ReleaseRCWListInCorrectCtx()  + 0xf7 bytes   
         mscorwks.dll!RCWCleanupList::CleanupAllWrappers()  + 0xdbf50 bytes   
         mscorwks.dll!SyncBlockCache::CleanupSyncBlocks()  + 0xdb bytes   
         mscorwks.dll!Thread::DoExtraWorkForFinalizer()  + 0x4c4d7 bytes   
         mscorwks.dll!WKS::GCHeap::FinalizerThreadWorker()  + 0x89 bytes   
         mscorwks.dll!Thread::DoADCallBack()  - 0x1411f3 bytes   
         mscorwks.dll!Thread::ShouldChangeAbortToUnload()  - 0x14036b bytes   
         mscorwks.dll!Thread::ShouldChangeAbortToUnload()  - 0x140445 bytes   
         mscorwks.dll!ManagedThreadBase_NoADTransition()  + 0x32 bytes   
         mscorwks.dll!ManagedThreadBase::FinalizerBase()  + 0xd bytes   
         mscorwks.dll!WKS::GCHeap::FinalizerThreadStart()  + 0xa9 bytes   
         mscorwks.dll!Thread::intermediateThreadProc()  + 0x46 bytes   
         kernel32.dll!GetModuleFileNameA()  + 0x1ba bytes   


    I took several snapshots of the finalizer's stack when everything is working. The only thing missing from the stack when everything is working is the ole32/rpcrt4 frames, but this could be a coincidence. Or could this be some kind of COM issue I'm having here?
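
When comparing several snapshots like this, it can help to reduce each stack to a few yes/no facts. The Python sketch below does that for an excerpt of the stack above. The frame names are copied from the post; the function names `parse_frames` and `describe` and the choice of checks are assumptions made for this example. Keep in mind that the ntdll and ole32 symbols were flagged as possibly incorrect in the original listing.

```python
import re

# Excerpt of the finalizer thread's call stack from the post above.
FINALIZER_STACK = """\
>    ntdll.dll!KiFastSystemCallRet()
     ntdll.dll!NtWaitForSingleObject()  + 0xc bytes
     kernel32.dll!WaitForSingleObject()  + 0x12 bytes
     ole32.dll!CoQueryProxyBlanket()  + 0x5c5 bytes
     rpcrt4.dll!NdrProxySendReceive()  + 0x40 bytes
     mscorwks.dll!CtxEntry::EnterContext()  + 0x168 bytes
     mscorwks.dll!RCWCleanupList::ReleaseRCWListInCorrectCtx()  + 0xf7 bytes
     mscorwks.dll!RCWCleanupList::CleanupAllWrappers()  + 0xdbf50 bytes
     mscorwks.dll!WKS::GCHeap::FinalizerThreadWorker()  + 0x89 bytes
"""

# Matches "module.dll!Some::Symbol(" with an optional leading ">" marker.
FRAME_RE = re.compile(r"^\s*>?\s*(?P<module>[\w.]+)!(?P<symbol>[\w:]+)\(")


def parse_frames(text):
    """Return (module, symbol) pairs for every line that looks like a frame."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((match.group("module").lower(), match.group("symbol")))
    return frames


def describe(frames):
    """Reduce a stack to a few facts that are easy to compare across dumps."""
    symbols = [symbol for _, symbol in frames]
    modules = {module for module, _ in frames}
    return {
        "is_finalizer_thread": any("FinalizerThread" in s for s in symbols),
        "releasing_rcws": any(s.startswith("RCWCleanupList::") for s in symbols),
        "in_com_rpc_call": bool(modules & {"ole32.dll", "rpcrt4.dll"}),
        "waiting": any("WaitForSingleObject" in s for s in symbols),
    }


frames = parse_frames(FINALIZER_STACK)
report = describe(frames)
for key, value in report.items():
    print(f"{key}: {value}")
```

If all four checks are true, as they are for this excerpt, the thread is the finalizer thread, it is releasing runtime callable wrappers (RCWs), and it is waiting inside a COM/RPC call. That says where the wait happens, not which component owns the COM object.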




    Friday, August 7, 2009 11:31 AM
  • It is cleaning up the RCW for an out-of-process COM component.  Excellent candidate for timeouts of course.  More evidence for your data provider being the problem.

    Hans Passant.
    Friday, August 7, 2009 11:45 AM
    Moderator
  • We're using devart which is 100%-managed code (at least that's what the vendor claims here: http://www.devart.com/dotconnect/oracle/). Besides I couldn't reproduce the issue when I isolated the db stuff. Why all that hate for oracle? ;)
    Friday, August 7, 2009 12:38 PM
  • Stefan, when looking at the unmanaged stack trace, can you try the kb command, which also shows the first three arguments passed to each function? The first argument passed to ntdll!NtWaitForSingleObject is the handle id of the wait handle, which can be one of the following types:

    Event
    Section
    File
    Port
    Directory
    Mutant    
    WindowStation
    Semaphore    
    Key            
    Process      
    Thread       
    Desktop     
    IoCompletion
    KeyedEvent  

    You can use the following command to get more info about the handle that the thread is waiting on: !handle 000000c0 f
    The number after !handle is the handle id (I've given some sample output below). This will give you a better idea of what the thread is waiting for:

    !handle 000000c0 f
    Handle 000000c0
      Type         Thread
      Attributes   0
      GrantedAccess 0x1f03ff:
             Delete,ReadControl,WriteDac,WriteOwner,Synch
             Terminate,Suspend,Alert,GetContext,SetContext,SetInfo,QueryInfo,SetToken,Impersonate,DirectImpersonate
      HandleCount   8
      PointerCount 11
      Name         <none>
      Object specific information
        Thread Id   15f0.1370
        Priority    10
        Base Priority 0


    Wednesday, September 2, 2009 2:47 PM