Deadlock in a multi-AppDomain application. GC suspected

  • Question

  • Hi, 

    I'll try to explain a problem we are experiencing with one of our applications.

    Our application is a multi-AppDomain Windows Service built on .NET Framework 4. It uses only managed code libraries. The database provider is Npgsql (PostgreSQL). The architecture is based on a controller (main thread) that runs multiple jobs, each in its own application domain. These jobs do a lot of work, but mostly they use Npgsql to communicate with our database.

    This service runs continuously on multiple computers (Windows 7 64-bit or Windows Server 2003/2008).

    The problem: sometimes the application blocks and our threads stop running. This happens after days, weeks or months, and on some installations it never happens at all.

    I used Process Explorer to see what is happening, and it seems that one thread (17) is running continuously (~90% of a CPU core). Its managed call stack (obtained with WinDbg) is always:

    Child SP         IP               Call Site
    0000000003ffe8a8 000007fef89c4efe [PrestubMethodFrame: 0000000003ffe8a8] System.Net.ContextAwareResult.Complete(IntPtr)
    0000000003ffe910 000007ff02372045 System.Net.Sockets.Socket.ConnectCallback()
    0000000003ffe990 000007ff02371edd System.Net.Sockets.Socket.RegisteredWaitCallback(System.Object, Boolean)
    0000000003ffea10 000007fef7ef9fdc System.Threading._ThreadPoolWaitOrTimerCallback.PerformWaitOrTimerCallback(System.Object, Boolean)
    0000000003ffec98 000007fef89e44c4 [GCFrame: 0000000003ffec98] 
    0000000003ffee70 000007fef89e44c4 [DebuggerU2MCatchHandlerFrame: 0000000003ffee70] 
    0000000003fff048 000007fef89e44c4 [ContextTransitionFrame: 0000000003fff048] 
    0000000003fff230 000007fef89e44c4 [DebuggerU2MCatchHandlerFrame: 0000000003fff230] 

    All the other threads are blocked on the garbage collector.

    0:017> !threads
    ThreadCount:      30
    UnstartedThread:  0
    BackgroundThread: 11
    PendingThread:    0
    DeadThread:       15
    Hosted Runtime:   no
                                               PreEmptive                                                   Lock
           ID  OSID        ThreadOBJ     State GC       GC Alloc Context                  Domain           Count APT Exception
       0    1   930 00000000004e41f0      6020 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 STA
       2    2   940 00000000004ea540      b220 Enabled  0000000010b9d338:0000000010b9f1e8 00000000004dd100     0 MTA (Finalizer)
       5    7   974 000000000193e920      b020 Enabled  0000000010c75788:0000000010c75980 00000000004dd100     0 MTA
       6    8   978 00000000019407b0      1220 Enabled  0000000010b9f2d0:0000000010ba11e8 00000000004dd100     0 Ukn
       7    9   97c 0000000001979ed0   100a220 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 MTA (Threadpool Worker)
       8    a   ad8 0000000001938a80   1000220 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn (Threadpool Worker)
    XXXX    d       00000000035df4c0     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
      10   15   b8c 00000000035e24c0      b020 Enabled  0000000010bdb278:0000000010bdd1e8 000000000538ee00     0 MTA
      11   1c  10b4 00000000055c4a20      b220 Enabled  0000000000000000:0000000000000000 000000000538ee00     1 MTA
    XXXX   22       000000000543ff50     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 MTA
      12    f   2b8 0000000006400dc0   1019220 Enabled  0000000010c412f8:0000000010c43258 00000000004dd100     0 Ukn (Threadpool Worker)
      13   1d   140 000000000542b5b0   1009220 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 MTA (Threadpool Worker)
      14   10  12a8 0000000005440d70   1009220 Enabled  0000000010c01288:0000000010c031e8 00000000004dd100     0 MTA (Threadpool Worker)
      15   12   990 000000000376d220   1009220 Enabled  0000000010bcb288:0000000010bcd1e8 00000000004dd100     0 MTA (Threadpool Worker)
      16    5   9f4 000000000376d930   1009220 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 MTA (Threadpool Worker)
      17   21   e70 0000000005469770   8009222 Disabled 0000000010c4d2e0:0000000010c4f258 0000000005ef1660     0 MTA (Threadpool Completion Port)
    XXXX   19       0000000006048940     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
    XXXX   14       00000000064f6b90     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
    XXXX   17       00000000035e32e0     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
    XXXX    3       00000000062ca410     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
    XXXX    b       00000000062c9d00     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
    XXXX   18       00000000036d5250     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 MTA
    XXXX    6       00000000036d6070     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
    XXXX   24       00000000036d5960     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
    XXXX    c       00000000063460c0     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 MTA
    XXXX   1f       00000000062cab20     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 MTA
    XXXX   1e       00000000060a06f0     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
    XXXX   13       00000000062c95f0     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 MTA
    XXXX    e       00000000063467d0     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
      19   1a  1078 0000000006346ee0      b020 Disabled 0000000010c81a60:0000000010c81a70 0000000005ef1660     3 MTA (GC)


    The row for thread 17 (highlighted in the original post) is the thread that is running continuously. Thread 19 is one of our threads, and it is requesting a garbage collection.


    As I understand it, the garbage collector is waiting for thread 17 (a managed thread) to be suspended before it can collect and release objects. Why doesn't this thread enter the suspended state like the other ones? Because PreEmptive GC is Disabled, right?

    So why can a managed thread stay in PreEmptive GC Disabled mode indefinitely?
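    To see at a glance which threads the suspension is waiting on, the !threads output can be filtered for rows whose PreEmptive GC column is Disabled (cooperative mode). Below is a minimal Python sketch, assuming the output has been copied into a string and follows the SOS column layout shown above (.NET 4, x64); other SOS versions may print different columns, and the helper name cooperative_threads is hypothetical. Dead threads (XXXX rows) are skipped. On this dump it reports threads 17 and 19; since thread 19 is the one marked (GC) that requested the collection, thread 17 is the one of interest.

```python
import re

# One live-thread row of SOS "!threads" output, assuming the .NET 4 x64 column
# layout shown in this post. Dead threads ("XXXX" rows) have no debugger id and
# therefore never match.
ROW = re.compile(
    r"^\s*(?P<dbg>\d+)\s+(?P<mgd>[0-9a-f]+)\s+(?P<osid>[0-9a-f]+)\s+"
    r"(?P<threadobj>[0-9a-f]{16})\s+(?P<state>[0-9a-f]+)\s+"
    r"(?P<gc>Enabled|Disabled)\b(?P<rest>.*)$",
    re.IGNORECASE,
)


def cooperative_threads(dump: str):
    """Return (debugger id, OS thread id, remaining columns) for every live
    thread whose PreEmptive GC column reads Disabled."""
    hits = []
    for line in dump.splitlines():
        m = ROW.match(line)
        if m and m.group("gc").lower() == "disabled":
            hits.append((int(m.group("dbg")), m.group("osid"), m.group("rest").strip()))
    return hits


# A few rows copied from the dump above.
SAMPLE = """\
       ID  OSID        ThreadOBJ     State GC       GC Alloc Context                  Domain           Count APT Exception
   0    1   930 00000000004e41f0      6020 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 STA
XXXX    d       00000000035df4c0     19820 Enabled  0000000000000000:0000000000000000 00000000004dd100     0 Ukn
  17   21   e70 0000000005469770   8009222 Disabled 0000000010c4d2e0:0000000010c4f258 0000000005ef1660     0 MTA (Threadpool Completion Port)
  19   1a  1078 0000000006346ee0      b020 Disabled 0000000010c81a60:0000000010c81a70 0000000005ef1660     3 MTA (GC)
"""

if __name__ == "__main__":
    for dbg, osid, rest in cooperative_threads(SAMPLE):
        print(f"thread {dbg} (OSID {osid}) has PreEmptive GC disabled: {rest}")
```

    Matching on the debugger id at the start of the row is what drops the dead XXXX threads and the header line without any special-casing.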

    I suspect the Npgsql library because it is the only code in our program that uses asynchronous sockets. But I don't understand how async sockets could cause a deadlock that affects the GC.

    Please tell me whether I'm right and what I'm missing.

    Your help is really appreciated!

    David


    • Edited by Odotech Inc Friday, December 20, 2013 10:59 PM
    Friday, December 20, 2013 3:44 PM

All replies

  • Hi David,

    I think a lock owned by one of the threads is causing this issue. It is difficult to troubleshoot, so the best we can do is point you to some important information about troubleshooting it. Please refer to the following article, whose author is familiar with this kind of issue: http://blogs.msdn.com/b/tess/archive/2008/02/11/hang-caused-by-gc-xml-deadlock.aspx

    Regards,


    Monday, December 23, 2013 5:01 AM
    Moderator
  • Hi,

    Thanks for the link. I had already read Tess's article.

    What has us stuck is that the call stack of thread 17, the thread blocking the GC (PreEmptive GC disabled), doesn't show any call into our libraries. The call stack is stuck in Socket.ConnectCallback, i.e. inside the framework's own code. I have just changed the only third-party library that uses asynchronous socket connections so that it uses a synchronous implementation instead. Tests are running now; it could take 1 or 2 weeks before we know whether our application still freezes.

    Thanks

    David

    Tuesday, January 7, 2014 2:32 PM