CLR Thread Pool in strange state.

  • Question

  • Hi, we have started seeing a strange issue in some of our applications.

    At random, some of our client and service apps stop processing remoting calls.

    On inspecting dumps from these applications, we see that the thread pool has been exhausted and nearly all of its threads are in an unstarted/pending state.

     

    Using SOS

     

    !ThreadPool returns:

    CPU utilization: 0%
    Worker Thread: Total: 1023 Running: 1023 Idle: 0 MaxLimit: 1023 MinLimit: 16
    Work Request in Queue: 86
    --------------------------------------
    Number of Timers: 4
    --------------------------------------
    Completion Port Thread:Total: 6 Free: 1 MaxFree: 32 CurrentLimit: 7 MaxLimit: 1000 MinLimit: 16

    !Threads returns expected threads and lots of unstarted threads:

    ThreadCount:      1067
    UnstartedThread:  1023
    BackgroundThread: 41
    PendingThread:    1023
    DeadThread:       2
    Hosted Runtime:   no
                                       PreEmptive   GC Alloc                Lock
           ID  OSID ThreadOBJ    State GC           Context       Domain   Count APT Exception
       0    1  3044 0000000000469cd8      6020 Enabled  000000001e14d934:000000001e14f6c8 00463aa0     0 STA
       2    2  19ac 00000000004760d8      b220 Enabled  0000000000000000:0000000000000000 00463aa0     0 MTA (Finalizer)

       .......


    XXXX   39  3f00 000000000fc4cb40      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   40  3968 000000000fc4a808      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   45  36c0 0000000007d8f9e0      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   46  4008 0000000007d8f4d8      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   3f  3700 0000000007d8bd80      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   47  418c 0000000007d8dbb0      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   49  4280 0000000007d8d1a0      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   4a  3e44 0000000007d8efd0      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   48  2890 0000000007d8c288      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   30  31a8 000000000fc4bc28      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   4d  3f3c 000000000d2e8d08      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   4e  1fcc 000000000d2e9210      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   4f  1050 000000000d2e9718      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   50  23bc 000000000d2e9c20      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   51  3f70 000000000d2ea128      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   4c  4150 000000000d2ea630      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   52  4058 000000000d2eab38      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   53  1308 000000000d2eb040      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   54  3ba0 000000000d2eb548      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   55  4040 000000000d2eba50      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   56  1bbc 000000000d2ebf58      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   57  1ee8 000000000d2ec460      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   58  112c 0000000007d8e5c0      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   59  1a78 000000000fcd8710      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   5a  3cf8 000000000fcd8c18      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   5b  3cc0 000000000fcd9120      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   5c  1c50 000000000fcd9628      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   5d  4284 000000000fcd9b30      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   5e  2d84 000000000fcda038      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   5f  419c 000000000fcda540      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   60  400c 000000000fcdaa48      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   61  3438 000000000fcdaf50      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   62  3c5c 000000000fcdb458      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   63  3268 000000000fcdb960      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   64  3a78 000000000fcdbe68      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   65  1470 000000000fae4a58      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   66  3588 000000000fae4f60      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   67  3cdc 000000000fae5468      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   68  3504 000000000fae5970      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   69   b90 000000000fae5e78      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   6a  4360 000000000fae6380      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   6b   930 000000000fae6888      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   6c  3e08 000000000fae6d90      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   6d  38a4 000000000fae7298      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   6e  40ac 000000000fae77a0      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   6f  3c8c 000000000fae7ca8      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   70  2114 000000000fae81b0      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   71  2a8c 000000000f85d868      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   72  4394 000000000f85dd70      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   73  430c 000000000f85e278      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn
    XXXX   74  3c04 000000000f85e780      1400 Enabled  0000000000000000:0000000000000000 00463aa0     0 Ukn

    .............................

     

    Can anyone explain what is going on here, and how we might start to debug the problem?

     

     

     

     

    Thursday, July 21, 2011 12:56 PM

All replies

  • Hi ,

    Could you give us a code sample that reproduces this issue? Or can you post the code that shows how the thread pool is used in the faulty application?

    Which communication mechanism is used between the client and the server? Does the client or the server close the working object in time?

     

     


    Best regards,
    Riquel

    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Thursday, July 21, 2011 2:35 PM
  • Hi Riquel,

    Unfortunately, we do not know where in the code this is happening.

    We don't really use the MS thread pool directly.

    The apps all communicate using .NET Remoting, which is the heaviest user of the MS thread pool, and there is a lot of remoting activity, which makes the problem hard to trap.

    What I'm trying to figure out is how these threads get into this state. They are CLR-owned threads, which implies that they are thread pool threads, and the numbers tally with the thread pool totals.

    It appears that the thread pool gets to a point where it needs to expand the number of threads it has, but instead of starting up, the new threads stay in the unstarted state.
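
    One way to narrow down when the pool starts to fill is to log its headroom from inside the process. The following is only a minimal sketch, not something from the affected applications: the class name `ThreadPoolProbe`, the log format, and the 60-second interval are assumptions made for illustration. It runs on a dedicated thread rather than a `System.Threading.Timer`, because timer callbacks are themselves dispatched on the thread pool and would stop once the pool is exhausted.

```csharp
using System;
using System.Threading;

// Hypothetical helper: samples CLR thread pool usage for logging.
static class ThreadPoolProbe
{
    // Worker threads in use = configured maximum minus currently available.
    public static int WorkersInUse()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return maxWorkers - freeWorkers;
    }

    // Formats one log line with worker and completion port usage.
    public static string Snapshot()
    {
        int maxWorkers, maxIo, freeWorkers, freeIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorkers, out freeIo);
        return string.Format(
            "{0:o} workers in use: {1}/{2}, completion port threads in use: {3}/{4}",
            DateTime.UtcNow, maxWorkers - freeWorkers, maxWorkers,
            maxIo - freeIo, maxIo);
    }

    // Starts a dedicated background thread (not a pool thread) that
    // writes a snapshot at the given interval until the process exits.
    public static Thread StartMonitor(TimeSpan interval)
    {
        var monitor = new Thread(() =>
        {
            while (true)
            {
                Console.WriteLine(Snapshot());
                Thread.Sleep(interval);
            }
        });
        monitor.IsBackground = true;
        monitor.Start();
        return monitor;
    }
}
```

    Calling `ThreadPoolProbe.StartMonitor(TimeSpan.FromSeconds(60))` at service startup would give a timeline of pool usage that can be compared with remoting activity in the days before the hang.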

     

     

    Thursday, July 21, 2011 3:18 PM
  • Yes, .NET Remoting uses thread pool threads to process requests on remote objects. Could you post the remote object's implementation code?

    Second, please print the call stacks of the pending threads, if there are any. It is also important to show the code that sets up the remote object and starts it on the server.


    Best regards,
    Riquel

    Thursday, July 21, 2011 3:52 PM
  • Here is an example of how we call services using .NET remoting

     

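    using System;
    using System.Runtime.Remoting;
    using System.Runtime.Remoting.Channels;
    using System.Runtime.Remoting.Channels.Tcp;

    public interface IMyInterface
    {
        void DoServiceCall(string data);
    }

    // Must derive from MarshalByRefObject so that RemotingServices.Marshal accepts it.
    public class MyService : MarshalByRefObject, IMyInterface
    {
        public void DoServiceCall(string data)
        {
            // Do work....
        }
    }

    // Service
    class Program
    {
        public static void Main(string[] args)
        {
            MyService service = new MyService();

            // Note: the parameterless constructor activates a client channel only;
            // a listening service would pass its port, e.g. new TcpChannel(port).
            TcpChannel channel = new TcpChannel();
            ChannelServices.RegisterChannel(channel, false);

            // Publish the instance under the URI "MyService".
            RemotingServices.Marshal(service, "MyService");

            Console.ReadLine();
        }
    }

     

    // Client (a separate program)
    class Program
    {
        public static void Main(string[] args)
        {
            IMyInterface service = (IMyInterface)Activator.GetObject(typeof(IMyInterface), "tcp://serviceip:port/MyService");

            service.DoServiceCall("Hello world");
        }
    }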

     

    As for the stack traces, there aren't any, because the real threads haven't been created; they are unstarted/pending.

     

     

     

     

    Friday, July 22, 2011 10:29 AM
  • Hi,

     

    Thank you for your question. We're researching this case, and it might take some time before we get back to you.


    Eric Yang [MSFT]

    Tuesday, July 26, 2011 1:25 AM
  • 1. Please post the code inside the DoServiceCall function.

     public void DoServiceCall(string data)
    {
    // Do work....
    }

    2. When does the server application hang? (A) It hangs immediately after the server application starts. (B) It hangs after receiving many client calls. Let us know whether it is A, B, or some other condition. Thanks!


    Best regards,
    Riquel

    Tuesday, July 26, 2011 3:35 PM
  • Hi,

    Could you please post the additional information that Riquel requested, to help with the research?

     

    Thanks

    Wednesday, August 10, 2011 2:10 AM
  • DoServiceCall is merely an example. The code inside the service calls could be database calls, device interactions, or many other scenarios; we cannot find a specific piece of code that triggers this.

    What we do know is that the service's thread pool becomes depleted after about 12 days 21 hours of running. The process continues to execute, but with no threads available in its pool, so clients start failing because their calls do not succeed.

    Regards

    pcol

     

    Wednesday, August 10, 2011 9:32 AM
  • We have not been able to find a direct answer with the information provided so far. If you require a more in-depth level of support, you could visit the link below to see the various paid support options that are available.


    http://support.microsoft.com/default.aspx?id=fh;en-us;offerprophone 

    Regards 

    Thursday, November 10, 2011 3:32 AM
  • We have experienced the same issue. In our scenario we have a .NET 4.0 client (Windows Forms) application. This application periodically (every 60 seconds) checks the WMI subsystem to get the available disk space, using a ManagementObject and a query. We created a new object every time and discarded it, but were not calling Dispose on that object before setting it to null. After 15 days of operation, whether busy or not, the application would fail with an out-of-memory exception during the setup phase (the thread that is started behind the scenes to make the WMI call).

    System.OutOfMemoryException: Retrieving the COM class factory for component with CLSID {4590F811-1D3A-11D0-891F-00AA004B2E24} failed due to the following error: 8007000e Not enough storage is available to complete this operation. (Exception from HRESULT: 0x8007000E (E_OUTOFMEMORY)).
       at System.Management.ThreadDispatch.Start()
       at System.Management.ManagementScope.Initialize()
       at System.Management.ManagementObject.Initialize(Boolean getObject)
       at System.Management.ManagementObject.Get()
       ...

    Crash dump analysis showed ample heap space and no exhausted stacks but did show 1023 workers

    CPU utilization: 9%
    Worker Thread: Total: 1023 Running: 1023 Idle: 0 MaxLimit: 1023 MinLimit: 2
    Work Request in Queue: 9740
    --------------------------------------
    Number of Timers: 11
    --------------------------------------
    Completion Port Thread:Total: 104 Free: 1 MaxFree: 4 CurrentLimit: 62 MaxLimit: 1000 MinLimit: 2

    We added a Dispose call (a 'using' block around our use of the management object) and reduced the check to every 5 minutes. We are testing now, but I am not sure yet whether this is the fix. The fact that the thread pool gets into this state makes me think we have a stuck timer or that something has gone awry at the CLR level (some subtle bug). It is still not clear to me whether the thread pool exhaustion is a symptom or a cause, but the failure in ThreadDispatch.Start makes me think it might be a cause.
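
    For readers who want to see the shape of that change, here is a minimal sketch of wrapping a ManagementObject in a 'using' block. It is not the code from our application: the class name `DiskSpaceProbe`, the helper `BuildPath`, and the choice of the Win32_LogicalDisk class and its FreeSpace property are assumptions made for illustration, and the project needs a reference to System.Management.

```csharp
using System;
using System.Management; // requires a reference to System.Management

// Hypothetical helper: reads free disk space through WMI.
static class DiskSpaceProbe
{
    // Builds the WMI object path for a logical disk such as "C:".
    public static string BuildPath(string deviceId)
    {
        return "Win32_LogicalDisk.DeviceID=\"" + deviceId + "\"";
    }

    // Returns the free space in bytes, or null if WMI reports no value
    // (for example, a removable drive with no media).
    public static ulong? GetFreeBytes(string deviceId)
    {
        // 'using' ensures Dispose runs even if Get() throws, so the
        // object is released promptly instead of waiting for finalization.
        using (var disk = new ManagementObject(BuildPath(deviceId)))
        {
            disk.Get();
            object free = disk["FreeSpace"];
            return free == null ? (ulong?)null : Convert.ToUInt64(free);
        }
    }
}
```

    A periodic check would then call `DiskSpaceProbe.GetFreeBytes("C:")` on each tick. Whether deterministic disposal alone prevents the thread pool growth described above is still unconfirmed.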

    Thursday, December 8, 2011 8:17 PM