Process "hangs" for 6-8 seconds

  • Question

  • I'm running into some weird issues with a custom caching system I've built. I have 5 computers running a test client, hitting my server (across WCF) with about 40 threads each. For the full test they make 1000 calls per thread. The call they are making isn't overly complex: it grabs small amounts of data from about 8-10 large dictionaries, then combines the data into result rows and returns a max of 30 rows. The complete round-trip time of the call (when called singly) is about 0.3 seconds.

    With 5 clients (200 threads total) hitting the server at once, the 8 CPUs sit at about 50% usage each. Response times have grown to anywhere from 2 to 4 seconds for each client. Not great, but I'll live with it for now.

    After about 30 seconds, all 8 CPUs drop to 0% and stay there for about 8-10 seconds. This obviously causes the response times to shoot way up (to around 15 seconds, depending on how long it stays hung). The CPUs then pick back up and everything resumes as usual. This doesn't just happen once near the beginning; it happens multiple times during the test, at irregular intervals.

    Something else that is puzzling: after one of these big "hangs", the baseline processor usage jumps to about 65%. When that happens, response times start jumping all over the place. I'll see some that are 2 seconds, others that are closer to 20 seconds!

    After some time, the baseline usage will come back down closer to 40-50%, where response times level back out around 2-4 seconds. Then it all happens again.

    I posted this same thing on my blog. It includes screenshots of the server's processor usage. http://electronihack.blogspot.com/2010/06/process-hangs-for-6-8-seconds.html

    Host machine: Windows Server 2008 SP1 x64; Xeon W5590 @ 3.33GHz; 24GB RAM; 2 SSDs in RAID 0.

    Here's another weird issue I'm seeing that may or may not be related to this. It's posted in the WCF forum.

    Thanks!

    • Edited by TheNick Thursday, June 10, 2010 8:08 PM
    Thursday, June 10, 2010 5:15 PM
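One way to tell a server-wide stall apart from per-client slowness is to merge the call-completion timestamps logged by all five clients and look for windows in which no call completed at all. The sketch below assumes each client logs a completion time in seconds for every call, on clocks that are roughly synchronized; `find_stalls` and the sample timestamps are hypothetical, not part of the original test client.

```python
from typing import Iterable, List, Tuple


def find_stalls(completions: Iterable[float], min_gap: float) -> List[Tuple[float, float]]:
    """Return (last_before, first_after) pairs where no call completed for >= min_gap seconds.

    `completions` is assumed to be the merged completion timestamps from every client.
    """
    times = sorted(completions)
    stalls: List[Tuple[float, float]] = []
    # Compare each completion with the next one; a big gap means nothing finished in between.
    for prev, cur in zip(times, times[1:]):
        if cur - prev >= min_gap:
            stalls.append((prev, cur))
    return stalls


if __name__ == "__main__":
    # Hypothetical merged log: steady completions, then nothing for 8.5 s.
    merged = [0.0, 0.4, 0.9, 1.3, 9.8, 10.1, 10.5]
    print(find_stalls(merged, min_gap=5.0))  # [(1.3, 9.8)]
```

If the same gap shows up in every client's log at once, the stall most likely sits on the server (or the network in between) rather than in any one client.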

All replies

  • If your clients are running XP, then it seems possible that they are just running out of ephemeral ports (Vista and up use roughly four times as many ports, so the problem is far less likely to happen). It would explain not only the apparent freezing of the server, but also why the time of each request would vary so wildly afterwards (the clients would start "stuttering" as ports become available again 120 seconds after being closed).

    A simple check to see if this might be the issue would be to get the server to "stall" and then start yet another client. If the problem is due to the clients running out of ports, the server should be back immediately responding to the requests of the sixth client.

    HTH
    --mc

    Thursday, June 10, 2010 6:50 PM
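To put rough numbers on the ephemeral-port theory: XP's default dynamic port range is 1025-5000 (3,976 ports), while Vista and later default to 49152-65535 (16,384 ports), which is where "roughly four times as many" comes from. The sketch assumes the worst case, where every call opens a new TCP connection (WCF normally pools and reuses connections, so this is pessimistic) and each closed port stays unusable for the 120 seconds quoted above. The function name and the "40 threads, one 0.3 s call each" rate are illustrative assumptions.

```python
from typing import Optional

# Default dynamic (ephemeral) port ranges.
XP_PORTS = 5000 - 1025 + 1        # 3976 ports on XP / Server 2003
VISTA_PORTS = 65535 - 49152 + 1   # 16384 ports on Vista / Server 2008 and later


def seconds_until_port_exhaustion(
    new_connections_per_sec: float,
    ports: int,
    time_wait_sec: float = 120.0,  # reuse delay quoted in the thread; an assumption
) -> Optional[float]:
    """Worst case: every call takes a fresh port that is held for time_wait_sec after close.

    Ignores how long each connection stays open. Returns the time until the range
    runs out, or None if ports are recycled before the range is used up.
    """
    t = ports / new_connections_per_sec
    return t if t <= time_wait_sec else None


if __name__ == "__main__":
    rate = 40 / 0.3  # 40 threads, each finishing one 0.3 s call at a time
    xp = seconds_until_port_exhaustion(rate, XP_PORTS)
    vista = seconds_until_port_exhaustion(rate, VISTA_PORTS)
    print(f"XP: {xp:.1f} s, Vista: {vista}")  # XP: 29.8 s, Vista: None
```

With the Vista clients used in this test, even this worst case stays just under the limit (about 122.9 s to exhaust the range versus 120 s to recycle a port).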
  • Thanks Mario. The clients are all on Vista. I'll go ahead and give it a try now.
    Thursday, June 10, 2010 7:33 PM
  • So I took 4 clients and got the server "stalled". Then I fired up another client on another Vista machine. This client saw the same slow response times as the others from the very beginning. It didn't appear to put much more load on the server, however.

    Thursday, June 10, 2010 7:42 PM
  • I just posted this same issue on my blog. I included some screenshots of the server CPU usage. Hopefully that might help some too.

    http://electronihack.blogspot.com/2010/06/process-hangs-for-6-8-seconds.html

    Thursday, June 10, 2010 8:09 PM
  • Quoting Mario: "A simple check to see if this might be the issue would be to get the server to 'stall' and then start yet another client. If the problem is due to the clients running out of ports, the server should be back immediately responding to the requests of the sixth client."

    The server does STALL. Even if you start another client, it won't connect. We just can't figure out why.

    Thursday, June 10, 2010 10:49 PM
  • Hi,

       Just as Mario suggested, we need to figure out whether the cause lies on the server side, the client side, or both. Based on your experiment with the fifth client machine, it seems we should check the server side. Could there be some resource contention? Or maybe the client requests are not properly routed or balanced. I am no expert on WCF, but I hope the following references are relevant.

       1. Building a WCF Router, Part 1

       2. Building A WCF Router, Part 2

     


    Please mark the right answer at right time.
    Thanks,
    Sam

    • Edited by SamAgain Friday, June 11, 2010 6:02 AM refine
    Friday, June 11, 2010 5:56 AM
  • Can you collect a process dump while the server is "hanging"?

    To do that under Vista: Task Manager -> find your server process -> right-click -> Create Dump File.

    After the memory dump is created, you can investigate it using the WinDbg tools and the SOS extension. See more here (search for the "hang" keyword).

    HTH


    Vadym Stetsiak. http://vadmyst.blogspot.com
    Friday, June 11, 2010 2:07 PM
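Once the dump exists, the SOS commands that show what the request threads are blocked on can also be run non-interactively with cdb, the console debugger that ships with Debugging Tools for Windows alongside WinDbg. This sketch mainly builds the command line. The dump path is hypothetical, `.loadby sos clr` assumes a .NET 4 process (a .NET 3.5 process uses `.loadby sos mscorwks`), and sending `q` on stdin to end the session is an assumption about cdb's behavior with redirected input.

```python
import subprocess
from typing import List


def build_cdb_args(dump_path: str, clr_v4: bool = True, cdb: str = "cdb.exe") -> List[str]:
    """Build a cdb command line that loads SOS and dumps thread and lock information."""
    sos_module = "clr" if clr_v4 else "mscorwks"  # CLR 4 vs. CLR 2 (.NET 3.5) runtime DLL
    commands = [
        f".loadby sos {sos_module}",  # load SOS from the runtime's own directory
        "!threads",                   # managed threads, including their lock counts
        "!syncblk",                   # held monitors: owning thread and waiter count
        "~*e !clrstack",              # managed stack of every thread; ~*e treats the rest
                                      # of the line as its command, so it must come last
    ]
    return [cdb, "-z", dump_path, "-c", "; ".join(commands)]


def run_hang_analysis(dump_path: str, clr_v4: bool = True) -> str:
    """Run cdb over the dump and return its output (needs cdb.exe on PATH)."""
    result = subprocess.run(
        build_cdb_args(dump_path, clr_v4),
        input="q\n",          # assumption: cdb reads this after the -c commands and quits
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical dump path; this only prints the arguments, it does not launch cdb.
    print(build_cdb_args(r"C:\dumps\server.dmp"))
```

In the `!syncblk` output, one owning thread with many waiters would point at the lock the readers are queued behind, which the native `ZwWaitFor...` frames alone don't reveal.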
  • Thanks Sam. Those links were definitely an interesting read, but didn't prove to be much help with my current situation. I'm fairly certain we've narrowed it down to a server-side issue though, as you said.
    Friday, June 11, 2010 3:43 PM
  • Thanks Vadym. In the process of doing this now. I'll post back with any results.
    Friday, June 11, 2010 3:43 PM
  • Vadym, I followed another article by Tess to debug the dump in VS2010. WAY nicer than doing it in WinDbg.

    Here's where I ended up. During the hang I have 36 threads.

    31 of them are in ntdll.dll->ZwWaitForMultipleObjects.
    1 is in ntdll.dll->ZwWaitForSingleObject
    1 is in ntdll.dll->ZwRequestWaitReplyPort
    1 is in ntdll.dll->NtDelayExecution

    I took a screenshot of the parallel stacks. Had to split them into two images.
    Image 1
    Image 2

     

    Friday, June 11, 2010 5:16 PM
  • TheNick,

    It appears that all the reads are waiting for something. The call stacks in the screenshots are not complete (there are no symbols for [External Code]), so it is hard to tell what is really going on there.

    Using WinDbg to analyze memory dumps is recommended, because it lets you see which object all these threads are waiting on, and what is really going on.

     


    Vadym Stetsiak. http://vadmyst.blogspot.com
    • Marked as answer by SamAgain Friday, June 18, 2010 8:06 AM
    Monday, June 14, 2010 8:08 AM
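The "all the reads are waiting for something" reading also fits the CPU graph: threads blocked in a kernel wait use no CPU, so if every request thread queues behind one lock whose owner is itself blocked on something slow, usage drops to roughly 0% until the owner returns. The toy sketch below reproduces that shape; the single `cache_lock` and the sleeping writer are hypothetical stand-ins, not a claim about what the actual caching code does.

```python
import threading
import time
from typing import List


def simulate_lock_stall(readers: int = 8, hold_sec: float = 0.5) -> List[float]:
    """Return how long each reader waited behind a lock held by a blocked owner."""
    cache_lock = threading.Lock()  # hypothetical: one lock guarding all cache dictionaries
    cache = {"key": "value"}
    waits: List[float] = []
    waits_lock = threading.Lock()
    holding = threading.Event()

    def slow_owner() -> None:
        with cache_lock:
            holding.set()         # let readers start only once the lock is taken
            time.sleep(hold_sec)  # stand-in for a blocking call made while holding the lock

    def reader() -> None:
        start = time.perf_counter()
        with cache_lock:          # blocks (without burning CPU) until the owner releases
            _ = cache["key"]
        with waits_lock:
            waits.append(time.perf_counter() - start)

    owner = threading.Thread(target=slow_owner)
    owner.start()
    holding.wait()
    threads = [threading.Thread(target=reader) for _ in range(readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    owner.join()
    return waits


if __name__ == "__main__":
    waits = simulate_lock_stall(readers=8, hold_sec=0.5)
    print(f"{len(waits)} readers, shortest wait {min(waits):.2f} s")
```

Every reader's wait is close to `hold_sec`, and the process sits idle for that whole time, which is the same "busy, then 0%, then busy" pattern described in the question.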
  • We have temporarily marked a reply. Please remember to click "Mark as Answer" on the post that helps you, and to click "Unmark as Answer" if a marked post does not actually answer your question. This can be beneficial to other community members reading the thread.
    Thanks,
    Sam
    Friday, June 18, 2010 8:07 AM