Bug in Silly Window Syndrome Mitigation in Windows 2008 R2 SP1 64 bit?

  • Question

  • Has anybody seen this before? I have a .NET 2.0 application ("!-client" below) running on Windows Server 2008 R2 SP1 64-bit that calls an external web service ("webservices" below), and the call hangs. It appears that Silly Window Syndrome prevention is kicking in, i.e. the server is sending data back faster than the client can process it, so the client asks it to slow down. However, there is a bizarre 36-second interval at frame 1297. At that point we see large jumps in the advertised TCP window, and then the server stops responding.

    Stranger still, we don't see this when we run the client as a 32-bit application. When we trace a run that works, we don't see these big intervals as the client opens the TCP window back up. We've also run this on Amazon, Azure, and local machines, so it isn't related to where the client runs. Could this be a bug in Silly Window Syndrome prevention in Windows Server 2008 R2 SP1?

    Any ideas would be greatly appreciated!

    Screenshot of trace snippet (TCP Troubleshooting columns):

    Monday, July 22, 2013 6:30 PM
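The two symptoms described in the question, zero-window advertisements from the client and a long pause before frame 1297, can be located mechanically once a capture has been exported and parsed into records. A minimal Python sketch, assuming each segment has already been parsed into the hypothetical `Frame` record below (the field names and the 1-second default threshold are illustrative, not taken from any capture tool):

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Frame:
    # Hypothetical record: one captured TCP segment, parsed from an export.
    number: int       # frame number in the capture
    time: float       # capture timestamp, seconds
    src: str          # sending host label, e.g. "!-client" or "webservices"
    window: int       # advertised receive window, bytes
    payload_len: int  # TCP payload length, bytes


def find_gaps(frames: Sequence[Frame], threshold: float = 1.0) -> List[Tuple[int, float]]:
    """(frame number, seconds since previous frame) for every pause longer than threshold."""
    return [
        (cur.number, cur.time - prev.time)
        for prev, cur in zip(frames, frames[1:])
        if cur.time - prev.time > threshold
    ]


def zero_window_frames(frames: Sequence[Frame], receiver: str) -> List[int]:
    """Frame numbers in which `receiver` advertised a zero receive window."""
    return [f.number for f in frames if f.src == receiver and f.window == 0]
```

Each reported gap can then be cross-checked against the zero-window list: a long pause that ends in a window update from the client matches the pattern described in the question.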

All replies

  • Hi Rob

    It looks to me like the application on !-client cannot drain the TCP buffer (!-client sends an RWIN of 0 in #972, #981, and #1012) until #1297 (now RWIN 3472). At that point I would expect websrv to start sending data again, but it doesn't. !-client waits 5 minutes for data and then gives up (FIN in #2711).

    Next, I would probably look at why the application is taking 36 seconds to drain the TCP buffer.

    -Wes

    Thursday, August 1, 2013 3:21 PM
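Wes's reading of the trace (the client reopens its window at #1297, yet websrv never resumes sending) can be checked the same way. A sketch over hypothetical parsed records (`Seg` and `unanswered_reopen` are illustrative names) that reports the first window update from the receiver, following a zero-window period, after which the sender never sends any more payload:

```python
from typing import NamedTuple, Optional, Sequence


class Seg(NamedTuple):
    # Hypothetical parsed segment, in capture order.
    number: int       # frame number
    src: str          # sending host label
    window: int       # advertised receive window, bytes
    payload_len: int  # TCP payload length, bytes


def unanswered_reopen(segs: Sequence[Seg], receiver: str, sender: str) -> Optional[int]:
    """Frame number of the first window update from `receiver` that follows a
    zero-window advertisement but is never followed by payload from `sender`.
    Returns None if the sender resumed after every reopened window."""
    window_closed = False
    for i, s in enumerate(segs):
        if s.src != receiver:
            continue
        if s.window == 0:
            window_closed = True
        elif window_closed:
            # Window reopened: look for any later data from the sender.
            if not any(t.src == sender and t.payload_len > 0 for t in segs[i + 1:]):
                return s.number
            window_closed = False
    return None
```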
  • Thanks Wes. My thoughts exactly. Unfortunately, all I'm doing is waiting on a Web service call to return, so it's below my code. I just do an "Add Web Reference" to the Web service and then make a call. The problem occurs while the SoapHttpClientProtocol-derived class that was "auto-generated by Microsoft.VSDesigner, Version 4.0.30319.296" is reading the response from the server. It always reads about 72K of the returned XML, and then the above happens.

    I had the same thought, so I tried calling the Web service asynchronously instead of synchronously, hoping it would somehow read the response better underneath. The exact same Web service call seems to work fine if I use the Async method instead of the sync method.

    So, it works either when:

       1. Client is run as a 32-bit .NET 2.0 EXE.

       2. Web service calls are made async in a 64-bit .NET 2.0 EXE.

    It does not work (always fails and you see those big gaps) when:

       1. Client is run as a 64-bit .NET 2.0 EXE and calls are made synchronously.

    Again, it's a very simple test to reproduce. Just create a small Azure or Amazon VM running Windows Server 2008 R2 SP1 64-bit. It happens every single time.

    Thanks Wes.

    Rob

    Thursday, August 1, 2013 4:11 PM
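Rob's observation that the synchronous read always stalls after about 72K raises a question worth isolating: is the application pausing its reads, or has the data stopped arriving? One way to test that outside the generated SOAP proxy is to replay the request over a plain socket and drain the response with an instrumented loop. A minimal Python sketch, assuming the request has already been written to the socket and the server closes the connection after responding (for example, a request sent with `Connection: close`); the `drain` helper and its return values are hypothetical. Because it reads continuously, the reader itself should never be what keeps the receive window closed:

```python
import socket
import time
from typing import List, Tuple


def drain(sock: socket.socket, bufsize: int = 65536) -> Tuple[bytes, float, int]:
    """Read until the peer shuts down its sending side (recv returns b"").

    Returns (data, longest_wait, offset): longest_wait is the longest time a
    single recv() call blocked, and offset is how many bytes had been read when
    that wait began. The final recv() that reports end-of-stream is included,
    so a server that goes silent before closing shows up as a long final wait.
    """
    chunks: List[bytes] = []
    total = 0
    longest_wait, offset = 0.0, 0
    while True:
        start = time.monotonic()
        chunk = sock.recv(bufsize)  # consume data immediately so the buffer frees up
        waited = time.monotonic() - start
        if waited > longest_wait:
            longest_wait, offset = waited, total
        if not chunk:
            break
        chunks.append(chunk)
        total += len(chunk)
    return b"".join(chunks), longest_wait, offset
```

If the longest wait begins near the 72K offset even with a reader that never pauses, the stall is likely upstream of the application; if this loop completes while the proxy hangs, the synchronous path's handling of the stream becomes the more likely suspect. In practice a socket timeout (`sock.settimeout(...)`) keeps the loop from blocking forever against a silent server.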
  • Perhaps you might find a clue in an ETL trace. I have had some luck with this kind of trace for finding higher-level protocol activity. In my case I used the NetConnection scenario to confirm the app was enabling the keepalive socket option. You can list the available scenarios with the command: netsh trace show scenarios

    -Wes

    • Proposed as answer by Paul E Long Wednesday, August 7, 2013 3:37 PM
    Friday, August 2, 2013 5:44 PM
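The ETL trace suggested above can be scripted so it brackets the reproduction exactly. The sketch below only builds the `netsh trace` argument lists, so they can be reviewed before running; `scenario=`, `capture=yes`, and `tracefile=` are `netsh trace start` parameters, while the helper names and the output path are hypothetical placeholders. Running the commands requires Windows and an elevated prompt.

```python
import subprocess
from typing import List


def netsh_trace_start(scenario: str = "NetConnection",
                      tracefile: str = r"C:\traces\nettrace.etl") -> List[str]:
    """Argument list for starting an ETL trace with packet capture.
    The tracefile path is a placeholder; pick any writable location."""
    return [
        "netsh", "trace", "start",
        f"scenario={scenario}",  # NetConnection, as suggested in the thread
        "capture=yes",           # include packets alongside the ETW events
        f"tracefile={tracefile}",
    ]


def netsh_trace_stop() -> List[str]:
    """Argument list for stopping the trace and writing out the .etl file."""
    return ["netsh", "trace", "stop"]


def run(args: List[str]) -> None:
    # Windows only; netsh trace requires an elevated (administrator) prompt.
    subprocess.run(args, check=True)
```

Typical use would be `run(netsh_trace_start())`, reproduce the hanging 64-bit synchronous call, then `run(netsh_trace_stop())`; the resulting .etl file should then open in Network Monitor for inspection alongside the TCP trace.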