TcpClient receive delay of over 500ms

    Question

  • Hi

    I have an application which performs poorly when receiving data over TCP using the .NET TcpClient class.

    I connect the TcpClient and set it to receive using TcpClient.GetStream().BeginRead and EndRead, passing in a 2000-byte buffer.
    EndRead returns with a return value of N bytes in the buffer, as expected.

    However, this application is monitoring a phone system in real time, and there is a significant delay (>500ms) between the event and EndRead returning. I've used Wireshark to monitor the data, and it actually arrives on the wire 'on time', but is not passed to the application for a further 500ms. To make things worse, if a further packet arrives, say 200ms after the first, it is appended to the original buffer (as you would expect), but there is then a further 500ms delay before EndRead returns - so in that example, I could wait 700ms. This can compound to several seconds of delay if data keeps drip-feeding in every few hundred ms, each arrival resetting the timeout before EndRead returns (or until the buffer is full)!
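
    As a point of reference, the receive pattern described above can be run end-to-end against a loopback listener. This is a minimal self-contained sketch (class and method names are illustrative, not the poster's actual code), not the monitoring app itself:

    ```csharp
    using System;
    using System.Net;
    using System.Net.Sockets;
    using System.Text;

    public class ReadLoopSketch
    {
        // Minimal loopback stand-in for the phone system, so the receive
        // pattern can be exercised on one machine.
        public static string RunLoopbackDemo()
        {
            var listener = new TcpListener(IPAddress.Loopback, 0);
            listener.Start();
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;

            using (var client = new TcpClient())
            {
                client.Connect(IPAddress.Loopback, port);
                using (TcpClient server = listener.AcceptTcpClient())
                {
                    byte[] payload = Encoding.ASCII.GetBytes("event");
                    server.GetStream().Write(payload, 0, payload.Length);

                    // The pattern from the question: BeginRead into a
                    // 2000-byte buffer, then EndRead to learn how many
                    // bytes actually arrived.
                    NetworkStream stream = client.GetStream();
                    byte[] buffer = new byte[2000];
                    IAsyncResult ar = stream.BeginRead(buffer, 0, buffer.Length, null, null);
                    int n = stream.EndRead(ar); // blocks until data is delivered
                    listener.Stop();
                    return Encoding.ASCII.GetString(buffer, 0, n);
                }
            }
        }

        public static void Main()
        {
            Console.WriteLine(RunLoopbackDemo());
        }
    }
    ```

    On loopback this returns promptly; the 500ms stall described above only shows up against the real hardware.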

    I suspect some of this delay may be related to the NoDelay setting (aka the Nagle algorithm). I can see that my app does not ACK each packet for 200ms, and therefore subsequent data is not sent from the TCP server end until it receives my ACK. This is exaggerating (but not causing) the issue above, as it directly causes further delays in the original notification.

    I have tried setting NoDelay to true, but I suspect this only affects the sending end, in that it causes a socket not to wait for an ACK. It does not seem to cause the TcpClient to send ACKs immediately.

    So - any ideas how I can stop whatever it is that waits 500ms once data has arrived, and whether it is possible to set NoDelay for ACKs? I have no control over the other end of this connection, so cannot disable the Nagle algorithm on that end.

    Thanks!

    Adam
    Thursday, July 02, 2009 3:50 PM

Answers

  • Generally, most modern TCP/IP stacks set the PSH bit at the end of the buffer supplied to send(). Many embedded stacks do not do this.

    More detailed information on the problem is in the Microsoft TCP/IP Implementation Details:
      http://technet.microsoft.com/en-us/library/cc758517(WS.10).aspx (under Push Bit Interpretation)
    which gives a registry entry that can be set to always treat packets as though the PSH bit was set:
      http://technet.microsoft.com/en-us/library/cc781532(WS.10).aspx (under IgnorePushBitOnReceives)
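
    For reference, a .reg sketch of that change (the value lives under the standard Tcpip parameters key; verify against the TechNet article above for your OS version, and note the system-wide side effect described next):

    ```
    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
    "IgnorePushBitOnReceives"=dword:00000001
    ```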

    However, this has the side effect of changing the way packets are received for the entire system.

    I still recommend what I suggested in my first post: reduce the buffer size. If the vendor won't set the PSH bit, and the amount of data is variable, I would actually read 1 byte at a time from the socket unless I knew additional bytes had already arrived (Socket.Available). This is less efficient than reading larger buffers because each read is a call into the kernel; however, this would eliminate the 500ms delay, so you should see faster communications at the expense of some CPU cycles. On a modern machine the added overhead would not be noticeable.
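
    A sketch of that strategy, assuming a raw Socket and using Socket.Available as described (ReceiveChunk and RunDemo are illustrative names, not part of any API):

    ```csharp
    using System;
    using System.Net;
    using System.Net.Sockets;
    using System.Threading;

    public static class PushlessReader
    {
        // Ask for a single byte (which completes as soon as anything is
        // delivered, full buffer or not), then drain whatever
        // Socket.Available says has already arrived.
        public static int ReceiveChunk(Socket socket, byte[] buffer)
        {
            int n = socket.Receive(buffer, 0, 1, SocketFlags.None);
            int pending = Math.Min(socket.Available, buffer.Length - n);
            if (pending > 0)
                n += socket.Receive(buffer, n, pending, SocketFlags.None);
            return n;
        }

        // Loopback demo: the "server" sends 5 bytes; ReceiveChunk picks
        // them up without waiting for a full buffer.
        public static int RunDemo()
        {
            var listener = new TcpListener(IPAddress.Loopback, 0);
            listener.Start();
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;

            using (var client = new TcpClient())
            {
                client.Connect(IPAddress.Loopback, port);
                using (TcpClient server = listener.AcceptTcpClient())
                {
                    server.GetStream().Write(new byte[] { 1, 2, 3, 4, 5 }, 0, 5);
                    Thread.Sleep(200); // let all 5 bytes reach the client's receive buffer

                    var buffer = new byte[2000];
                    int n = ReceiveChunk(client.Client, buffer);
                    listener.Stop();
                    return n;
                }
            }
        }

        public static void Main()
        {
            Console.WriteLine(RunDemo());
        }
    }
    ```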

            -Steve
    Programming blog: http://nitoprograms.blogspot.com/
      Including my TCP/IP .NET Sockets FAQ
    MSBuild user? Try out the DynamicExecute task in the MSBuild Extension Pack source; it's currently in Beta so get your comments in!
    • Marked as answer by Max Wild Tuesday, July 14, 2009 8:46 AM
    Tuesday, July 07, 2009 5:11 PM

All replies

  • I suspect you have an issue with the Nagle setting; you will want to configure NODELAY so that resources at the stack level are not re-allocated over and over.  Remember too that VoIP uses a QoS of 1, which is high and has to be that way to process phone traffic.  There can't be delays in digitized voice because it won't sound right.  http://en.wikipedia.org/wiki/Quality_of_service  Now one other thing: it's possible, but not probable, that the stack in your machine cannot handle the load.  There are RTP (real-time protocols) supported on MS platforms that you may want to look into if you can't get this to run right.  http://en.wikipedia.org/wiki/Real-time_Transport_Protocol  RTP was built exclusively for voice and media over the digital wire.


    Thursday, July 02, 2009 11:01 PM
  • Thanks for your response.

    I think we have some confusion here. The Nagle setting applies to TCP, which is what I'm talking about (not the UDP/RTP you refer to), and I agree, as I said in my original posting, that this is part of the issue. I have set NoDelay = True, but that does not seem to prevent the 200ms delay in sending an ACK. My belief is that it doesn't change the delay on ACKs; it changes the sending side to not wait for ACKs to previous packets, which is irrelevant in my case as I'm not sending, I'm receiving!

    I am not dealing with RTP/VoIP, which would be over UDP (and therefore not subject to Nagle). The app deals with relatively low-volume TCP: each packet averages around 200 bytes, and there could be 0-10 a second. It just needs to arrive promptly!

    The real problem though is that I can see that data has arrived (in wireshark) almost exactly 500ms before .Net signals it has arrived to me. This is what I would really like to understand and is certainly causing the biggest part of the delay.
    Friday, July 03, 2009 8:16 AM
  • Have you tried adjusting your buffer sizes? Reading fewer bytes, for instance.

    When data arrives that partially satisfies a read request, the stack may wait a bit for more data to arrive. I'd be surprised if it's waiting 500ms, though!

    I'd try reading something quite small - say 100 bytes - and see if those reads complete immediately or if you're still seeing a 500ms delay.

           -Steve
    Programming blog: http://nitoprograms.blogspot.com/
      Including my TCP/IP .NET Sockets FAQ
    MSBuild user? Try out the DynamicExecute task in the MSBuild Extension Pack source; it's currently in Beta so get your comments in!
    Friday, July 03, 2009 2:22 PM
  • Max;
      Good points!  I'm wondering how you are determining the delay.  You said Wireshark shows the inbound data but you don't see it in your application until 500ms later.  How do you know that?  Wireshark shows relative times that are only related to the trace, so I was wondering how you are determining the delay.  I ask this because, based on your knowledge of the stack and Nagle etc., you have the skills to figure this out.  I'm only guessing here, but it's smelling like an application issue and not the stack.  Is your app purely asynchronous in nature?  And if so, could it be that the callback routines just aren't being called soon enough?  What happens if you make the app synchronous?
    Friday, July 03, 2009 3:56 PM
  • Stephen - Even if I set the buffer to 100 bytes and it fills the buffer, there is still a 500ms delay before the app sees it.

    AcousticGuiter - Wireshark: View > Time Display Format > Time of Day!

    This shows, for example, a packet received in Wireshark at 17:08:23.357753 (roughly!), while my app Debug.WriteLine's this same info at 17:08:23.856 on the line after I call EndRead. I've tried it sync and async. I've turned on the verbose tracing built into .NET - all show the damn packets arriving 500ms after Wireshark reports them.

    My knowledge of Nagle is, in truth, googled! However, you are correct about my general .NET networking knowledge - I've been doing TCP and UDP comms for about 9 years, but I've never delved into TCP latency specifically, so I guess this could always have been there. I can only assume it's in my app - TCP can't be this flawed! But I'm really stuck!

    Thanks
    Friday, July 03, 2009 4:14 PM

  • Based on your descriptions above, and the fact that it doesn't matter whether you run synchronously or asynchronously, it's beginning to smell like the stack.  But doesn't that seem strange?  What happens if you create a stand-alone, pared-down, straight send/receive app?  Does it still happen?  Or how about this: try it on a different machine/server.  Because if it is the stack, you have no option but to report the issue to MS.
    Monday, July 06, 2009 2:33 AM
  • I'd love to blame Microsoft, but I can't believe this would not have come up before. I'm still inclined to blame myself and/or the hardware at the other end of the comms!

    With this in mind I have done some more experimenting and found this (from http://en.wikipedia.org/wiki/Transmission_Control_Protocol):

    Forcing Data Delivery

    Normally, TCP waits for the buffer to exceed the maximum segment size before sending any data. This creates serious delays when the two sides of the connection are exchanging short messages and need to receive the response before continuing. For example, the login sequence at the beginning of a session begins with the short message "Login," and the session cannot make any progress until these five characters have been transmitted and the response has been received. This process can be seriously delayed by TCP's normal behavior.

    However, an application can force delivery of segments to the output stream using a push operation provided by TCP to the application layer.[2] This operation also causes TCP to set the PSH flag or control bit to ensure that data will be delivered immediately to the application layer by the receiving transport layer.

    In the most extreme cases, for example when a user expects each keystroke to be echoed by the receiving application, the push operation can be used each time a keystroke occurs. More generally, application programs use this function to force output to be sent after writing a character or line of characters. By forcing the data to be sent immediately, delays and wait time are reduced.

    I can see from Wireshark that the PSH flag is not set on the data I receive. When I do a test with my own client/server, the flag is set and the data arrives immediately. Sadly I can't change the PSH flag on the data, so I need to see if I can tell .NET or the underlying Winsock socket to ignore PSH and give me the damn data when it arrives!
    • Marked as answer by Max Wild Tuesday, July 07, 2009 3:10 PM
    • Unmarked as answer by Max Wild Tuesday, July 07, 2009 7:05 PM
    Monday, July 06, 2009 2:24 PM
  • Having spoken indirectly to the hardware vendor, they feel that not using the PSH bit is appropriate, as setting it would cause their hardware to send more frequently, which they don't deem acceptable and are not willing to change. Therefore I'm stuffed.
    Tuesday, July 07, 2009 3:10 PM
  • Gosh, I don't know what to tell you... Sounds like there's no way around this, but I still feel (without proof) that there has to be something in the application.  Does the application use threads?  Keep in mind that a background thread can be delayed by the main thread.  Look into the yield keyword, etc.  I found a situation once where I put a debugger onto a main application and found that stopping the main thread also stopped the background threads, very surprising to me.  Good luck in finding the root of this one.  One other question: are you sure the Wireshark timestamps are in sync with your application-layer timestamps?  I didn't realize that Wireshark does that.
    Tuesday, July 07, 2009 4:44 PM
  • Generally, most modern TCP/IP stacks set the PSH bit at the end of the buffer supplied to send(). Many embedded stacks do not do this.

    More detailed information on the problem is in the Microsoft TCP/IP Implementation Details:
      http://technet.microsoft.com/en-us/library/cc758517(WS.10).aspx (under Push Bit Interpretation)
    which gives a registry entry that can be set to always treat packets as though the PSH bit was set:
      http://technet.microsoft.com/en-us/library/cc781532(WS.10).aspx (under IgnorePushBitOnReceives)

    However, this has the side effect of changing the way packets are received for the entire system.

    I still recommend what I suggested in my first post: reduce the buffer size. If the vendor won't set the PSH bit, and the amount of data is variable, I would actually read 1 byte at a time from the socket unless I knew additional bytes had already arrived (Socket.Available). This is less efficient than reading larger buffers because each read is a call into the kernel; however, this would eliminate the 500ms delay, so you should see faster communications at the expense of some CPU cycles. On a modern machine the added overhead would not be noticeable.

            -Steve
    Programming blog: http://nitoprograms.blogspot.com/
      Including my TCP/IP .NET Sockets FAQ
    MSBuild user? Try out the DynamicExecute task in the MSBuild Extension Pack source; it's currently in Beta so get your comments in!
    • Marked as answer by Max Wild Tuesday, July 14, 2009 8:46 AM
    Tuesday, July 07, 2009 5:11 PM
  • Gosh, I don't know what to tell you... Sounds like there's no way around this, but I still feel (without proof) that there has to be something in the application.  Does the application use threads?  Keep in mind that a background thread can be delayed by the main thread.  Look into the yield keyword, etc.  I found a situation once where I put a debugger onto a main application and found that stopping the main thread also stopped the background threads, very surprising to me.  Good luck in finding the root of this one.  One other question: are you sure the Wireshark timestamps are in sync with your application-layer timestamps?  I didn't realize that Wireshark does that.
    I have done a massive amount of testing and I can definitely say the 500ms is real however I read from the socket. I even made an application which relayed the exact data received from the server on to another client, using the same receive code for both the hardware I'm talking to and the relayed connection. E.g.:

    Hardware TCP server -> Test app TCP client with my own TCP server -> Test app TCP client

    This is where I picked up the difference between the packets and the lack of the PSH flag. I'm 100% confident this is the cause of the delay.
    Tuesday, July 07, 2009 7:09 PM
  • Generally, most modern TCP/IP stacks set the PSH bit at the end of the buffer supplied to send(). Many embedded stacks do not do this.

    More detailed information on the problem is in the Microsoft TCP/IP Implementation Details:
      http://technet.microsoft.com/en-us/library/cc758517(WS.10).aspx  (under Push Bit Interpretation)
    which gives a registry entry that can be set to always treat packets as though the PSH bit was set:
      http://technet.microsoft.com/en-us/library/cc781532(WS.10).aspx  (under IgnorePushBitOnReceives)

    However, this has the side effect of changing the way packets are received for the entire system.

    I still recommend what I suggested in my first post: reduce the buffer size. If the vendor won't set the PSH bit, and the amount of data is variable, I would actually read 1 byte at a time from the socket unless I knew additional bytes had already arrived (Socket.Available). This is less efficient than reading larger buffers because each read is a call into the kernel; however, this would eliminate the 500ms delay, so you should see faster communications at the expense of some CPU cycles. On a modern machine the added overhead would not be noticeable.

            -Steve
    Programming blog: http://nitoprograms.blogspot.com/
      Including my TCP/IP .NET Sockets FAQ
    MSBuild user? Try out the DynamicExecute task in the MSBuild Extension Pack source; it's currently in Beta so get your comments in!
    This is very interesting info and just what I was looking for. I'll certainly try the reg settings tomorrow and look into reducing the buffer to 1 byte. My temporary fix has been to reduce the buffer from 1500 to 250 bytes, but I can see how a 1-byte buffer would work - I was worried about the efficiency though. I'll do some tests.
    Tuesday, July 07, 2009 7:10 PM
  • By golly Max, I think Stephen hit the nail on the head.  I'm sorry if I misled you on this... didn't mean to, but then again I never thought about stack manipulation and didn't realize the PSH bit (in your case) caused a delay...  Thanks anyway, I learned a lot in this post.
    Tuesday, July 07, 2009 11:19 PM
  • Interesting results. The registry setting of course makes the problem go away.

    I thought I had the ultimate solution though - I know that the data has the length of each packet in bytes 2 to 5, so I was going to receive 5 bytes, find the length, and then receive length-5 bytes, figuring this would result in full buffers as soon as the data is available... however this still results in a 500ms delay before the initial 5 bytes are returned. The subsequent length-5 bytes are then returned instantly.
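
    For what it's worth, that header-then-body approach can be sketched as follows, assuming (purely for illustration) that "bytes 2 to 5" hold the total message length as a little-endian 32-bit integer; the real protocol's byte order and the meaning of byte 1 aren't stated in this thread:

    ```csharp
    using System;
    using System.IO;

    public static class Framing
    {
        // Hypothetical layout: a 5-byte header whose bytes 2-5 hold the
        // TOTAL message length (header included), little-endian.
        public const int HeaderSize = 5;

        public static int ParseLength(byte[] header)
        {
            // Offsets 1..4 are "bytes 2 to 5" in 1-based counting.
            return BitConverter.ToInt32(header, 1);
        }

        // Stream.Read may return fewer bytes than asked for, so loop.
        public static byte[] ReadExactly(Stream stream, int count)
        {
            var buffer = new byte[count];
            int offset = 0;
            while (offset < count)
            {
                int n = stream.Read(buffer, offset, count - offset);
                if (n == 0) throw new EndOfStreamException();
                offset += n;
            }
            return buffer;
        }

        // Receive the 5-byte header, then exactly (length - 5) body bytes.
        public static byte[] ReadMessage(Stream stream)
        {
            byte[] header = ReadExactly(stream, HeaderSize);
            return ReadExactly(stream, ParseLength(header) - HeaderSize);
        }

        public static void Main()
        {
            // Simulate one 12-byte message: 5-byte header + 7-byte body.
            var wire = new MemoryStream(new byte[] { 0x01, 12, 0, 0, 0, 65, 66, 67, 68, 69, 70, 71 });
            Console.WriteLine(ReadMessage(wire).Length); // 7
        }
    }
    ```

    Against an in-memory stream this works; against the real socket, as noted above, the first ReadExactly still waits out the 500ms because the stack is holding the pushless segment.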

    So then I tried Stephen's suggestion of 1 byte, still worrying about performance. And again: a 500ms delay before the first byte comes through for each message, followed by instant notification for the remaining bytes. So it would seem the documentation stating that a full receive buffer causes the receive to return doesn't hold true!

    So I'm left with the registry setting which I'm not feeling very good about but may be the only option.
    Wednesday, July 08, 2009 8:36 AM
  • The delay for a single-byte receive sounds like a bug, actually. It is possible: the TCP/IP stack was heavily modified (possibly completely rewritten) for Vista/2K8, and this bug may have been introduced then.

    You might want to post it on Microsoft Connect and see what kind of response you get. Or (if you have an MSDN subscription) open a support incident.

           -Steve
    Programming blog: http://nitoprograms.blogspot.com/
      Including my TCP/IP .NET Sockets FAQ
    MSBuild user? Try out the DynamicExecute task in the MSBuild Extension Pack source; it's currently in Beta so get your comments in!
    Wednesday, July 08, 2009 1:17 PM