WINCE 6 not honoring setsockopt to disable Nagle algorithm (TCP_NODELAY)

Question
-
Wednesday, October 2, 2019 7:07 PM
Answers
-
I am pleased to announce, for any other sorry soul looking into this, that there is in fact an issue in CE 6.0 R3 that results in the TCP_NODELAY option not being honored.
After reluctantly opening a support ticket for 500 bucks, I couldn't have been happier with the outcome: in less than an hour I was given the workaround for this 'bug'.
So the punchline: set the 'EnhanceTCPSend' registry value to 0, then disable the Nagle algorithm:
int retValEnhanceTcpSendDisable = RegSetValueEx(HKEY_LOCAL_MACHINE, _T("EnhanceTCPSend"), 0, REG_SZ, (LPBYTE)regVal, _tcslen(regVal) * sizeof(TCHAR)); // where regVal = 0
setsockopt(pEthernetSocket->Socket, IPPROTO_TCP, TCP_NODELAY, (void*)&tcpNoDelayVal, sizeof(tcpNoDelayVal)); // where tcpNoDelayVal = 1
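For anyone adapting the two calls above, here is a minimal sketch of the same workaround, not the code from the support ticket. The release notes quoted in this answer list EnhanceTCPSend as a DWORD under HKEY_LOCAL_MACHINE\Comm\Tcpip\Parms, so the sketch opens that key and writes a REG_DWORD. The function names (disable_enhanced_tcp_send, set_tcp_nodelay, get_tcp_nodelay) and the sock_t typedef are made up for illustration. It also assumes the CE compiler defines _WIN32; the non-Windows branch exists only so the socket part can be tried on a desktop machine.

```c
#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>
typedef SOCKET sock_t;
#else
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
typedef int sock_t;
#endif

#ifdef _WIN32
/* Writes EnhanceTCPSend = 0 as a DWORD under HKLM\Comm\Tcpip\Parms.
 * Key path and value type are taken from the release notes quoted in
 * this answer. Returns ERROR_SUCCESS on success.
 * Assumption: the thread does not say when the stack reads this value
 * (for example only at boot), so a restart may be needed. */
LONG disable_enhanced_tcp_send(void)
{
    HKEY key;
    DWORD off = 0;
    /* The CE documentation says the access mask is ignored there. */
    LONG rc = RegOpenKeyEx(HKEY_LOCAL_MACHINE, TEXT("Comm\\Tcpip\\Parms"),
                           0, KEY_SET_VALUE, &key);
    if (rc != ERROR_SUCCESS)
        return rc;
    rc = RegSetValueEx(key, TEXT("EnhanceTCPSend"), 0, REG_DWORD,
                       (const BYTE *)&off, sizeof(off));
    RegCloseKey(key);
    return rc;
}
#endif

/* Turns the Nagle algorithm off (enable != 0) or back on (enable == 0).
 * Returns 0 on success, like setsockopt itself. */
int set_tcp_nodelay(sock_t s, int enable)
{
    int flag = enable ? 1 : 0;
    return setsockopt(s, IPPROTO_TCP, TCP_NODELAY,
                      (const char *)&flag, sizeof(flag));
}

/* Reads the option back: 1 if set, 0 if clear, -1 on error. */
int get_tcp_nodelay(sock_t s)
{
    int flag = 0;
#ifdef _WIN32
    int len = sizeof(flag);
#else
    socklen_t len = sizeof(flag);
#endif
    if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, (char *)&flag, &len) != 0)
        return -1;
    return flag != 0;
}
```

Following the order given above, disable_enhanced_tcp_send() would run first and set_tcp_nodelay(sock, 1) afterwards. Reading the option back only shows that the stack recorded it. In this thread setsockopt reported success on CE 6.0 R3 while the traffic still looked Nagle-delayed, so a packet capture remains the real check.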
Since I still cannot post links here.... Here is the relevant content from the documentation I was pointed to:
Component: SQLCE
…
· 090821_KB973144- Windows Embedded CE 6.0 TCP Send/Receive throughput optimization.
A new thread is introduced to process TCP send packets which runs at same priority as NDIS thread. For this solution to work NDIS and the new thread should be at same priority in your miniport driver.
If it needs to be changed, it should be done by changing it thru existing registry setting, so that both threads can be at same priority.
Existing registry for NDIS thread priority:
[HKEY_LOCAL_MACHINE\Drivers\BuiltIn\NDIS]
"Priority256"=dword:74 // default priority is 116 (hex 74)
New registries are being introduced. Enhanced TCP is on by default. Following registry key turns it off:
[HKEY_LOCAL_MACHINE\Comm\Tcpip\Parms]
"EnhanceTCPSend"=dword:0 // Turn off enhanced TCP send
For each platform these two registries need to be tuned. With proper tuning of these two parameters, maximum TCP send throughput can be achieved. Start with default value. Increase the buffer size and decrease the wakeup interval.
[HKEY_LOCAL_MACHINE\Comm\Tcpip\Parms]
"SendThreadWakeupInterval"=dword:4 // Send thread wakeup interval. This interval is used for send thread to sleep before it send out the packets. It helps to accumulate the packets and send them in one shot. Default is 4 ms. Maximum value is 10ms and minimum is 1ms.
[HKEY_LOCAL_MACHINE\Comm\AFD]
"TCPSendQuota"=dword:10000 // 64K by default. This is the maximum buffer AFD can use to send packets. Max is 512K and Min is 16K
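One detail that is easy to misread in the entries above: the digits after dword: are hexadecimal, so dword:10000 is 65536 (the "64K" in the comment) and dword:74 is 116. The small sketch below parses such a value and checks it against the ranges the release notes give. The helper names are hypothetical, and it assumes K means 1024 bytes, which matches 0x10000 = 64K.

```c
#include <stdlib.h>
#include <string.h>

/* Parses a .reg-style "dword:XXXX" string; the digits are hexadecimal.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_reg_dword(const char *text, unsigned long *out)
{
    static const char prefix[] = "dword:";
    char *end;

    if (strncmp(text, prefix, sizeof(prefix) - 1) != 0)
        return -1;
    text += sizeof(prefix) - 1;
    if (*text == '\0')
        return -1;
    *out = strtoul(text, &end, 16);
    return *end == '\0' ? 0 : -1;
}

/* SendThreadWakeupInterval: minimum 1 ms, maximum 10 ms per the notes. */
int wakeup_interval_in_range(unsigned long ms)
{
    return ms >= 1 && ms <= 10;
}

/* TCPSendQuota: minimum 16K, maximum 512K per the notes (K = 1024 assumed). */
int send_quota_in_range(unsigned long bytes)
{
    return bytes >= 16UL * 1024 && bytes <= 512UL * 1024;
}
```

A value such as dword:20000 (128K) for TCPSendQuota would pass the range check, while a decimal-looking dword:512 would not, because it is only 0x512 = 1298 bytes.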
TCP Receive registry setting. The new TCP receive code attempts for a "short path" from ARP to TCP receive. It cuts lots of instructions in the path. By default it is turned on.
Use following registry to turn off this feature:
[HKEY_LOCAL_MACHINE\Comm\Tcpip\Parms]
"AttemptFastPath"=dword:0 // Turn off fast path
- Marked as answer by Michel Verhagen MVP Friday, November 8, 2019 8:09 PM
Friday, November 8, 2019 4:10 PM
All replies
-
Posting for behlin28
Windows CE Version 6.0
Setsockopt to disable the Nagle algorithm works on WinCE7, but not on WinCE6 (code snippet below)
This code does not return an error on WinCE6 (i.e. we never print to DebugLogLib)
However, using some tools, I’ve confirmed that we are in fact still suffering from the Nagle algorithm: https://en.wikipedia.org/wiki/Nagle%27s_algorithm
This picture shows that we send out the second frame before receiving the ACK for the first, which results in the Nagle algorithm kicking in: it buffers up the data for the default of 200ms, then finally sends it
- Edited by Mike Tallroth Wednesday, October 2, 2019 7:43 PM
Wednesday, October 2, 2019 7:29 PM -
From https://tangentsoft.net/wskfaq/intermediate.html#disable-nagle:
3.17 - When should I turn off the Nagle algorithm?
Almost never.
Inexperienced Winsockers usually try disabling the Nagle algorithm when they are trying to impose some kind of packet scheme on a TCP data stream. That is, they want to be able to send, say, two packets, one 40 bytes and the other 60, and have the receiver get a 40-byte packet followed by a separate 60-byte packet. (With the Nagle algorithm enabled, TCP will often coalesce these two packets into a single 100 byte packet.) Unfortunately, this is futile, for the following reasons:
- Even if the sender manages to send its packets individually, the receiving TCP/IP stack may still coalesce the received packets into a single packet. This can happen any time the sender can send data faster than the receiver can deal with it.
- Winsock Layered Service Providers (LSPs) may coalesce or fragment stream data, especially LSPs that modify the data as it passes.
- Turning off the Nagle algorithm in a client program will not affect the way that the server sends packets, and vice versa.
- Routers and other intermediaries on the network can fragment packets, and there is no guarantee of “proper” reassembly with stream protocols.
- If a packet arrives that is larger than the available space in the stack’s buffers, it may fragment a packet, queuing up as many bytes as it has buffer space for and discarding the rest. (The remote peer will resend the remaining data later.)
- Winsock is not required to give you all the data it has queued on a socket even if your recv() call gave Winsock enough buffer space. It may require several calls to get all the data queued on a socket.
Aside from these problems, disabling the Nagle algorithm almost always causes a program’s throughput to degrade. The only time you should disable the algorithm is when some other consideration, such as packet timing, is more important than throughput.
Often, programs that deal with real-time user input will disable the Nagle algorithm to achieve the snappiest possible response, at the expense of network bandwidth. Two examples are X Window servers and multiplayer network games. In these cases, it is more important that there be as little delay between packets as possible than it is to conserve network bandwidth.
For more on this topic, see the Lame List and the FAQ article How to Use TCP Effectively.
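The FAQ's point that recv() may need several calls is usually handled by reading in a loop until the expected number of bytes has arrived, so the receiver never depends on how the sender's writes were packetized. Below is a minimal sketch of that idea. The name recv_exact and the sock_t typedef are made up for illustration, and the non-Windows branch is there only so the sketch can be tried off-target.

```c
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sock_t;
#else
#include <sys/types.h>
#include <sys/socket.h>
typedef int sock_t;
#endif

/* Reads exactly len bytes from a stream socket into buf.
 * Returns 0 on success, -1 if recv() fails or the peer closes the
 * connection before len bytes have arrived. Interrupted calls are
 * treated as errors to keep the sketch short. */
int recv_exact(sock_t s, char *buf, int len)
{
    int got = 0;

    while (got < len) {
        int n = (int)recv(s, buf + got, len - got, 0);
        if (n <= 0)
            return -1;   /* error, or orderly shutdown by the peer */
        got += n;        /* a short read is normal; keep going */
    }
    return 0;
}
```

With the FAQ's example of a 40-byte message followed by a 60-byte one, the receiver would call recv_exact(sock, a, 40) and then recv_exact(sock, b, 60). The result is the same whether the stack delivers the data as one 100-byte segment or as many smaller ones.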
Good luck,
Michel Verhagen, eMVP
Check out my blog: https://guruce.com/blog
GuruCE
Microsoft Embedded Partner
NXP Proven Partner
https://guruce.com
Consultancy, training and development services. Interested in WEC on i.MX6?
Get the only 100% stable and best performing i.MX6 BSP for WEC7 and WEC2013 here: https://guruce.com/imx6
Wednesday, October 2, 2019 7:56 PM -
-
Existing answer here as well: https://social.msdn.microsoft.com/forums/en-US/70407ee0-a8bc-4486-9ceb-02dbead58073/wince-6-socket-latency
Wednesday, October 2, 2019 7:57 PM
-
Thank you for the quick replies:
@Michel: This application is similar to the ones in your last paragraph: it is very much concerned with 'real-time' data transfer and not concerned with throughput. It's all small packets, sent in both directions over the sockets.
Because traffic flows in both directions, I need to disable Nagle on both ends. Our DLL side honors the TCP_NODELAY option; the embedded side (CE6), however, does not.
The reason I am finally posting is that I have resolved this problem on the embedded side when running CE7, but the exact same code does not produce the HUGE performance improvement on CE6.
I am about to embark on rebuilding CE6, but based on the post linked above by @IoTGirl (I still can't post links/pics...), it sounds like I'm not the only one having issues with TCP_NODELAY on CE6. I want to confirm the option actually works on CE6 before spending time rebuilding it and looking for why it may not be working in our build.
Thanks again
- Edited by behlin28 Thursday, October 3, 2019 1:52 PM
Thursday, October 3, 2019 1:10 PM -
@IoTGirl: The OP in the post you linked was able to resolve his situation by disabling Nagle on only one end. I need to disable it on both ends of my socket connections: one side is a Windows PC and the other is the embedded device running CE6.
- Edited by behlin28 Thursday, October 3, 2019 1:52 PM
Thursday, October 3, 2019 1:16 PM -
Are you fully up to date with all updates for CE6?
If you are, and it still doesn't work, this may indeed be a bug. With CE6 out of support, you are out of luck ever getting a fix for that, so how about moving to WEC7?
Would that be an option?
Good luck,
Michel Verhagen, eMVP
Thursday, October 3, 2019 6:04 PM -
You are showing true community spirit for sharing the solution here, even after having had to pay to get support from Microsoft, thank you!
PS. This is the link to the CE 6.0 R3 release notes. Strangely enough, update 090821_KB973144 can be found under the SQLCE heading (should've been under the COMM heading if you ask me, but ok).
Good luck,
Michel Verhagen, eMVP
- Edited by Michel Verhagen MVP Friday, November 8, 2019 8:17 PM
Friday, November 8, 2019 8:10 PM