IOCP fairness question
-
Thursday, May 03, 2012 3:07 PM
Test setup:
- On a Solaris box I have an echoer process (E) which can take a number of connections. On each connection it receives messages and sends them back on the same connection.
- On a Windows 7(64 bit) PC (quad-core) I have a process (LB) which uses IOCP for socket IO. LB takes connections from clients and forwards messages it receives from the client to process E, and forwards response messages from E to the client. The number of connections between LB and E is configurable; traffic between LB and E are evenly distributed over abaliable connections.
- On the same Windows 7 PC I run 400 injectors. Each injector has a connection to process LB over which it sends a message, waits to receive the response (echoed request message) before sending the next message.
Symptoms:
- Process LB keeps growing in size until it runs out of memory and crashes (this happens fairly quickly, in matter of a few minutes). On the LB-E interface, the number of issued writes overwhelmingly outnumbers the number of completed writes reported by IOCP. Pending write messages are only released when their completion is reported; since completion reports lag so behind, LB has to keep whose write messages around and eventually runs out of memory.
- Increasing the number of connections between LB and E appears to mitigate the issue. e.g. when the number of connections between LB and E is increased to 6, write completion is able to keep up and the size of LB remains stable.
Even though IOCP lags behind in reporting write completions, I believe the messages do get sent out and made the round trip back to the injectors. Given the synchronous nature of the injectors, that's the only way more messages can get into the system. Based on that, IOCP apparently reports read completion in a timely fashion on LB-E sockets.
On each LB-E connection, there are at most one pending read and a number of pending writes(one write for each message) at any time. To prevent greedy reads, the read buffer is of the same size of the test message.
Based on my observations, it appears that under heavy traffic IOCP favors read completion over write completion. Can someone with kernel experience confirm this? Are there any way to tune such behavior?
- Edited by Qiu, Wenning Thursday, May 03, 2012 3:35 PM
All Replies
-
Thursday, May 03, 2012 3:53 PM
This is quite normal. If you were to run WireShark on the connections between the LB and the E then you would likely see that the TCP sliding window "fills up" as the E process fails to read data fast enough for the LB process. That is the LB is sending faster than the E can read and you have no flow control in place to slow the LB down when the overlapped writes start to take longer to complete.
I've written in depth about this here: http://www.serverframework.com/asynchronousevents/2011/06/tcp-flow-control-and-asynchronous-writes.html
"Things are a little more complex with an asynchronous sending API such as the API to the Windows Overlapped I/O system. With an asynchronous sending API the sending application is not blocked when the sender's send buffer is full. Instead the API queues the data for sending once the sender's buffer has space, this is, of course, a simplified view of things. The important thing to understand is that the sending application is now no longer throttled to the speed of the receiving application. Nothing in the sender's asynchronous API will prevent the sending application from generating more data than the receiving application can process and so the sending application itself must take responsibility for not sending more than the receiver can receive."
What you should, probably, be doing is monitoring how many overlapped writes you have (per connection) and when this exceeds a configuration value you stop reading from the input side of the load balancer on this connection. This, in turn, will put "back pressure" on the connection into the LB and will (hopefully) cause the sender to slow down (if it doesn't then you've just moved the problem from your LB to the sender).
- Edited by Len Holgate Thursday, May 03, 2012 3:54 PM tidied up fonts
-
Thursday, May 03, 2012 4:35 PM
Thanks Len for your reply. I agree your scenario is legit in general.
However, I believe pace control is in place in my test: the injectors works in synchronous mode; they only send a new message after the previous message is round-triped. So if things are working properly, I should be seeing at most 400 pending messages in the whole system. The fact that more messages are able to get into the system indicates process E is able to keep up with incoming messages and send then out of the door.
I suspected TCP sliding window too and captured TCP trace (see below, port 20204 is process E). I saw no sign the sliding window is filled up.
Your suggestion of monitoring overlapped writes and slowing down on the input side of LB should work for this particular test setup. But my actual situation is more complicated: I use IOCP in a generic IO framework handling networking on behalf applications. So the points where pacing is needed could vary depending on the application architectures, and orchestration of pacing may be needed between components that are independent of each other at design time. I was hoping if fairness can be guaranteed, it would simplify things.
- Edited by Qiu, Wenning Thursday, May 03, 2012 4:36 PM
-
Thursday, May 03, 2012 4:51 PM
What about a trace from LB to the input side? (I assume this is LB to E).
I also have a generic framework and, IMHO, the framework MUST understand what's going on with regards to flow control (or lack of it) and MUST allow you to step in a get some feedback to the input side. How you deal with that input is going to be application specific - for example, in a large game server system that I'm currently working on we sometimes push the flow control notifications back to an input TCP connection (when we're acting as a router for other game server nodes) and sometimes push it right back up into our managed layer so that the game developer gets what is effectively a refusal to write more data at this time because all of our buffers are currently full and we're not configured to allow more data on this connection.
Even if fairness can be guaranteed (and I think it can, though have never seen the code in question) you would still need to handle this problem as a client could stop reading at any point and for any reason and unless you have some safe guards in place then your machine is the one that will crash when you use up all of the memory and/or non-paged pool.
- Edited by Len Holgate Thursday, May 03, 2012 4:58 PM expanded to cover more than just the original trace request
-
Thursday, May 03, 2012 4:58 PMThere is nothing in WireShark. Since the injectors and LB are on the same host(W7-64), I guess the packets never made to the wire.
-
Thursday, May 03, 2012 5:02 PMSplit them onto separate machines or bind them to separate nics on the same machine. Without that trace you're only looking at one half of the potential problem (unless you happen to have perfmon counters in the LB and can see that the writes are all pending on the E side).
-
Thursday, May 03, 2012 5:10 PM
I do have stats(not perfmon). EIP-20108 is the client-facing side of LB; EIPCLI is the E-facing side of LB.
D:\Work\Modules\IofwTestHarness\IofwTestHarness\Setup\Bin>dmpctrl -n loadbal -d |grep Iocp
74 EIP-20108_IocpWrites_EMC0 146037
75 EIP-20108_IocpWriteCompletes_EMC0 146036
87 EIPCLI_IocpWrites_EMC0 145662
88 EIPCLI_IocpWriteCompletes_EMC0 83478D:\Work\Modules\IofwTestHarness\IofwTestHarness\Setup\Bin>dmpctrl -n loadbal -d |grep Iocp
74 EIP-20108_IocpWrites_EMC0 146901
75 EIP-20108_IocpWriteCompletes_EMC0 146901
87 EIPCLI_IocpWrites_EMC0 146519
88 EIPCLI_IocpWriteCompletes_EMC0 83478D:\Work\Modules\IofwTestHarness\IofwTestHarness\Setup\Bin>dmpctrl -n loadbal -d |grep Iocp
74 EIP-20108_IocpWrites_EMC0 147834
75 EIP-20108_IocpWriteCompletes_EMC0 147834
87 EIPCLI_IocpWrites_EMC0 147450
88 EIPCLI_IocpWriteCompletes_EMC0 83478 -
Thursday, May 03, 2012 5:17 PM
That doesn't mean anything to me.
-
Thursday, May 03, 2012 6:14 PMWhat percounts do you expect to see? It is easy for me to rerun the test.
-
Thursday, May 03, 2012 7:43 PM
Ok, I think I've worked out what those numbers mean; handy that you have a separate IOCP for each side - I wouldn't normally do that.
Looks like the write completions have stopped which, to me, indicates that the wireshark log should show your tcp window is full. I don't know of any other reason for write completions to stall like that. Which machine was the log shown above taken on? Do you have a switch between LB and E? Is it a managed switch? Do traces taken on both LB and E concur that you're not seeing zero windows?
I assume you have no third party LSP's or whatever installed.
-
Thursday, May 03, 2012 8:03 PM
Thanks Len.
Yes there is a dedicated IOCP port for each side. I always just have a global IOCP for the whole process, but to understand this issue I need to isolate each side...
What puzzles me is that what is stopped/stallled/slowed is just the reporting of write completion, the writes actually are completed since messages go through E and back to the injectors. So I was wondering if IOCP favors reporting read completion over write completion in certain scenario...
To answer your questions:
- The WireShark trace was taken on LB machine.
- LB and E is connected through LAN, so there is very good chance there are switches (but I am not sure about the type) between them.
- I did not take trace on E since it would require admin privilege on Solaris.
- What is LSP?
- Edited by Qiu, Wenning Thursday, May 03, 2012 8:15 PM
- Edited by Qiu, Wenning Thursday, May 03, 2012 8:15 PM
-
Thursday, May 03, 2012 8:41 PM
LSP - http://en.wikipedia.org/wiki/Layered_Service_Provider - some firwalls and virus software install them to look at network traffic, a buggy one could cause problems, perhaps.
I don't go with your "favoring read over write" as even if there was something that caused this favouritism then I would expect that as soon as your read traffic dropped off your writes would complete - but anyway, I've never seen anything like that before.
-
Thursday, May 03, 2012 8:47 PMYeah, the "favoritism" is just my theory I think can explain what I saw. When I get around with another high-priority issue I'll try to build a barebone test case to reproduce the problem and share it on this thread.
- Edited by Qiu, Wenning Thursday, May 03, 2012 8:48 PM
-
Tuesday, May 15, 2012 3:59 PM
It has been a while, but I do manage to get the barebone test case created to demontrate the issue. The test case can be downloaded here.
Follow these steps to run the test:
- Start Echoer.exe on one machine, specifying listen port.
- Start LoadBalancer.exe on a different machine, specifying listen port, Echoer connection information etc.
- Run the load test from vs2010. I ran it on the same machine as LB. Depending on how you run it the LB_host and LB_port in Injector.cs may need to be adjusted. Load test duration and user count may also need to be adjusted; for my environment I used 20 minutes and 400 users.
The following are the command line I used. It is critical to keep the number of connections between LD and Echoer small; I used 1.
Echoer 9999 LoadBalancer.exe 8888 odcsgslbapp04.devcsg.com 9999 1
When I originally reported the issue the fairness problem occurred in LB when Echoer ran on a Solaris box. With this test case, all components run on Windows boxes and the problem appears to have shifted to Echoer, but I believe they are the same issue. The following it the Echoer output from my test; the stats were printed out every 2 seconds and we can see that the write completion reporting stalled and the Echoer process eventually crashed.
D:\WenningQiu\IofwTestHarness>Echoer 9999
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=0, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 0
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=1, ReadCompletions=0; Writes=0, WriteCompletions=0
Number of connections: 1
==========================================
ECHOER: Reads=2448, ReadCompletions=2448; Writes=785, WriteCompletions=784
Number of connections: 1
==========================================
ECHOER: Reads=4831, ReadCompletions=4830; Writes=2218, WriteCompletions=1179
Number of connections: 1
==========================================
ECHOER: Reads=7435, ReadCompletions=7434; Writes=3681, WriteCompletions=1179
Number of connections: 1
==========================================
ECHOER: Reads=9380, ReadCompletions=9379; Writes=4546, WriteCompletions=4546
Number of connections: 1
==========================================
ECHOER: Reads=13370, ReadCompletions=13369; Writes=5808, WriteCompletions=5807
Number of connections: 1
==========================================
ECHOER: Reads=16890, ReadCompletions=16889; Writes=7222, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=20419, ReadCompletions=20418; Writes=8515, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=57192, ReadCompletions=57191; Writes=24214, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=61244, ReadCompletions=61243; Writes=25667, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=65054, ReadCompletions=65053; Writes=27207, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=68253, ReadCompletions=68252; Writes=28564, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=71607, ReadCompletions=71606; Writes=29927, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=76099, ReadCompletions=76098; Writes=31407, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=79723, ReadCompletions=79722; Writes=32855, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=82840, ReadCompletions=82839; Writes=34286, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=86096, ReadCompletions=86096; Writes=35661, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=89160, ReadCompletions=89159; Writes=37019, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=92394, ReadCompletions=92393; Writes=38490, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=95396, ReadCompletions=95395; Writes=39926, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=97939, ReadCompletions=97938; Writes=41192, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=101843, ReadCompletions=101842; Writes=42651, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=104881, ReadCompletions=104880; Writes=44152, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=108378, ReadCompletions=108377; Writes=45580, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=112323, ReadCompletions=112322; Writes=47010, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=116103, ReadCompletions=116102; Writes=48429, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=120245, ReadCompletions=120245; Writes=49930, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=123535, ReadCompletions=123535; Writes=51389, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=127469, ReadCompletions=127468; Writes=52786, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=131591, ReadCompletions=131591; Writes=54296, WriteCompletions=6225
Number of connections: 1
==========================================
ECHOER: Reads=135172, ReadCompletions=135171; Writes=55770, WriteCompletions=6225
Number of connections: 1D:\WenningQiu\IofwTestHarness>
- Edited by Qiu, Wenning Tuesday, May 15, 2012 4:00 PM
- Edited by Qiu, Wenning Tuesday, May 15, 2012 4:05 PM


