I/O Completion Ports and Socket Send

  • Question

  • I am in the process of developing a very high performance socket application.  I have been using the new Socket.ReceiveAsync API (along with pooled SocketAsyncEventArgs) to achieve quite acceptable receive performance, on the order of ~150k 128-byte messages per second per connected socket (on some run-of-the-mill hardware).  I have been running both client and server on the same machine.
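
    As a rough sanity check on those numbers, the message rate can be converted into payload bandwidth.  This is only a sketch of the arithmetic; the helper names are hypothetical and not part of either test app, and TCP/IP header overhead is ignored:

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Hypothetical helper: payload bytes per second for a given message rate.
    // Counts application payload only; TCP/IP header overhead is ignored.
    inline std::uint64_t payloadBytesPerSecond(std::uint64_t messagesPerSecond,
                                               std::uint64_t bytesPerMessage)
    {
        assert(bytesPerMessage > 0);
        return messagesPerSecond * bytesPerMessage;
    }

    // The same figure in megabits per second (1 Mbit = 1,000,000 bits).
    inline double payloadMegabitsPerSecond(std::uint64_t messagesPerSecond,
                                           std::uint64_t bytesPerMessage)
    {
        return static_cast<double>(
                   payloadBytesPerSecond(messagesPerSecond, bytesPerMessage)) * 8.0 / 1e6;
    }
    ```

    For 150,000 messages of 128 bytes each, payloadBytesPerSecond returns 19,200,000 bytes per second, which is about 153.6 Mbit/s of payload.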

    I had initially developed a native Win32 C++ app to act as a test server (producer) that sent a configured number of messages to the .NET receiving client as fast as it could.  My first cut at this C++ app used a standard blocking call to WSASend (basically the same as Socket.Send), which internally sends the message to the consumer (client) and waits for that consumer to receive the message before returning.  It appeared as if my client-side receive performance was being mostly bounded by the rate at which the server was able to send it messages.

    I then moved my server C++ app over to .NET and began using the Socket.Send API and found it to have similar performance to the C++ WinSock version.  In the hope of improving server send performance, I moved to using the Socket.SendAsync API and pooled SocketAsyncEventArgs.  Internally, the XXXAsync methods (and Begin/EndXXX for that matter) use the CLR I/O completion port, which the CLR I/O Thread Pool monitors and from which it calls your provided completion callbacks/events.  My hope was to improve send performance by calling SendAsync quickly enough that multiple send operations would be outstanding on the socket, so that it could send a batch of them at a time without transitioning out of kernel mode.  However, my performance took a huge hit: I was sending only about one third as many messages per second as I was with standard synchronous sends.  I saw even worse performance when using Begin/EndSend.
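
    For comparison, batching can also be done explicitly in user mode instead of relying on several outstanding sends being grouped.  Below is a minimal sketch, assuming fixed-size messages.  BatchBuffer and its members are hypothetical names, not part of either test app, and the actual send call is left to the caller:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical user-mode batcher: accumulates fixed-size messages so that
    // one send call can carry many of them in a single buffer.
    class BatchBuffer
    {
    public:
        BatchBuffer(std::size_t messageSize, std::size_t messagesPerBatch)
            : messageSize_(messageSize), capacity_(messageSize * messagesPerBatch)
        {
            assert(messageSize > 0 && messagesPerBatch > 0);
            buffer_.reserve(capacity_);
        }

        // Appends one message of messageSize bytes.
        // Returns true when the batch is full and should be sent.
        bool append(const char* message)
        {
            assert(buffer_.size() + messageSize_ <= capacity_);
            buffer_.insert(buffer_.end(), message, message + messageSize_);
            return buffer_.size() == capacity_;
        }

        const char* data() const { return buffer_.data(); }
        std::size_t size() const { return buffer_.size(); }

        // Call once the batch has been handed to the socket and the send
        // has completed, so the memory is no longer in use.
        void clear() { buffer_.clear(); }

    private:
        std::size_t messageSize_;
        std::size_t capacity_;
        std::vector<char> buffer_;
    };
    ```

    The caller would pass data() and size() to a single send call whenever append returns true.  Whether this helps in a given test is something to measure; it trades a memory copy per message for fewer send calls.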

    These findings prompted me to investigate further into what was going on with this poor IOCP performance for send operations, and I went back to developing my server using C++ and straight WinSock.  This time I used overlapped sockets, which I bound to the Win32 Thread Pool's IOCP so that it would call my provided callback when each send operation was received by the client.  I saw similarly poor performance to what I saw when using the .NET Socket.SendAsync API.  Oddly, when running the client receiver side-by-side with the server/sender, the client would have received all the messages at the exact time the server issued its last overlapped/async WSASend call.  I would've expected the server to make its last async WSASend call (not necessarily completing it on the IOCP) before the client had received all of the messages.  The observed behavior was what would be expected of synchronous WSASend calls, not async.

    Next I blocked my receiving client long enough so that the server would have time to send all of its messages to it before the client received the first one.  In this case, the server would issue all of its async WSASend calls extremely quickly.  When the client "woke up" it would process all the messages in record time - around 225k messages per second.  This is what I would've expected for async WSASends and IOCPs.

    Having a further look, I commented out the code in the server that bound my socket (and callback) to the Win32 Thread Pool's IOCP, also removed my sleep/blocking in the client receiver, and ran my test again.  I saw similar performance using async WSASend calls as I did using sync WSASend calls.

    So it would appear as if there is some "chatter" that occurs when using IOCP on Win32 (and in .NET).  Is this due to some kernel/user mode transitions to call the callback from an IOCP thread?  Is there any way around this?  I'm really looking for an answer that includes what's going on internally within Windows/WinSock.  I appreciate the help on this one.

     

    Below is the C++ WinSock code for the server:

     

    // SocketTestServer.cpp : Defines the entry point for the console application.
    //
    
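    #include "stdafx.h"
    // Listed explicitly in case stdafx.h does not already pull these in.
    // winsock2.h must come before windows.h.
    #include <winsock2.h>
    #include <ws2tcpip.h>
    #include <windows.h>
    #include <cstdlib>
    #include <cstring>
    #include <iostream>
    
    #pragma comment(lib, "Ws2_32.lib")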
    
    using namespace std;
    
    int numOfMessagesToSend = 0;
    volatile LONG numOfMessagesSent = 0;
    
    VOID CALLBACK IocpCompletionCallback (DWORD dwErrorCode, DWORD dwNumberOfBytesTransferred, LPOVERLAPPED lpOverlapped)
    {
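    	// A non-zero error code means the overlapped send failed.
    	if (dwErrorCode != 0)
    	{
    		cout << "Error occurred during IOCP completion: " << dwErrorCode << endl;
    		return;
    	}
    	// DWORD is unsigned, so the only "no data" case is exactly zero bytes.
    	if (dwNumberOfBytesTransferred == 0)
    	{
    		cout << "Server initiated disconnect.  bytes transferred == 0." << endl;
    		return;
    	}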
    
    	//delete lpOverlapped;
    
    	//InterlockedIncrement (&numOfMessagesSent);
    }
    
    
    int main(int argc, char* argv[])
    {
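    	// Expected arguments: <port> <numOfMessagesToSend> <messageSizeInBytes>
    	if (argc < 4)
    	{
    		cout << "Usage: SocketTestServer <port> <numOfMessages> <messageSize>" << endl;
    		return 1;
    	}
    
    	WSADATA wsaData;
    	int iStartupResult = WSAStartup (MAKEWORD(2,2), &wsaData);
    	if (iStartupResult != 0)
    	{
    		cout << "Server WSAStartup failed: " << iStartupResult << endl;
    		return 1;
    	}
    
    	numOfMessagesToSend = atoi (argv[2]);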
    
    	addrinfo *pResultAddrInfo = NULL, hintAddrInfo;
    
    	ZeroMemory (&hintAddrInfo, sizeof(hintAddrInfo));
    	hintAddrInfo.ai_family = AF_INET;
    	hintAddrInfo.ai_socktype = SOCK_STREAM;
    	hintAddrInfo.ai_protocol = IPPROTO_TCP;
    	hintAddrInfo.ai_flags = AI_PASSIVE;
    
    	int iResult = getaddrinfo ( NULL, argv[1], &hintAddrInfo, &pResultAddrInfo);
    	if (iResult != 0)
    	{
    		cout << "Server getaddrinfo failed: " << iResult << endl;
    		return 1;
    	}
    
    	SOCKET listenSocket = INVALID_SOCKET;
    	listenSocket = WSASocket (
    		pResultAddrInfo->ai_family,
    		pResultAddrInfo->ai_socktype,
    		pResultAddrInfo->ai_protocol,
    		NULL,
    		0,
    		WSA_FLAG_OVERLAPPED);
    
    	if (listenSocket == INVALID_SOCKET)
    	{
    		cout << "Server WSASocket failed: " << WSAGetLastError () << endl;
    		freeaddrinfo (pResultAddrInfo);
    		return 1;
    	}
    
    	iResult = bind (listenSocket, pResultAddrInfo->ai_addr, (int)pResultAddrInfo->ai_addrlen);
    	if (iResult == SOCKET_ERROR)
    	{
    		cout << "Server bind failed: " << WSAGetLastError () << endl;
    		closesocket (listenSocket);
    		listenSocket = INVALID_SOCKET;
    		freeaddrinfo(pResultAddrInfo);
    		return 1;
    	}
    
    	if (listen (listenSocket, SOMAXCONN) == SOCKET_ERROR)
    	{
    		cout << "Server listen failed: " << WSAGetLastError () << endl;
    		closesocket (listenSocket);
    		listenSocket = INVALID_SOCKET;
    		freeaddrinfo(pResultAddrInfo);
    		return 1;
    	}
    
    	freeaddrinfo(pResultAddrInfo);
    
    	cout << "Server listening for connections on port: " << argv[1] << endl;
    
    	SOCKET clientSocket = INVALID_SOCKET;
    	clientSocket = WSAAccept (
    		listenSocket,
    		NULL,
    		NULL,
    		NULL,
    		0);
    	
    	if (clientSocket == INVALID_SOCKET)
    	{
    		cout << "Server WSAAccept failed: " << WSAGetLastError () << endl;
    		// pResultAddrInfo was already freed after listen(); do not free it again.
    		closesocket (listenSocket);
    		return 1;
    	}
    
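    	WSABUF sendBuffer;
    	int iDataSize = atoi(argv[3]);
    	// The first 4 bytes hold the message size as decimal text plus a NUL terminator,
    	// so the size must fit in 3 digits and the buffer must be at least 4 bytes.
    	if (iDataSize < 4 || iDataSize > 999)
    	{
    		cout << "Message size must be between 4 and 999 bytes." << endl;
    		closesocket (clientSocket);
    		closesocket (listenSocket);
    		WSACleanup ();
    		return 1;
    	}
    	char* pData = new char[iDataSize];
    	memset(pData, 'D', iDataSize);
    	memset(pData, 0, 4);
    	_itoa_s(iDataSize, pData, 4, 10);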
    	sendBuffer.buf = pData;
    	sendBuffer.len = iDataSize;
    	DWORD dwNumOfBytesReceived = 0;
    	DWORD dwNumOfBytesSent = 0;
    	DWORD dwFlags = 0;
    	int messagesSent = 0;
    
    	cout << "Client connection accepted.  Enter 's' to send " << argv[2] << " messages: ";
    	char cInput;
    	cin >> cInput;
    
    	BOOL bSuccess = BindIoCompletionCallback (
    	(HANDLE)clientSocket,
    	IocpCompletionCallback,
    	0);
    
    	if (bSuccess == FALSE)
    	{
    		cout << "BindIoCompletionCallback (associate socket) failed: " << GetLastError () << endl;
    		closesocket (listenSocket);
    		// pResultAddrInfo was already freed after listen(); do not free it again.
    		WSACleanup ();
    		return 1;
    	}
    
    	LPWSAOVERLAPPED pOverlappedArray = new WSAOVERLAPPED[numOfMessagesToSend];
    	ZeroMemory (pOverlappedArray, sizeof(WSAOVERLAPPED) * numOfMessagesToSend);
    
    	if (cInput == 's')
    	{
    		cout << "Sending messages..." << endl;
    
    		do
    		{
    			messagesSent = 0;
    			while (messagesSent < numOfMessagesToSend)
    			{
    				LPWSAOVERLAPPED pOverlapped = &pOverlappedArray[messagesSent];
    
    				iResult = WSASend (
    					clientSocket, 
    					&sendBuffer, 
    					1,
    					&dwNumOfBytesSent,
    					0,
    					pOverlapped,
    					NULL);
    				if (iResult == SOCKET_ERROR)
    				{
    					int err = WSAGetLastError ();
    					if (err != WSA_IO_PENDING)
    					{
    						cout << "Server WSASend failed: " << err << endl;
    						break;
    					}
    				}
    				messagesSent++;
    			}
    			cout << "Messages posted to IOCP" << endl;
    			cout << "Enter 'q' to disconnect and shutdown or 's' to send messages again: ";
    			cin >> cInput;
    		} while (cInput == 's');
    	}
    
    	cout << "Shutting down server..." << endl;
    
    	// pData was allocated with new[], so it must be released with delete[].
    	delete[] pData;
    
    	iResult = shutdown (clientSocket, SD_BOTH);
    	if (iResult == SOCKET_ERROR)
    	{
    		cout << "Server shutdown failed: " << WSAGetLastError () << endl;
    	}
    
    	closesocket(clientSocket);
    
    	WSACleanup ();
    
    	return 0;
    }
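
    The server above posts every WSASend up front, so the number of sends in flight is unbounded, and the counter in the completion callback is commented out.  A portable sketch of capping and tracking in-flight sends follows.  SendWindow is a hypothetical helper, not a WinSock API, and it is not wired into the code above:

    ```cpp
    #include <atomic>
    #include <cassert>

    // Hypothetical helper: caps the number of overlapped sends in flight.
    class SendWindow
    {
    public:
        explicit SendWindow(long maxInFlight)
            : maxInFlight_(maxInFlight), inFlight_(0)
        {
            assert(maxInFlight > 0);
        }

        // Called by the sender before posting a send.  Returns false when the
        // window is full, in which case the caller should wait or back off.
        bool tryAcquire()
        {
            long current = inFlight_.load(std::memory_order_relaxed);
            while (current < maxInFlight_)
            {
                // On failure, current is refreshed and the bound is re-checked.
                if (inFlight_.compare_exchange_weak(current, current + 1,
                        std::memory_order_acquire, std::memory_order_relaxed))
                {
                    return true;
                }
            }
            return false;
        }

        // Called from the completion callback once a send has finished.
        void release()
        {
            long previous = inFlight_.fetch_sub(1, std::memory_order_release);
            assert(previous > 0);
            (void)previous;
        }

        long inFlight() const { return inFlight_.load(std::memory_order_relaxed); }

    private:
        const long maxInFlight_;
        std::atomic<long> inFlight_;
    };
    ```

    In the server, tryAcquire would be called before each WSASend and release from IocpCompletionCallback.  Whether bounding the window changes the observed throughput is an open question that would have to be measured.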
    


    FYI - I've also cross-posted this to the .NET System.Net and Parallel forums.

    Thanks,
    Brandon

    Monday, June 15, 2009 10:26 PM