A NetTcpBinding WCF service in a Windows Azure worker role refuses client connections at some stage
-
15 August 2012 0:27
We have been stuck with this problem for a couple of weeks. Hopefully someone here can give us a clue:
We have a NetTcpBinding WCF service (duplex channel with callbacks, InstanceContextMode = InstanceContextMode.Single, ConcurrencyMode = ConcurrencyMode.Multiple, ReliableSession enabled) deployed as a Windows Azure worker role. It works fine right after it starts: clients connect to it without any issue. But after it has been running for a couple of days, new clients fail to connect with the following exception, while the existing established connections keep working (they can still invoke service methods and receive events with no problem).
System.ServiceModel.EndpointNotFoundException: Could not connect to net.tcp://myserive.mycompany.com:443/MyServer. The connection attempt lasted for a time span of 00:00:02.6718750. TCP error code 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 157.55.143.80:443. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 157.55.143.80:443
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.ServiceModel.Channels.SocketConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
--- End of inner exception stack trace ---
…
…
We took a memory dump of the WCF service and analysed it with WinDbg. There seems to be no issue with memory, CPU, threads, deadlocks, etc. We also confirmed that the current number of connections has not reached the serviceThrottling “maxConcurrentSessions” value (there are only tens of active connections and we set maxConcurrentSessions="1000").
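For readers following along, a session throttle like the one mentioned above is normally declared in a service behavior. The fragment below is only a sketch and not the poster's actual configuration: the behavior name `throttledBehavior` is hypothetical, and only the maxConcurrentSessions value comes from this post.

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <!-- "throttledBehavior" is a placeholder name -->
      <behavior name="throttledBehavior">
        <!-- Upper bound on concurrent sessions, as quoted in the post -->
        <serviceThrottling maxConcurrentSessions="1000" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```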
We have enabled WCF tracing on a test client. The error message indicates that it fails to open a socket connection to the server:
<Exception><ExceptionType>System.ServiceModel.EndpointNotFoundException, System.ServiceModel, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</ExceptionType><Message>Could not connect to net.tcp://myserive.mycompany.com:443/MyServer. The connection attempt lasted for a time span of 00:00:20.9990592. TCP error code 10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 157.55.143.80:443. </Message><StackTrace> at System.ServiceModel.Channels.SocketConnectionInitiator.TraceConnectFailure(Socket socket, SocketException socketException, Uri remoteUri, TimeSpan timeSpentInConnect)
at System.ServiceModel.Channels.SocketConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
at System.ServiceModel.Channels.BufferedConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
at System.ServiceModel.Channels.TracingConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
at System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnection(TimeSpan timeout)
at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout)
at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
…
…
We also ran "netstat" on the server. It shows that the server is listening on the NetTcpBinding port (443 in our scenario) and that there are some established TCP connections between the clients and the server.
We could not figure out why the WCF service refuses socket connections after it has been running for some time.
Any help appreciated.
All replies
-
15 August 2012 5:31 Moderator
Hi,
>there are only tens of active connections
Could you post the exact number of the active connections? This is a very important clue.
Please also enable server side WCF tracing to see whether you can get some useful information from the trace log when the problem occurs.
Please also post your serviceThrottling and binding configurations.
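Server-side tracing of the kind requested here is typically switched on in the service's configuration file. The following is a minimal sketch under assumptions: the listener name and the log path are placeholders, and on a worker role the path would more likely point at a local storage resource.

```xml
<system.diagnostics>
  <sources>
    <!-- Warning level is enough to capture WCF warnings and errors -->
    <source name="System.ServiceModel" switchValue="Warning" propagateActivity="true">
      <listeners>
        <!-- Placeholder listener name and file path -->
        <add name="xmlTrace"
             type="System.Diagnostics.XmlWriterTraceListener"
             initializeData="C:\logs\server.svclog" />
      </listeners>
    </source>
  </sources>
  <trace autoflush="true" />
</system.diagnostics>
```

The resulting .svclog file can be opened with the Service Trace Viewer (SvcTraceViewer.exe).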
Allen Chen [MSFT]
MSDN Community Support | Feedback to us
Get or Request Code Sample from Microsoft
Please remember to mark the replies as answers if they help and unmark them if they provide no help.
- Edited by Allen Chen - MSFT (Moderator) 15 August 2012 5:31
- Marked as answer by Allen Chen - MSFT (Moderator) 5 September 2012 1:59
- Unmarked as answer by Allen Chen - MSFT (Moderator) 6 September 2012 1:32
-
16 August 2012 2:26
>Could you post the exact number of the active connections? This is a very important clue.
Below is the memory dump of the FlowThrottle for sessions. It shows that maxConcurrentSessions is 1000 and that there are currently 12 active sessions.
0:000> !do 0000000002275b08
Name: System.ServiceModel.Dispatcher.FlowThrottle
MethodTable: 000007ff0107d298
EEClass: 000007ff01092728
Size: 88(0x58) bytes
File: D:\windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel\v4.0_4.0.0.0__b77a5c561934e089\System.ServiceModel.dll
Fields:
MT Field Offset Type VT Attr Value Name
000007feeebdc758 4002ec7 40 System.Int32 1 instance 1000 capacity
000007feeebdc758 4002ec8 44 System.Int32 1 instance 12 count
000007feeebdd588 4002ec9 4c System.Boolean 1 instance 0 warningIssued
000007feeebdc758 4002eca 48 System.Int32 1 instance 70 warningRestoreLimit
000007feeebd59c8 4002ecb 8 System.Object 0 instance 0000000002275b60 mutex
000007feeebecbb0 4002ecc 10 ...ding.WaitCallback 0 instance 0000000002275ac8 release
000007ff010e0b90 4002ecd 18 ...bject, mscorlib]] 0 instance 0000000002275b78 waiters
000007feeebd6870 4002ece 20 System.String 0 instance 0000000002275a38 propertyName
000007feeebd6870 4002ecf 28 System.String 0 instance 0000000002275a80 configName
000007feeebe81c8 4002ed0 30 System.Action 0 instance 00000000022e02d8 acquired
000007feeebe81c8 4002ed1 38 System.Action 0 instance 00000000022e0318 released
Further analysis of the ConnectionAcceptor behind the NetTcp channel's socket connection pool shows that the problem is that the TCP socket connection pool is full. The connections counter has reached the value of maxPendingConnections, and the ConnectionAcceptor does not accept new socket connections in this situation.
But we do not know why the number of connections in the socket connection pool reached maxPendingConnections (10). The temporary solution is to increase maxPendingConnections to 50 (maxConnections="50" in the binding configuration tag).
0:000> !do 00000000023d5d00
Name: System.ServiceModel.Channels.ConnectionAcceptor
MethodTable: 000007ff013c1be0
EEClass: 000007ff013b11a8
Size: 88(0x58) bytes
File: D:\windows\Microsoft.Net\assembly\GAC_MSIL\System.ServiceModel\v4.0_4.0.0.0__b77a5c561934e089\System.ServiceModel.dll
Fields:
MT Field Offset Type VT Attr Value Name
000007feeebdc758 400035e 38 System.Int32 1 instance 1 maxAccepts
000007feeebdc758 400035f 3c System.Int32 1 instance 10 maxPendingConnections
000007feeebdc758 4000360 40 System.Int32 1 instance 10 connections
000007feeebdc758 4000361 44 System.Int32 1 instance 0 pendingAccepts
000007ff013c03e0 4000362 8 ...onnectionListener 0 instance 00000000023d5ab8 listener
000007feeebf8b50 4000363 10 System.AsyncCallback 0 instance 00000000023d5df0 acceptCompletedCallback
000007feeec042f0 4000364 18 ...bject, mscorlib]] 0 instance 00000000023d5e30 scheduleAcceptCallback
000007feeebe81c8 4000365 20 System.Action 0 instance 00000000023d5d58 onConnectionDequeued
000007feeebdd588 4000366 48 System.Boolean 1 instance 0 isDisposed
000007ff013c21a8 4000367 28 ...AvailableCallback 0 instance 00000000023d5cc0 callback
000007ff013c0e98 4000368 30 ...els.ErrorCallback 0 instance 00000000023d5ba0 errorCallback
- Proposed as answer by Veerendra Kumar 16 August 2012 7:25
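The temporary workaround described in this post (raising maxConnections on the binding) would look roughly like the fragment below. It is a sketch, not the poster's real configuration: the binding name `duplexTcp` is hypothetical, and the listenBacklog attribute is shown only as an assumption, since it is a related netTcpBinding setting that the post does not mention.

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- "duplexTcp" is a placeholder name. On the service side,
           maxConnections bounds the connections pending dispatch,
           which the dump shows as maxPendingConnections (default 10). -->
      <binding name="duplexTcp" maxConnections="50" listenBacklog="50">
        <reliableSession enabled="true" />
      </binding>
    </netTcpBinding>
  </bindings>
</system.serviceModel>
```

As the later posts in this thread show, a larger limit only delays the failure if the connections counter is never decremented.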
-
16 August 2012 7:20 Moderator
Hi,
Thanks for the update. Do you mean you've resolved this issue after setting maxConnections to a larger value?
Allen Chen [MSFT]
-
16 August 2012 9:43
Hi Allen,
We will need a couple of days to check the result of setting maxConnections to a larger value.
To be honest, we are still looking for a more elegant solution. We have noticed that if WCF tracing is enabled on the server, the WCF warning "Maximum number of pending connections has been reached." is logged when the connections counter reaches the value of maxConnections. Maybe we could restart the ServiceHost when this warning appears.
- Marked as answer by Allen Chen - MSFT (Moderator) 5 September 2012 1:59
- Unmarked as answer by Allen Chen - MSFT (Moderator) 6 September 2012 1:33
-
22 August 2012 1:23 Moderator
Hello,
Is there any update on this issue? Did the change make it work?
Allen Chen [MSFT]
-
5 September 2012 22:38
Sorry for my late response. We have spent more than two weeks observing it.
It seems that changing maxConnections to 100 does not resolve the problem completely. The connections counter still reaches 100 after one week.
And as expected, we got the "Maximum number of pending connections has been reached." warning in the WCF trace files.
I have decompiled System.ServiceModel.dll to try to understand the logic behind the connections counter of the System.ServiceModel.Channels.ConnectionAcceptor class, but it is more complicated than I expected. I know when the connections counter increases (when a new socket connection is accepted), but I could not figure out when (that is, under what conditions) its value decreases.
Could anyone with a deep understanding of it explain what kind of situations could increase the connections counter without decreasing it?
Another thing we have noticed, from the output of a simple "netstat" command, is that the server only has about 25 TCP connections from clients. So how can the connections counter reach 100 in this situation?
- Edited by Roger NZ 6 September 2012 1:11
-
6 September 2012 3:31 Moderator
Hi,
>Just wondering anyone who has a deep understanding of it could explain what sort of potential situations could increase the connections counter value without decreasing it.
If you use Reflector to read the code (pasted at the end of this post), you can see that the counter is decreased by this.connections-- in OnConnectionDequeued. From your description, this.connections++ is executed but this.connections-- is not. Considering this, my thoughts:
Assume no exception is thrown in the other code after this.connections++. Then flag is true and connection is not null when execution reaches:
if (connection != null)
{
    this.callback(connection, this.onConnectionDequeued); // trigger OnConnectionDequeued
}
so this.callback(connection, this.onConnectionDequeued); will be executed.
Assuming this.callback(connection, this.onConnectionDequeued) triggers OnConnectionDequeued properly, the counter must be decreased unless the thread is waiting for ThisLock to be released.
Suggestions:
- It may be caused by exceptions thrown in this method (and caught by external code). Please check whether there are exceptions in the WCF trace or the event log. Focus on the exceptions that may be thrown in this method.
- It may be caused by ThisLock. Please analyze the dump to check the call stacks of all managed threads and see whether some threads are waiting for ThisLock to be released.
- If neither of the above is the root cause, the only remaining possibility seems to be that this.callback(connection, this.onConnectionDequeued) somehow does not trigger OnConnectionDequeued. In this case I think it requires further investigation of the dump file. I suggest you contact our support: http://www.windowsazure.com/en-us/support/contact/.
>Another thing we have noticed, from the result of a simple "Netstat" command, that the server only has about 25 tcp connections from clients - so why the connections counter can reach 100 in this situation?
The connections counter is an application-level counter. It is possible that this counter is not updated correctly because of behavior that the WCF developers did not anticipate.
private void HandleCompletedAccept(IAsyncResult result)
{
    IConnection connection = null;
    lock (this.ThisLock)
    {
        bool flag = false;
        Exception exception = null;
        try
        {
            if (!this.isDisposed)
            {
                connection = this.listener.EndAccept(result);
                if (connection != null)
                {
                    if (DiagnosticUtility.ShouldTraceWarning && ((this.connections + 1) >= this.maxPendingConnections))
                    {
                        TraceUtility.TraceEvent(TraceEventType.Warning, 0x40024, SR.GetString("TraceCodeMaxPendingConnectionsReached"), new StringTraceRecord("MaxPendingConnections", this.maxPendingConnections.ToString(CultureInfo.InvariantCulture)), this, null);
                    }
                    this.connections++;
                }
            }
            flag = true;
        }
        catch (CommunicationException exception2)
        {
            if (DiagnosticUtility.ShouldTraceInformation)
            {
                DiagnosticUtility.ExceptionUtility.TraceHandledException(exception2, TraceEventType.Information);
            }
        }
        catch (Exception exception3)
        {
            if (Fx.IsFatal(exception3))
            {
                throw;
            }
            if ((this.errorCallback == null) && !ExceptionHandler.HandleTransportExceptionHelper(exception3))
            {
                throw;
            }
            exception = exception3;
        }
        finally
        {
            if (!flag)
            {
                connection = null;
            }
            this.pendingAccepts--;
        }
        if ((exception != null) && (this.errorCallback != null))
        {
            this.errorCallback(exception);
        }
    }
    this.AcceptIfNecessary(false);
    if (connection != null)
    {
        this.callback(connection, this.onConnectionDequeued); // trigger OnConnectionDequeued
    }
}

private void OnConnectionDequeued()
{
    lock (this.ThisLock)
    {
        this.connections--;
    }
    this.AcceptIfNecessary(false);
}
Allen Chen [MSFT]
- Edited by Allen Chen - MSFT (Moderator) 6 September 2012 6:10
-
7 September 2012 2:03
Thanks, Allen, for your quick response.
>> Maybe caused by unhandled exceptions in this method (that is caught by external code). Please check out whether there're exceptions in WCF trace/Event log. Focus on the exceptions that may be thrown in this method.
In the WCF trace there is no exception thrown in the HandleCompletedAccept method.
>> Maybe caused by ThisLock. Please analyze dump to check callstack of all managed threads and see whether some threads are waiting for ThisLock release.
No threads are waiting on ThisLock.
>>If none of above is the root cause, the only possibility seems is this.callback(connection, this.onConnectionDequeued) somehow does not trigger OnConnectionDequeued.
I took a closer look at "this.callback(connection, this.onConnectionDequeued)". It does not trigger OnConnectionDequeued directly. Instead, it passes OnConnectionDequeued as an Action to the ConnectionDequeuedCallback property of the InitialServerConnectionReader class. At some later stage, I guess when the connection is dequeued from the underlying InputQueue<TChannel>, OnConnectionDequeued is triggered. What I could not figure out is what conditions must be satisfied for that dequeue to happen.
- Edited by Roger NZ 7 September 2012 3:41
-
7 September 2012 6:49 Moderator
Hi,
Yes, it is dequeued in InputQueue<TChannel>.Dequeue(TimeSpan). The condition you are looking for may vary.
You could first look at all InputQueue<TChannel> objects in the managed heap, dump their fields, and read the code in Reflector to simulate what the method does. You will probably be able to figure out why InputQueue<T>.InvokeDequeuedCallback is not called. If InputQueue<TChannel>.Dequeue(TimeSpan) seems fine, look at the objects that use InputQueue<TChannel>.Dequeue(TimeSpan), for example objects of this type (found via Reflector's Analyze function):
System.ServiceModel.Dispatcher.DuplexChannelBinder+AutoCloseDuplexSessionChannel
Keep doing the same thing and you will probably be able to find the root cause.
Allen Chen [MSFT]
-
9 September 2012 22:54
Hi - I am not sure whether this issue has been resolved yet. I had a similar experience. In my case the WCF service was not a duplex channel, but we were facing exactly the same issue as you are. It turned out that database connections were the issue. Some of the calls were long-running and held the TCP connection for a longer time, leaving no TCP connection for incoming calls. We increased the number of TCP connections to the database as well.
An important point here: the number of active connections you get from the WCF trace relates only to WCF active connections; it does not tell you about other active TCP connections on the server.
You might want to check this. Also (sorry if this offends you): if you do not have a WCF proxy pool on the client, make sure you close the proxies when you are done using them.
Let us know how this goes.
MkMahesh
-
19 December 2012 8:34
No idea why I was reading this (just browsing)... but you can also see this issue when you do not close your connections on the client after you finish calling, particularly if you are making frequent calls. Don't forget to close your connections after use ;)
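The advice in the last two replies (close client proxies when you are done with them) is commonly implemented with a close-or-abort helper. The following is a minimal sketch, not code from this thread: the class name `ProxyCleanup` and method name `CloseOrAbort` are made up for illustration, and it assumes the client object implements ICommunicationObject, as ClientBase<T> proxies and channels created by a ChannelFactory do.

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper; the names are illustrative only.
public static class ProxyCleanup
{
    // Closes a WCF client proxy or channel gracefully and falls back
    // to Abort() when a graceful Close() is not possible.
    public static void CloseOrAbort(ICommunicationObject proxy)
    {
        if (proxy == null)
        {
            return;
        }

        try
        {
            if (proxy.State == CommunicationState.Faulted)
            {
                // Close() throws on a faulted channel, so tear it down directly.
                proxy.Abort();
            }
            else
            {
                // Graceful close ends the session with the service.
                proxy.Close();
            }
        }
        catch (CommunicationException)
        {
            proxy.Abort();
        }
        catch (TimeoutException)
        {
            proxy.Abort();
        }
    }
}
```

A caller would typically invoke it from a finally block, for example `ProxyCleanup.CloseOrAbort((ICommunicationObject)client);`, so that the channel is released even when a service call throws.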