Asked by:
Calling a method from the hub to a Group of Clients Stops Working after a while.

Question
-
User1152062383 posted
We developed a signalR Web Application that is composed of 3 pieces.
Browsers – javascript
NLB Web Servers – Hub, with the SQL Backplane
WCF Service – C# Clients on the same server as the web servers.
Everything works fine in pre-prod environment
It even works fine in production for a while.
But for some reason after a while (a few hours) The connection from the Web Server to the WCF service silently stops.
It’s hard to explain, we can see the method that should execute on the WCF client being called from the web server via the logs, but it just never happens.
We have enabled tracing and are logging each event and there are no errors or even state changes, that can help us figure out why this all of a sudden stops.
Here is how we are handling the events on the WCF client
_connection = new HubConnection(Properties.Settings.Default.Url); _connection.Credentials = CredentialCache.DefaultNetworkCredentials; ... _connection.TraceLevel = TraceLevels.All; _connection.TraceWriter = writer; _myHubProxy = _connection.CreateHubProxy("MyHub"); _myHubProxy.On<int, string>("Subscribe", (destinationID, connectionID) => { Subscribe(destinationID, connectionID); } ); _myHubProxy.On<int, string>("UnSubscribe", (destinationID, connectionID) => { UnSubscribe(destinationID, connectionID); } ); _connection.Closed += SingalRClosedEvent; _connection.Reconnected += SignalRReconnectedEvent; _connection.Reconnecting += SignalRReconnectingEvent; _connection.ConnectionSlow += SignalRConnectionSlowEvent; _connection.StateChanged += SignalRStateChangedEvent; _connection.Error += SignalRErrorEvent; _connection.Received += SignalRRecivedEvent; ... private void SingalRClosedEvent() { string METHOD_NAME = MethodBase.GetCurrentMethod().Name; using (var logger = LoggerFactory.GetMethodLogger(_CLASS_ + "." + METHOD_NAME)) { _connection = null; _myHubProxy = null; Thread.Sleep(5000); Task.Run(() => { EstablishConnection(); }).ContinueWith(TaskException, TaskContinuationOptions.OnlyOnFaulted); } }
Calls from the WCF to the Web Server continue to work. The Brower component works perfectly.
We can see the keep alive messages in the trace.
Usually recycling both the website and WCF app pools will fix the issue.
We are handling the Closed Event and reconnecting, although this never gets called.
If we get in this weird state and the WCF Client calls the Web Hub to rejoin the group it is part of, like this
await Groups.Add(Context.ConnectionId, SUBSCRIPTION_LISTENER_GROUP);
Then we get a task canceled exception, it will timeout and it wont add the connection to the group. If its not in this weird state, it works fine.
I have been looking at this problem for a week and cant think of what else to try.
Does anyone have any suggestions?
Monday, July 10, 2017 11:34 PM
All replies
-
User1771544211 posted
Hi jakeb16,
It even works fine in production for a while.Could you please check the exact time of how long it works? If the Idle timeout is configured on the server, the worker process will be shut down after inactive for some time(default is 20min). For more about the idle timeout : https://technet.microsoft.com/en-us/library/cc771956%28v=ws.10%29.aspx?f=255&MSPPError=-2147217396
It’s hard to explain, we can see the method that should execute on the WCF client being called from the web server via the logs, but it just never happens.Please add log for the server method that will call the WCF client to check if the server calls the WCF client. And call the WCF client use other approaches to check if the WCF client is working.
Best Regards,
Jean
Tuesday, July 11, 2017 5:45 AM -
User1152062383 posted
Thanks for the quick response I have been working on gathering the logs to demonstrate the problem
The Idle Timeout isn’t the problem, we have ide lTime-out Minutes set to 0
It seem to be a problem with loosing group membership (if I had to guess)
Lets start with what the logs look like when everything is working and I call SubscribeUser on the clients
Getting the message from the backplane
SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=14877453, new payload ID=14877454 DateTime=2017-07-11T17:27:58.4110548Z SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Updated receive reader initial payload ID parameter=14877454 DateTime=2017-07-11T17:27:58.4110548Z SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Payload 14877454 containing 1 message(s) received DateTime=2017-07-11T17:27:58.4110548Z SignalR.ScaleoutMessageBus Information: 0 : OnReceived(0, 14877454, 1) DateTime=2017-07-11T17:27:58.4110548Z SignalR.ScaleoutMessageBus Verbose: 0 : Received message 14877454: '{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}' over ScaleoutMessageBus DateTime=2017-07-11T17:27:58.4110548Z SignalR.ScaleoutMessageBus Verbose: 0 : Message id: 14877454, stream : 0, eventKey: 'hg-RealTimeInboxHub.SubscriptionListenerGroup' saved with local id: 3 DateTime=2017-07-11T17:27:58.4110548Z SignalR.ScaleoutMessageBus Verbose: 0 : Scheduling eventkeys: hg-RealTimeInboxHub.SubscriptionListenerGroup DateTime=2017-07-11T17:27:58.4110548Z
Sending to 3 Clients in the SubscriptionListenerGroup group
SignalR.Transports.ServerSentEventsTransport Verbose: 0 : Sending outgoing message. Connection id: c47b7071-854f-4994-bbf9-376017012cff, transport: ServerSentEventsTransport, message: data: {"C":"s-0,E3030E","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}]} DateTime=2017-07-11T17:27:58.4110548Z SignalR.Transports.ServerSentEventsTransport Verbose: 0 : Sending outgoing message. Connection id: c80c4624-b762-49cb-bd06-135ebbe4442f, transport: ServerSentEventsTransport, message: data: {"C":"s-0,E3030E","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}]} DateTime=2017-07-11T17:27:58.4110548Z SignalR.Transports.ServerSentEventsTransport Verbose: 0 : Sending outgoing message. Connection id: 61a7c1d1-1d42-4111-9242-122879cb2d5b, transport: ServerSentEventsTransport, message: data: {"C":"s-0,E3030E","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}]} DateTime=2017-07-11T17:27:58.4110548Z
Tester App - equivalent of WCF, I have a windows form that is doing the same thing as the WCF clients. I just have tracing and logging turned on so I can see what is happening.
17:27:58.4110548 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {"C":"s-0,E3030E","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}]})
We can see it join the group?
SignalR.ScaleoutMessageBus Verbose: 0 : Received message 14877451: '{"WaitForAck":true,"Id":"c25ecd3b-7bdb-43d9-8a50-43b2c9e2c3e6","CommandType":1,"Value":"hg-RealTimeInboxHub.SubscriptionListenerGroup"}' over ScaleoutMessageBus DateTime=2017-07-11T17:27:58.2394548Z SignalR.ScaleoutMessageBus Verbose: 0 : Message id: 14877451, stream : 0, eventKey: 'c-c47b7071-854f-4994-bbf9-376017012cff' saved with local id: 1
Now We Wait 4 Hours and I call SubscribeUser on the clients
Getting the message from the backplane
SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=15157854, new payload ID=15157855 DateTime=2017-07-11T21:08:18.8178518Z SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Updated receive reader initial payload ID parameter=15157855 DateTime=2017-07-11T21:08:18.8178518Z SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Payload 15157855 containing 1 message(s) received DateTime=2017-07-11T21:08:18.8178518Z SignalR.ScaleoutMessageBus Information: 0 : OnReceived(0, 15157855, 1) DateTime=2017-07-11T21:08:18.8178518Z SignalR.ScaleoutMessageBus Verbose: 0 : Received message 15157855: '{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[43559,"b1057d8a-a49d-41a2-b6d5-be8e12c3b85e"]}' over ScaleoutMessageBus DateTime=2017-07-11T21:08:18.8178518Z SignalR.ScaleoutMessageBus Verbose: 0 : Message id: 15157855, stream : 0, eventKey: 'hg-RealTimeInboxHub.SubscriptionListenerGroup' saved with local id: 22 DateTime=2017-07-11T21:08:18.8178518Z SignalR.ScaleoutMessageBus Verbose: 0 : Scheduling eventkeys: hg-RealTimeInboxHub.SubscriptionListenerGroup DateTime=2017-07-11T21:08:18.8178518Z SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=15157855, new payload ID=15157856 DateTime=2017-07-11T21:08:18.8178518Z
There is nothing in the transport logs (of interest) except for keep alive messages from the same client that should have been called
SignalR.Transports.TransportHeartBeat Verbose: 0 : KeepAlive(c47b7071-854f-4994-bbf9-376017012cff) DateTime=2017-07-11T21:08:08.9118518Z
Tester App - equivalent of WCF
All we see is keep alive messages in the client trace
21:07:08.8206518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:07:18.8358518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:07:28.8510518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:07:38.8662518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:07:48.8814518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:07:58.8966518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:08:08.9118518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:08:18.9270518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:08:28.9422518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:08:38.9574518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:08:48.9726518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:08:58.9878518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:09:09.0030518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {}) 21:09:19.0182518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
Finally I left everything in a bad state and opened another instance of the tester app. Only the new instance of the tester app got the subscribe event.
This time when I started the connection I got a exception joining the group, even though it seem to work ok as it got the subscribe callawait Groups.Add(Context.ConnectionId, SUBSCRIPTION_LISTENER_GROUP);
Fails with
Exception Joining SubscriptionListenerGroup group. [Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.
Getting the message from the backplane
SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=15198582, new payload ID=15198583 DateTime=2017-07-11T21:37:44.1742518Z SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Updated receive reader initial payload ID parameter=15198583 DateTime=2017-07-11T21:37:44.1742518Z SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Payload 15198583 containing 1 message(s) received DateTime=2017-07-11T21:37:44.1742518Z SignalR.ScaleoutMessageBus Information: 0 : OnReceived(0, 15198583, 1) DateTime=2017-07-11T21:37:44.1742518Z SignalR.ScaleoutMessageBus Verbose: 0 : Received message 15198583: '{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[43559,"cb15e712-8302-4523-93a4-4eeb27602354"]}' over ScaleoutMessageBus DateTime=2017-07-11T21:37:44.1742518Z SignalR.ScaleoutMessageBus Verbose: 0 : Message id: 15198583, stream : 0, eventKey: 'hg-RealTimeInboxHub.SubscriptionListenerGroup' saved with local id: 25 DateTime=2017-07-11T21:37:44.1742518Z SignalR.ScaleoutMessageBus Verbose: 0 : Scheduling eventkeys: hg-RealTimeInboxHub.SubscriptionListenerGroup DateTime=2017-07-11T21:37:44.1742518Z SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=15198583, new payload ID=15198584 DateTime=2017-07-11T21:37:44.1742518Z
Only Sending to the new instance of the tester app in the SubscriptionListenerGroup group (I don’t know why the other 3 didn’t get this message)
SignalR.Transports.ServerSentEventsTransport Verbose: 0 : Sending outgoing message. Connection id: b1565993-1dba-42d4-87b1-9b95e56a88f8, transport: ServerSentEventsTransport, message: data: {"C":"s-0,E7E977","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[43559,"cb15e712-8302-4523-93a4-4eeb27602354"]}]} DateTime=2017-07-11T21:37:44.1898518Z
New instance of Tester App - equivalent of WCF Get the message, the other one doesn't.
21:37:44.1898518 - b1565993-1dba-42d4-87b1-9b95e56a88f8 - SSE: OnMessage(Data: {"C":"s-0,E7E977","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[43559,"cb15e712-8302-4523-93a4-4eeb27602354"]}]})
Also note: If I the original Tester app tries to rejoin the group it also gets the same exception as above, but this doesnt seem to help at all. I dont see the same kind of event that suggest it actually tried to join the group in the logs.Also if I recycle the website, the clients then start receiving messages again.
Tuesday, July 11, 2017 10:20 PM -
User1771544211 posted
Hi jakeb16,
Based on your description, the SignalR can't send outgoing message to the old tester App after a period of time.
I guess there are some conditions can lead to this issue.
1. The old tester Apps are shut down or inactive after a period time.
2. The old tester Apps need to authorize the SignalR request. But after a period of time the authorized state is lost.
And you can try enable the SignalR tracing to view diagnostic information about events in your SignalR application.
Best Regards,
Jean
Thursday, July 13, 2017 7:28 AM -
User1152062383 posted
Hi Jean, just got back from vacation (sorry for the delay).
Thanks for your help.
As you can see above I did turn on tracing and copied the logs above.
What do you mean by inactive after a period of time (will this matter?) I can see the keep alive messages but there may be periods where neither the server or client is invoking methods on each other .
Its definitely not shutdown as its a winform I can see.
As for authorization I would have expected to see errors in the logs on the server side that it cant connect to the client (old tester app)
Additionally if it was a authorization error wouldn't a close type event occur, since it looses communication? (which is not going on)
-Jacob
Monday, July 24, 2017 4:17 PM -
User1771544211 posted
Hi jakeb16,
What do you mean by inactive after a period of time (will this matter?) I can see the keep alive messages but there may be periods where neither the server or client is invoking methods on each other .In your previous post "Only Sending to the new instance of the tester app in the SubscriptionListenerGroup group (I don’t know why the other 3 didn’t get this message)", you mention that only the new instance of tester app works, the old apps not work. I'm not sure what's happening here, but the root cause may occur during this stage.
As for authorization I would have expected to see errors in the logs on the server side that it cant connect to the client (old tester app)
Additionally if it was a authorization error wouldn't a close type event occur, since it looses communication? (which is not going on)
Just some thoughts, I'm not sure about the root cause of your issue.
Your issue is strange and to condition is complex, it's hard to find the reason and solution. So we should image of any possible causes and then verify these possible causes.
Best Regards,
Jean
Tuesday, July 25, 2017 5:16 AM -
User1152062383 posted
Thanks for clarifying and you are trying! However I really am no closer to figuring this out. :( At this point what I did was stopped using signalR for communication between the Web site and WCF services. And just used WCF Calls. The downside is now the website has to keep track of each node in the cluster so it can call a WCF method on each node, instead of relying on the backplane.
If anyone knows of a way that group membership could be lost after a period of time that might help me out.
Tuesday, July 25, 2017 4:51 PM -
User1152062383 posted
So we just inserted this change, and now we are seeing something very similar from the web server to the browsers. The browsers can send messages and they are received by the website, but the web site is not sending any messages back to the browser (after a while, maybe a hour)
Tuesday, July 25, 2017 9:39 PM -
User991499041 posted
Hi jakeb16,
When you add a user to a group using the Groups.Add method, the user receives messages directed to that group for the duration of the current connection, but the user's membership in that group is not persisted beyond the current connection. If you want to permanently retain information about groups and group membership, you must store that data in a repository such as a database or Azure table storage. Then, each time a user connects to your application, you retrieve from the repository which groups the user belongs to, and manually add that user to those groups.When reconnecting after a temporary disruption, the user automatically re-joins the previously-assigned groups. Automatically rejoining a group only applies when reconnecting, not when establishing a new connection.
Regards,
zxj
Thursday, August 10, 2017 7:56 AM