locked
Calling a method from the hub to a Group of Clients Stops Working after a while. RRS feed

  • Question

  • User1152062383 posted

    We developed a signalR Web Application that is composed of 3 pieces.

    Browsers – javascript

    NLB Web Servers – Hub, with the SQL Backplane

    WCF Service – C# Clients on the same server as the web servers.

    Everything works fine in pre-prod environment

    It even works fine in production for a while.

    But for some reason after a while (a few hours) The connection from the Web Server to the WCF service silently stops.

    It’s hard to explain, we can see the method that should execute on the WCF client being called from the web server via the logs, but it just never happens.

    We have enabled tracing and are logging each event and there are no errors or even state changes, that can help us figure out why this all of a sudden stops.

    Here is how we are handling the events on the WCF client

    _connection = new HubConnection(Properties.Settings.Default.Url);
    
    _connection.Credentials = CredentialCache.DefaultNetworkCredentials;
    
    ...
    
    _connection.TraceLevel = TraceLevels.All;
    
    _connection.TraceWriter = writer;
    
    _myHubProxy = _connection.CreateHubProxy("MyHub");
    
    _myHubProxy.On<int, string>("Subscribe",
    
      (destinationID, connectionID) => { Subscribe(destinationID, connectionID); }
    
      );
    
     
    
    _myHubProxy.On<int, string>("UnSubscribe",
    
      (destinationID, connectionID) => { UnSubscribe(destinationID, connectionID); }
    
      );
    
     
    
    _connection.Closed += SingalRClosedEvent;
    
    _connection.Reconnected += SignalRReconnectedEvent;
    
    _connection.Reconnecting += SignalRReconnectingEvent;
    
    _connection.ConnectionSlow += SignalRConnectionSlowEvent;
    
    _connection.StateChanged += SignalRStateChangedEvent;
    
    _connection.Error += SignalRErrorEvent;
    
    _connection.Received += SignalRRecivedEvent;
    
    ...
    
    private void SingalRClosedEvent() {
    
      string METHOD_NAME = MethodBase.GetCurrentMethod().Name;
    
      using (var logger = LoggerFactory.GetMethodLogger(_CLASS_ + "." + METHOD_NAME)) {
    
        _connection = null;
    
        _myHubProxy = null;
    
        Thread.Sleep(5000);
    
        Task.Run(() => {
    
          EstablishConnection();
    
        }).ContinueWith(TaskException, TaskContinuationOptions.OnlyOnFaulted);
    
     
    
      }
    
    }
    
     
    

    Calls from the WCF to the Web Server continue to work. The Brower component works perfectly.

    We can see the keep alive messages in the trace.

     Usually recycling both the website and WCF app pools will fix the issue.

    We are handling the Closed Event and reconnecting, although this never gets called.

    If we get in this weird state and the WCF Client calls the Web Hub to rejoin the group it is part of, like this

    await Groups.Add(Context.ConnectionId, SUBSCRIPTION_LISTENER_GROUP);

    Then we get a task canceled exception, it will timeout and it wont add the connection to the group. If its not in this weird state, it works fine.

    I have been looking at this problem for a week and cant think of what else to try.

    Does anyone have any suggestions?

    Monday, July 10, 2017 11:34 PM

All replies

  • User1771544211 posted

    Hi jakeb16,

    It even works fine in production for a while.

    Could you please check the exact time of how long it works? If the Idle timeout is configured on the server, the worker process will be shut down after  inactive for some time(default is 20min). For more about the idle timeout : https://technet.microsoft.com/en-us/library/cc771956%28v=ws.10%29.aspx?f=255&MSPPError=-2147217396

    It’s hard to explain, we can see the method that should execute on the WCF client being called from the web server via the logs, but it just never happens.

    Please add log for the server method that will call the WCF client to check if the server calls the WCF client. And call the WCF client use other approaches to check if the WCF client is working.

    Best Regards,

    Jean

    Tuesday, July 11, 2017 5:45 AM
  • User1152062383 posted

    Thanks for the quick response I have been working on gathering the logs to demonstrate the problem 

    The Idle Timeout isn’t the problem, we have ide lTime-out Minutes set to 0

    It seem to be a problem with loosing group membership (if I had to guess)

    Lets start with what the logs look like when everything is working and I call SubscribeUser on the clients

    Getting the message from the backplane

    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=14877453, new payload ID=14877454
        DateTime=2017-07-11T17:27:58.4110548Z
    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Updated receive reader initial payload ID parameter=14877454
        DateTime=2017-07-11T17:27:58.4110548Z
    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Payload 14877454 containing 1 message(s) received
        DateTime=2017-07-11T17:27:58.4110548Z
    SignalR.ScaleoutMessageBus Information: 0 : OnReceived(0, 14877454, 1)
        DateTime=2017-07-11T17:27:58.4110548Z
    SignalR.ScaleoutMessageBus Verbose: 0 : Received message 14877454: '{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}' over ScaleoutMessageBus
        DateTime=2017-07-11T17:27:58.4110548Z
    SignalR.ScaleoutMessageBus Verbose: 0 : Message id: 14877454, stream : 0, eventKey: 'hg-RealTimeInboxHub.SubscriptionListenerGroup' saved with local id: 3
        DateTime=2017-07-11T17:27:58.4110548Z
    SignalR.ScaleoutMessageBus Verbose: 0 : Scheduling eventkeys: hg-RealTimeInboxHub.SubscriptionListenerGroup
        DateTime=2017-07-11T17:27:58.4110548Z
    

    Sending to 3 Clients in the SubscriptionListenerGroup group

    SignalR.Transports.ServerSentEventsTransport Verbose: 0 : Sending outgoing message. Connection id: c47b7071-854f-4994-bbf9-376017012cff, transport: ServerSentEventsTransport, message: data: {"C":"s-0,E3030E","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}]}
        DateTime=2017-07-11T17:27:58.4110548Z
    SignalR.Transports.ServerSentEventsTransport Verbose: 0 : Sending outgoing message. Connection id: c80c4624-b762-49cb-bd06-135ebbe4442f, transport: ServerSentEventsTransport, message: data: {"C":"s-0,E3030E","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}]}
        DateTime=2017-07-11T17:27:58.4110548Z
    SignalR.Transports.ServerSentEventsTransport Verbose: 0 : Sending outgoing message. Connection id: 61a7c1d1-1d42-4111-9242-122879cb2d5b, transport: ServerSentEventsTransport, message: data: {"C":"s-0,E3030E","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}]}
        DateTime=2017-07-11T17:27:58.4110548Z

    Tester App - equivalent of WCF, I have a windows form that is doing the same thing as the WCF clients. I just have tracing and logging turned on so I can see what is happening.

    17:27:58.4110548 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {"C":"s-0,E3030E","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[62252,"8d2c1784-5b54-40ec-9c6a-aae799b6016e"]}]})
    

    We can see it join the group?

    SignalR.ScaleoutMessageBus Verbose: 0 : Received message 14877451: '{"WaitForAck":true,"Id":"c25ecd3b-7bdb-43d9-8a50-43b2c9e2c3e6","CommandType":1,"Value":"hg-RealTimeInboxHub.SubscriptionListenerGroup"}' over ScaleoutMessageBus
        DateTime=2017-07-11T17:27:58.2394548Z
    
    SignalR.ScaleoutMessageBus Verbose: 0 : Message id: 14877451, stream : 0, eventKey: 'c-c47b7071-854f-4994-bbf9-376017012cff' saved with local id: 1

    Now We Wait 4 Hours and I call SubscribeUser on the clients

    Getting the message from the backplane

    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=15157854, new payload ID=15157855
        DateTime=2017-07-11T21:08:18.8178518Z
    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Updated receive reader initial payload ID parameter=15157855
        DateTime=2017-07-11T21:08:18.8178518Z
    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Payload 15157855 containing 1 message(s) received
        DateTime=2017-07-11T21:08:18.8178518Z
    SignalR.ScaleoutMessageBus Information: 0 : OnReceived(0, 15157855, 1)
        DateTime=2017-07-11T21:08:18.8178518Z
    SignalR.ScaleoutMessageBus Verbose: 0 : Received message 15157855: '{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[43559,"b1057d8a-a49d-41a2-b6d5-be8e12c3b85e"]}' over ScaleoutMessageBus
        DateTime=2017-07-11T21:08:18.8178518Z
    SignalR.ScaleoutMessageBus Verbose: 0 : Message id: 15157855, stream : 0, eventKey: 'hg-RealTimeInboxHub.SubscriptionListenerGroup' saved with local id: 22
        DateTime=2017-07-11T21:08:18.8178518Z
    SignalR.ScaleoutMessageBus Verbose: 0 : Scheduling eventkeys: hg-RealTimeInboxHub.SubscriptionListenerGroup
        DateTime=2017-07-11T21:08:18.8178518Z
    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=15157855, new payload ID=15157856
    
        DateTime=2017-07-11T21:08:18.8178518Z

    There is nothing in the transport logs (of interest) except for keep alive messages from the same client that should have been called

    SignalR.Transports.TransportHeartBeat Verbose: 0 : KeepAlive(c47b7071-854f-4994-bbf9-376017012cff)
    
        DateTime=2017-07-11T21:08:08.9118518Z

    Tester App - equivalent of WCF

    All we see is keep alive messages in the client trace

    21:07:08.8206518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:07:18.8358518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:07:28.8510518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:07:38.8662518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:07:48.8814518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:07:58.8966518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:08:08.9118518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:08:18.9270518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:08:28.9422518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:08:38.9574518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:08:48.9726518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:08:58.9878518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:09:09.0030518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
    
    21:09:19.0182518 - c47b7071-854f-4994-bbf9-376017012cff - SSE: OnMessage(Data: {})
     

    Finally I left everything in a bad state and opened another instance of the tester app. Only the new instance of the tester app got the subscribe event. 
    This time when I started the connection I got a exception joining the group, even though it seem to work ok as it got the subscribe call

    await Groups.Add(Context.ConnectionId, SUBSCRIPTION_LISTENER_GROUP);

    Fails with 

    Exception Joining SubscriptionListenerGroup group. [Exception: System.Threading.Tasks.TaskCanceledException: A task was canceled.

    Getting the message from the backplane

    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=15198582, new payload ID=15198583
        DateTime=2017-07-11T21:37:44.1742518Z
    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Updated receive reader initial payload ID parameter=15198583
        DateTime=2017-07-11T21:37:44.1742518Z
    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : Payload 15198583 containing 1 message(s) received
        DateTime=2017-07-11T21:37:44.1742518Z
    SignalR.ScaleoutMessageBus Information: 0 : OnReceived(0, 15198583, 1)
        DateTime=2017-07-11T21:37:44.1742518Z
    SignalR.ScaleoutMessageBus Verbose: 0 : Received message 15198583: '{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[43559,"cb15e712-8302-4523-93a4-4eeb27602354"]}' over ScaleoutMessageBus
        DateTime=2017-07-11T21:37:44.1742518Z
    SignalR.ScaleoutMessageBus Verbose: 0 : Message id: 15198583, stream : 0, eventKey: 'hg-RealTimeInboxHub.SubscriptionListenerGroup' saved with local id: 25
        DateTime=2017-07-11T21:37:44.1742518Z
    SignalR.ScaleoutMessageBus Verbose: 0 : Scheduling eventkeys: hg-RealTimeInboxHub.SubscriptionListenerGroup
        DateTime=2017-07-11T21:37:44.1742518Z
    SignalR.SqlMessageBus Verbose: 0 : Stream 0 : SqlReceiver last payload ID=15198583, new payload ID=15198584
        DateTime=2017-07-11T21:37:44.1742518Z
    

    Only Sending to the new instance of the tester app in the SubscriptionListenerGroup group (I don’t know why the other 3 didn’t get this message)

    SignalR.Transports.ServerSentEventsTransport Verbose: 0 : Sending outgoing message. Connection id: b1565993-1dba-42d4-87b1-9b95e56a88f8, transport: ServerSentEventsTransport, message: data: {"C":"s-0,E7E977","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[43559,"cb15e712-8302-4523-93a4-4eeb27602354"]}]}
        DateTime=2017-07-11T21:37:44.1898518Z
    

    New instance of Tester App - equivalent of WCF Get the message, the other one doesn't. 

    21:37:44.1898518 - b1565993-1dba-42d4-87b1-9b95e56a88f8 - SSE: OnMessage(Data: {"C":"s-0,E7E977","M":[{"H":"RealTimeInboxHub","M":"SubscribeUser","A":[43559,"cb15e712-8302-4523-93a4-4eeb27602354"]}]})
    


    Also note:
    If I the original Tester app tries to rejoin the group it also gets the same exception as above, but this doesnt seem to help at all.   I dont see the same kind of event that suggest it actually tried to join the group in the logs.

    Also if I recycle the website, the clients then start receiving messages again. 

    Tuesday, July 11, 2017 10:20 PM
  • User1771544211 posted

    Hi jakeb16,

    Based on your description, the SignalR can't send outgoing message to the old tester App after a period of time.

    I guess there are some conditions can lead to this issue.

    1. The old tester Apps are shut down or inactive after a period time.

    2. The old tester Apps need to authorize the SignalR request. But after a period of time the authorized state is lost.

    And you can try enable the SignalR tracing to view diagnostic information about events in your SignalR application.

    https://docs.microsoft.com/en-us/aspnet/signalr/overview/testing-and-debugging/enabling-signalr-tracing

    Best Regards,

    Jean

    Thursday, July 13, 2017 7:28 AM
  • User1152062383 posted

    Hi Jean, just got back from vacation (sorry for the delay).

    Thanks for your help.

    As you can see above I did turn on tracing and copied the logs above.

    What do you mean by inactive after a period of time (will this matter?) I can see the keep alive messages but there may be periods where neither the server or client is invoking methods on each other .

    Its definitely not shutdown as its a winform I can see.

     As for authorization I would have expected to see errors in the logs on the server side that it cant connect to the client (old tester app)

    Additionally if it was a authorization error wouldn't a close type event occur, since it looses communication? (which is not going on) 

     -Jacob

    Monday, July 24, 2017 4:17 PM
  • User1771544211 posted

    Hi jakeb16,

    What do you mean by inactive after a period of time (will this matter?) I can see the keep alive messages but there may be periods where neither the server or client is invoking methods on each other .

    In your previous post "Only Sending to the new instance of the tester app in the SubscriptionListenerGroup group (I don’t know why the other 3 didn’t get this message)", you mention that only the new instance of tester app works, the old apps not work. I'm not sure what's happening here, but the root cause may occur during this stage.

    As for authorization I would have expected to see errors in the logs on the server side that it cant connect to the client (old tester app)

    Additionally if it was a authorization error wouldn't a close type event occur, since it looses communication? (which is not going on)

    Just some thoughts, I'm not sure about the root cause of your issue.

    Your issue is strange and to condition is complex, it's hard to find the reason and solution. So we should image of any possible causes and then verify these possible causes.

    Best Regards,

    Jean

    Tuesday, July 25, 2017 5:16 AM
  • User1152062383 posted

    Thanks for clarifying and you are trying! However I really am no closer to figuring this out. :( At this point what I did was stopped using signalR for communication between the Web site and WCF services. And just used WCF Calls. The downside is now the website has to keep track of each node in the cluster so it can call a WCF method on each node, instead of relying on the backplane.

    If anyone knows of a way that group membership could be lost after a period of time that might help me out.

    Tuesday, July 25, 2017 4:51 PM
  • User1152062383 posted

    So we just inserted this change, and now we are seeing something very similar from the web server to the browsers. The browsers can send messages and they are received by the website, but the web site is not sending any messages back to the browser (after a while, maybe a hour)

    Tuesday, July 25, 2017 9:39 PM
  • User991499041 posted

    Hi jakeb16,

    When you add a user to a group using the Groups.Add method, the user receives messages directed to that group for the duration of the current connection, but the user's membership in that group is not persisted beyond the current connection. If you want to permanently retain information about groups and group membership, you must store that data in a repository such as a database or Azure table storage. Then, each time a user connects to your application, you retrieve from the repository which groups the user belongs to, and manually add that user to those groups.When reconnecting after a temporary disruption, the user automatically re-joins the previously-assigned groups. Automatically rejoining a group only applies when reconnecting, not when establishing a new connection.

    Regards,

    zxj

    Thursday, August 10, 2017 7:56 AM