Worker role becomes unresponsive after internal tcp connection with web role

    Question

  • Hi,

    I followed the great tutorial by Ryan and Steve on Channel 9 on how to create an internal TCP connection between a web role and a worker role.

    This works fine in the live environment for a couple of requests (with a small delay between them). However, if I make a number of requests in quick succession, the worker role becomes unresponsive and never recovers (I thought Azure should recycle it?).

    All requests and the processing of requests are wrapped in try..catch, but no exception is ever caught.
    Looking at the IntelliTrace file, the following exceptions occur when it fails:

    System.ServiceModel.CommunicationException  -  The socket connection was aborted.....
    System.InvalidProgramException  -  Common Language Runtime detected an invalid program
    System.Runtime.CallbackException  -  Async Callback threw an exception

    Here is the code that creates the service host:

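         // iPEndPoint is the IPEndPoint of the worker role's internal endpoint.
         var baseAddress = new Uri(String.Format("net.tcp://{0}", iPEndPoint));

         // Host SearchService over net.tcp with binding security turned off.
         searchHost = new ServiceHost(typeof(SearchService), baseAddress);
         searchHost.AddServiceEndpoint(typeof(ISearchService), new NetTcpBinding(SecurityMode.None), "searchIn");
         searchHost.Open();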

    I'm happy to send you the IntelliTrace file if required.

    Thanks

    Wednesday, July 07, 2010 8:35 AM
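
    On the "I thought Azure should recycle it?" point: as far as I understand, a ServiceHost that has faulted does not by itself end the role's Run method, so the instance may keep running with a dead listener. A minimal sketch of one way to cope, assuming a hypothetical HostWatchdog helper (not from the tutorial) and a factory delegate that builds the host as in the code above:

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper: keeps one ServiceHost open and replaces it if it
// leaves the Opened state (for example after faulting).
public sealed class HostWatchdog
{
    private readonly Func<ServiceHost> createHost; // builds a fresh, unopened host
    private ServiceHost host;

    public HostWatchdog(Func<ServiceHost> createHost)
    {
        if (createHost == null) throw new ArgumentNullException("createHost");
        this.createHost = createHost;
    }

    // Intended to be called periodically, e.g. from the worker role's Run loop.
    public void EnsureOpen()
    {
        if (host != null && host.State == CommunicationState.Opened)
        {
            return; // still listening, nothing to do
        }

        if (host != null)
        {
            host.Abort(); // a faulted host cannot be reopened, so discard it
        }

        host = createHost();
        host.Open();
    }
}
```

    The Run loop would then call EnsureOpen() every few seconds instead of opening the host once. This only helps if the host actually faults; it does nothing for a process that is hung for another reason.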

Answers

  • Hi Allen,

    I have got it working, but I'm not quite sure how yet. I already had the DefaultConnectionLimit set to 50, and the listenBacklog to 100.

    I've changed a few things:

    Set the System.ServiceModel DLL reference to Copy Local = true.

    Added the attribute to the interface and proxy: [System.CodeDom.Compiler.GeneratedCodeAttribute("System.ServiceModel", "4.0.0.0")]

    I also added some logic on the web role to ensure that a request wasn't made to the worker role with a null or empty search string.

    I will try to find out which change fixed it and update this thread, to help anyone else in the same situation in the future.

    Again thanks for your help Allen, it's much appreciated!

    Ross

    Friday, July 09, 2010 12:17 PM
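
    The web-role guard described in this answer could look something like the sketch below. The Search operation on ISearchService and the SearchClient helper are assumptions for illustration, since the thread never shows the contract. The sketch also closes (or aborts) the channel after every call: channels left open over net.tcp can use up the service's session throttle, which is one common reason a service stops answering under bursts of requests, although the thread does not establish that this was the cause here.

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract shape; the real ISearchService is not shown in the thread.
[ServiceContract]
public interface ISearchService
{
    [OperationContract]
    string Search(string query);
}

public static class SearchClient
{
    // The guard from the answer: never send a null or empty search string.
    public static bool IsValidQuery(string query)
    {
        return !String.IsNullOrEmpty(query);
    }

    // Calls the worker role once and always releases the channel afterwards.
    public static string Search(ChannelFactory<ISearchService> factory, string query)
    {
        if (!IsValidQuery(query))
        {
            return null; // skip the round trip entirely
        }

        ISearchService proxy = factory.CreateChannel();
        var channel = (ICommunicationObject)proxy;
        try
        {
            string result = proxy.Search(query);
            channel.Close(); // graceful close on success
            return result;
        }
        catch
        {
            channel.Abort(); // Close() can throw on a faulted channel, so abort instead
            throw;
        }
    }
}
```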

All replies

  • Hi,

    It looks like the exception is thrown in the callback method, which runs on another thread, so your code cannot catch it. How about using WCF's NetTcpBinding instead of using a socket directly?

    http://msdn.microsoft.com/en-us/library/ms752250.aspx

    WCF will be easier to use and more robust.
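
    To at least record exceptions that escape on callback threads, one option is to hook the process-wide and host-level events. A minimal sketch, assuming a hypothetical HostDiagnostics helper and that trace output is collected somewhere useful:

```csharp
using System;
using System.Diagnostics;
using System.ServiceModel;

// Hypothetical helper: logs failures that a try..catch around the
// calling code never sees because they happen on other threads.
public static class HostDiagnostics
{
    public static void Attach(ServiceHost host)
    {
        // Raised for exceptions that no handler caught on any thread.
        // This is for logging only; it does not stop the process from terminating.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Trace.TraceError(
                "Unhandled exception (terminating: {0}): {1}",
                e.IsTerminating,
                e.ExceptionObject);

        // Raised when the host moves to the Faulted state.
        host.Faulted += (sender, e) =>
            Trace.TraceError("ServiceHost faulted; state = {0}", host.State);
    }
}
```

    Attach would be called once, before searchHost.Open().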


    Please remember to mark the replies as answers if they help and unmark them if they provide no help. Windows Azure Platform China Blog: http://blogs.msdn.com/azchina/default.aspx
    Thursday, July 08, 2010 7:41 AM
  • Hi Allen,

    Thanks for the link. I tried this, but I still get the same problem. It works for a few individual requests, and then fails when multiple requests are made at the same time.

    I used the WCF config settings exactly as shown in the example and adjusted my code. Am I missing something fundamental about how worker roles can deal with multiple requests on an internal TCP socket?

    Thanks

    Ross

    Thursday, July 08, 2010 12:44 PM
  • Hi Ross,

    >fails when multiple requests are made at the same time.

    What exception do you get? What happens if you test it locally? There are several settings in WCF that may cause the behavior you mentioned, such as those discussed here:

    http://social.msdn.microsoft.com/Forums/en/wcf/thread/ca014e42-c547-4745-8b15-ec4d5480211a

    >Am I missing something fundamental about how worker roles can deal with multiple requests on an internal TCP socket?

    Just in case, please check whether you've changed this setting:

    public override bool OnStart()
    {
        // Set the maximum number of concurrent connections
        ServicePointManager.DefaultConnectionLimit = 12;

        ...

    }
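
    A caveat on the setting above: as far as I know, ServicePointManager.DefaultConnectionLimit applies to outbound HTTP connections managed through ServicePoint, so it may have little effect on a net.tcp endpoint. For NetTcpBinding the closer knobs are the binding's ListenBacklog and MaxConnections and the service throttle. A minimal sketch, in which SearchHostFactory is a hypothetical helper, the two service types are stand-ins for the ones in the question, and every number is illustrative rather than a recommendation:

```csharp
using System;
using System.Net;
using System.ServiceModel;
using System.ServiceModel.Description;

// Stand-ins for the types in the question; the real contract is not shown in the thread.
[ServiceContract]
public interface ISearchService
{
    [OperationContract]
    string Search(string query);
}

public class SearchService : ISearchService
{
    public string Search(string query) { return query; }
}

public static class SearchHostFactory
{
    // Builds (but does not open) a host with explicit connection and throttle limits.
    public static ServiceHost Create(IPEndPoint endpoint)
    {
        var baseAddress = new Uri(String.Format("net.tcp://{0}", endpoint));

        var binding = new NetTcpBinding(SecurityMode.None)
        {
            ListenBacklog = 100,  // connection requests allowed to queue at the listener
            MaxConnections = 100  // on the service: connections allowed to be pending dispatch
        };

        var host = new ServiceHost(typeof(SearchService), baseAddress);
        host.AddServiceEndpoint(typeof(ISearchService), binding, "searchIn");

        // Reuse the throttling behavior if one is already configured, else add one.
        var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
        if (throttle == null)
        {
            throttle = new ServiceThrottlingBehavior();
            host.Description.Behaviors.Add(throttle);
        }
        throttle.MaxConcurrentCalls = 64;       // illustrative value
        throttle.MaxConcurrentSessions = 200;   // illustrative value
        throttle.MaxConcurrentInstances = 264;  // illustrative value

        return host;
    }
}
```

    The caller opens the returned host as before with host.Open().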


    Friday, July 09, 2010 1:31 AM
  • Hi Ross,

    We are currently experiencing the exact same symptoms with our inter-role communication. Did you ever determine the root cause?

    Thanks,

    Mark

    Thursday, August 26, 2010 11:12 AM
  • We have been seeing the exact same issue since upgrading to Visual Studio 2010 and .NET 4.0. Here's the stack trace:

    Unhandled exception: System.Runtime.CallbackException: Async Callback threw an exception. ---> System.InvalidProgramException: Common Language Runtime detected an invalid program.
       at System.ServiceModel.Dispatcher.ErrorBehavior.HandleErrorCommon(Exception error, ErrorHandlerFaultInfo& faultInfo)
       at System.ServiceModel.Dispatcher.ChannelDispatcher.HandleError(Exception error, ErrorHandlerFaultInfo& faultInfo)
       at System.ServiceModel.Dispatcher.ChannelDispatcher.HandleError(Exception error)
       at System.ServiceModel.Dispatcher.ErrorHandlingReceiver.EndTryReceive(IAsyncResult result, RequestContext& requestContext)
       at System.ServiceModel.Dispatcher.ChannelHandler.EndTryReceive(IAsyncResult result, RequestContext& requestContext)
       at System.ServiceModel.Dispatcher.ChannelHandler.AsyncMessagePump(IAsyncResult result)
       at System.Runtime.Fx.AsyncThunk.UnhandledExceptionFrame(IAsyncResult result)
       at System.Runtime.AsyncResult.Complete(Boolean completedSynchronously)
       --- End of inner exception stack trace ---
       at System.Runtime.AsyncResult.Complete(Boolean completedSynchronously)
       at System.Runtime.AsyncResult.Complete(Boolean completedSynchronously, Exception exception)
       at System.ServiceModel.Channels.FramingDuplexSessionChannel.TryReceiveAsyncResult.OnReceive(IAsyncResult result)
       at System.Runtime.Fx.AsyncThunk.UnhandledExceptionFrame(IAsyncResult result)
       at System.Runtime.AsyncResult.Complete(Boolean completedSynchronously)
       at System.Runtime.AsyncResult.Complete(Boolean completedSynchronously, Exception exception)
       at System.ServiceModel.Channels.SynchronizedMessageSource.ReceiveAsyncResult.OnReceiveComplete(Object state)
       at System.ServiceModel.Channels.SessionConnectionReader.OnAsyncReadComplete(Object state)
       at System.ServiceModel.Channels.SocketConnection.AsyncReadCallback(Boolean haveResult, Int32 error, Int32 bytesRead)
       at System.Runtime.Fx.IOCompletionThunk.UnhandledExceptionFrame(UInt32 error, UInt32 bytesRead, NativeOverlapped* nativeOverlapped)
       at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)

    Thursday, August 26, 2010 11:50 PM
  • I have the exact same issue, with the same exception call stack.

    Has there been a resolution to this, or do I need to use an MSDN support call?

     

    Thanks,
    David Pfeffer

    Monday, September 27, 2010 3:10 AM
  • Hi,

    We have two separate solutions that are WCF services over TCP targeting .NET 4. Both of them have suddenly developed this problem some time in the past couple of weeks. They deploy, but when you try to call one, or add a service reference to a project pointing to it, they crash and burn. IntelliTrace shows all of the wait handle exception stuff, and at some point it reports that it can't find System.Xml.XmlSerializer, plus something else I don't remember.

    It seems odd that we have two of these (one of which has not been changed, the other has been changed to .NET 4) that suddenly won't run in the cloud any more. Both run fine in the dev fabric.

    My coworker filed an issue with Microsoft support, and they asked him when it started happening, which is making us a bit suspicious that something has changed on the Azure side. I'll report back if/when we get an answer.

    RobinDotNet


    Microsoft MVP, Client App Dev
    Tuesday, September 28, 2010 1:25 AM