Long-running

  • Question

  • Currently we are evaluating Windows Server AppFabric and Workflow Services. Our scenario is very simple: we have long-running processes implemented as workflow services and hosted in AppFabric. The idea is to let clients activate many long-running processes (workflows) asynchronously and let the system process them. In addition, we need to provide reliable processing. Our architecture is simple:

    -          To guarantee message delivery and allow async communication, we use the MSMQ binding for our services

    -          We use WCF routing with protocol bridging (HTTP -> MSMQ) for clients that don't support MSMQ
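
    For readers unfamiliar with the bridging setup described in the two points above, a minimal sketch of a WCF routing configuration that accepts one-way HTTP calls and forwards them to an MSMQ queue could look like the following. The queue address, endpoint name, filter names and behavior name are hypothetical, not taken from our actual configuration.

    ```xml
    <system.serviceModel>
      <services>
        <service name="System.ServiceModel.Routing.RoutingService"
                 behaviorConfiguration="routingBehavior">
          <!-- HTTP endpoint for clients that cannot use MSMQ (one-way contract) -->
          <endpoint address=""
                    binding="basicHttpBinding"
                    contract="System.ServiceModel.Routing.ISimplexDatagramRouter" />
        </service>
      </services>
      <client>
        <!-- Hypothetical queue address of the workflow service -->
        <endpoint name="workflowQueue"
                  address="net.msmq://localhost/private/LongRunningService.xamlx"
                  binding="netMsmqBinding"
                  contract="*" />
      </client>
      <behaviors>
        <serviceBehaviors>
          <behavior name="routingBehavior">
            <routing filterTableName="routeAll" />
          </behavior>
        </serviceBehaviors>
      </behaviors>
      <routing>
        <filters>
          <!-- Forward every incoming message to the queue endpoint -->
          <filter name="matchAll" filterType="MatchAll" />
        </filters>
        <filterTables>
          <filterTable name="routeAll">
            <add filterName="matchAll" endpointName="workflowQueue" />
          </filterTable>
        </filterTables>
      </routing>
    </system.serviceModel>
    ```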

    To evaluate this scenario we created a simple workflow, like the one below.

    The “Long-running process” activity is a simple SleepActivity code activity with Thread.Sleep(xxx) inside. The client makes 1000 calls to the service in quick succession; the server creates the corresponding messages in MSMQ and starts processing.

    If the “Long-running process” is fast (10-20 ms), everything is OK and the system processes all messages/workflows successfully without any errors. But if the process takes seconds or minutes, problems start. The system activates workflow services in sequence, but once the “Max number of instances” (throttling setting) is reached, the runtime still reads MSMQ messages and calls the service. As a result, there are many errors like this in the log:

    System.TimeoutException: The operation did not complete within the allotted timeout of 00:00:30. The time allotted to this operation may have been a portion of a longer timeout.

    We tried to use the TransactedReceiveScope activity, but the result is mostly the same. After several retries the MSMQ messages are moved to the poison queue:

    System.ServiceModel.MsmqPoisonMessageException: The transport channel detected a poison message. This occurred because the message exceeded the maximum number of delivery attempts or because the channel detected a fundamental problem with the message. The inner exception may contain additional information.
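
    For reference, the "maximum number of delivery attempts" mentioned in this exception is controlled by the retry settings on netMsmqBinding. A minimal sketch is below; the binding name and all values are illustrative assumptions, not our actual settings, and receiveErrorHandling="Move" requires MSMQ 4.0.

    ```xml
    <bindings>
      <netMsmqBinding>
        <!-- Binding name and values are illustrative assumptions -->
        <binding name="longRunningQueueBinding"
                 receiveRetryCount="3"
                 maxRetryCycles="5"
                 retryCycleDelay="00:05:00"
                 receiveErrorHandling="Move" />
      </netMsmqBinding>
    </bindings>
    ```

    Roughly, receiveRetryCount is the number of immediate retries, maxRetryCycles the number of retry cycles, and retryCycleDelay the pause between cycles; once all are exhausted, the message is handled according to receiveErrorHandling.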

    Is there any way to make this work correctly? Or is it perhaps not a good idea at all to use MSMQ with AppFabric for this kind of processing?

    Thank you

    Alex

    Friday, October 28, 2011 2:43 PM

All replies

  • Some additional information regarding the problem. In the meantime I tested my scenario without any routing (just Client -> Service communication with the MSMQ binding). The problem is the same: many Timeout exceptions.

    I also tried setting maxConcurrentInstances to unlimited, but then I get many strange errors like

     System.ServiceModel.FaultException: The execution of the InstancePersistenceCommand named {urn:schemas-microsoft-com:System.Activities.Persistence/command}LoadWorkflow was interrupted by an error.
       at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
       at System.ServiceModel.Activities.Dispatcher.ControlOperationInvoker.ControlOperationAsyncResult.End(Object[]& outputs, IAsyncResult result)
       at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeEnd(MessageRpc& rpc)
       at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage7(MessageRpc& rpc)
       at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)

    or

    System.ServiceModel.FaultException: The SqlWorkflowInstanceStore lock has expired. This could have occurred because the SQL Server is busy or because the connection was temporarily lost.
       at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
       at System.ServiceModel.Activities.Dispatcher.ControlOperationInvoker.ControlOperationAsyncResult.End(Object[]& outputs, IAsyncResult result)
       at System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeEnd(MessageRpc& rpc)
       at System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage7(MessageRpc& rpc)
       at System.ServiceModel.Dispatcher.MessageRpc.Process(Boolean isOperationContextSet)
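
    The lock mentioned in the second error is the one the SQL instance store takes on behalf of the host and renews periodically. The settings involved live on the sqlWorkflowInstanceStore behavior; a sketch is below. The connection string name is hypothetical and the values are illustrative. Whether tuning them helps with the errors above is an open question, not something I have confirmed.

    ```xml
    <behaviors>
      <serviceBehaviors>
        <behavior>
          <!-- connectionStringName is a hypothetical name; values are illustrative -->
          <sqlWorkflowInstanceStore
              connectionStringName="persistenceDb"
              hostLockRenewalPeriod="00:00:30"
              runnableInstancesDetectionPeriod="00:00:05"
              instanceLockedExceptionAction="BasicRetry"
              instanceCompletionAction="DeleteAll" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
    ```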

     

     

    Monday, October 31, 2011 3:45 PM
  • Some update on the topic.

    1. The Timeout errors are gone. The problem was that maxConcurrentInstances was less than maxConcurrentCalls. In that situation WCF (the Net.Msmq adapter) reads messages from MSMQ and "pushes" them to the service, but the number of concurrent instances is already exceeded, so all additional requests to the service fail with a Timeout.

    To avoid this, maxConcurrentInstances should be greater than maxConcurrentCalls. WCF (the Net.Msmq adapter) still “pushes” messages to the service, but instead of Timeout errors we now get many Throttle Hits. That is not nice, and I have no idea how to avoid it, but it is not critical.
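
    As a sketch of the relationship described in item 1 (the numbers are illustrative assumptions, not the values from our test lab), the throttling element would look something like this:

    ```xml
    <behaviors>
      <serviceBehaviors>
        <behavior>
          <!-- Illustrative values only: the point is maxConcurrentInstances > maxConcurrentCalls -->
          <serviceThrottling maxConcurrentCalls="16"
                             maxConcurrentSessions="100"
                             maxConcurrentInstances="32" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
    ```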

    2. Processing is still unstable. In our test lab about 50 of 500 workflows consistently fail with strange errors like

    An error processing the current work item has caused the workflow to abort.  See the inner exception for details. InnerException Message: The execution of the InstancePersistenceCommand named {urn:schemas-microsoft-com:System.Activities.Persistence/command}SaveWorkflow was interrupted by an error.

    In the Microsoft-Windows-Application Server-System Services/Admin log I see the following error:

    Recycling owner of instance store 'defaultSqlPersistenceStore' (Root) because store cancelled operation.\rException: The execution of InstancePersistenceCommands has been canceled because the InstanceHandle was freed. ().

    On the internet you can find many articles like “Create a Durable and Reliable WCF Service with MSMQ 4.0” or “Fronting long-running WF Services with MSMQ, the right way”, but it looks like it’s NOT POSSIBLE to create a really stable solution this way.

    Again, our test scenario is very simple. We activate many long-running workflows using the MSMQ binding and let the system process them. Our workflows just sleep for 100 seconds to emulate the long-running process. But even in this simple scenario it DOESN’T work.

    Wednesday, November 16, 2011 10:48 AM
  • Hello,

    I am using AppFabric to run long-running workflows.

    My workflow has a Start method followed by 2 Delay activities, each 5 minutes long, that run consecutively.

    When I created 25 workflow instances and called the Start method on each, the delays did not execute for all of the workflow instances.

    I have configured throttling as below:

    <serviceThrottling maxConcurrentCalls="5" maxConcurrentSessions="100" maxConcurrentInstances="20"/>

    But the log entries expected after the delay do not appear for every workflow instance.

    I have configured Workflow Persistence Store.
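
    With a persistence store in place, what happens to an instance while it waits in a Delay is governed by the workflowIdle behavior. A sketch is below; the values are illustrative assumptions and not necessarily what my service uses.

    ```xml
    <behaviors>
      <serviceBehaviors>
        <behavior>
          <!-- Illustrative values: persist as soon as idle, unload after one minute -->
          <workflowIdle timeToPersist="00:00:00" timeToUnload="00:01:00" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
    ```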

    Any idea what is wrong here?

    Friday, October 19, 2012 1:46 PM