WF4 WCF Send Message at wrong time

    Question

  • Hello, I am hosting a workflow service (xamlx) in IIS. It has several Receive activities, e.g. MethodA and MethodB. I wrote an MVC application as the client that calls these methods. In PageA, submitting the form calls MethodA, and the workflow then advances to the Receive activity that waits for MethodB. In PageB, submitting the form calls MethodB. However, if the user submits PageA, then goes back to PageA and submits again for the same workflow instance, the call waits for a minute and then throws a timeout exception:

    The request channel timed out while waiting for a reply after 00:01:00. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout.

    This error seems to come from WCF, whereas I expected the following error:

    The execution of an InstancePersistenceCommand was interrupted because the instance key 'guid' was not associated to an instance. This can occur because the instance or key has been cleaned up, or because the key is invalid. The key may be invalid if the message it was generated from was sent at the wrong time or contained incorrect correlation data.

    I have a few questions:

    1. Is there any configuration we can set so that a different exception is thrown immediately, instead of waiting until a timeout exception is thrown? I know we can set a smaller timeout value in the binding element, but that shouldn't be the solution.

    2. Is there any way to prevent PageA from being shown when the workflow instance is not in the correct state? (Even if this is done, we still need to solve problem 1, as the user could open PageA and stay idle for some time before submitting.)
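
    For reference, the binding timeout mentioned in question 1 is the client-side sendTimeout. A minimal sketch of shortening it in the MVC client's web.config might look like the following. The binding name, endpoint name, address and contract name are hypothetical placeholders, and basicHttpBinding is an assumption about how the service is exposed. This only shortens the wait; it does not change which exception is thrown.

    ```xml
    <system.serviceModel>
      <bindings>
        <basicHttpBinding>
          <!-- sendTimeout bounds how long the client waits for the reply (default 00:01:00) -->
          <binding name="shortReplyTimeout" sendTimeout="00:00:10" />
        </basicHttpBinding>
      </bindings>
      <client>
        <!-- name, address and contract are hypothetical placeholders -->
        <endpoint name="WorkflowClient"
                  address="http://localhost/MyWorkflows/Service1.xamlx"
                  binding="basicHttpBinding"
                  bindingConfiguration="shortReplyTimeout"
                  contract="IService" />
      </client>
    </system.serviceModel>
    ```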

    Thanks.

    Wednesday, February 06, 2013 6:19 AM

All replies

  • Hi,

    I am trying to involve someone familiar with this topic to look further into this issue. There might be some delay; I appreciate your patience.

    Thanks.


    Chen Yu

    Friday, February 08, 2013 9:04 AM
    Moderator
  • I have a very similar problem with some of my workflows. A second call to the operation that creates the workflow (CanCreateInstance = true) causes a timeout. I'm pretty sure it's caused by the correlation settings, but I've been unable to figure out the right incantations to get it to work for certain workflows.

    Interestingly, I cannot reproduce the problem in a simple sequential workflow with two operations, but as soon as I make the second step in the sequence a more complex activity (e.g. a Pick or a StateMachine with embedded Receive/SendReply pairs), I get the timeouts on a second call to the initiating operation for the same correlated identity.

    I can provide a self-contained test project that exhibits the behavior if anyone would care to take a look.

    NOTE: I can get it to fail the same way with or without Sql persistence enabled, so I don't think it's a DB locking issue.

    UPDATE: It seems the problem is not limited to the initiating operation - a call to any "out-of-order" operation with a non-trivial, non-sequential workflow causes "TimeoutException" after a minute, instead of an immediate "not available at this time" error.



    • Edited by wbradney Wednesday, May 22, 2013 6:41 PM
    Wednesday, May 22, 2013 3:18 PM
  • After receiving MethodA, you state that your workflow moves on to the Receive for MethodB. When the WorkflowServiceHost receives a second MethodA request, it holds on to it, waiting for a subsequent Receive for MethodA to execute. But when that doesn't happen within the configured timeout period for WCF, the second request for MethodA times out.

    If you think users will invoke either MethodA or MethodB on the service, you should consider putting a Receive for both methods inside a Parallel activity with a CompletionCondition property that evaluates to "true". Then when either method is "invoked", the "other" Receive activity will be cancelled.
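
    A rough sketch of that arrangement in XAML, assuming Visual Basic expressions and the p1, p and x prefixes bound as in a designer-generated xamlx file. Correlation settings and message contents are omitted, and the x:Name values and the p:IService contract name are illustrative placeholders:

    ```xml
    <p1:Parallel CompletionCondition="[True]">
      <!-- Branch 1: waits for MethodA -->
      <p1:Sequence>
        <Receive x:Name="ReceiveMethodA" OperationName="MethodA" ServiceContractName="p:IService" />
        <SendReply Request="{x:Reference ReceiveMethodA}" />
      </p1:Sequence>
      <!-- Branch 2: waits for MethodB -->
      <p1:Sequence>
        <Receive x:Name="ReceiveMethodB" OperationName="MethodB" ServiceContractName="p:IService" />
        <SendReply Request="{x:Reference ReceiveMethodB}" />
      </p1:Sequence>
    </p1:Parallel>
    ```

    The CompletionCondition is evaluated each time a branch completes, so as soon as one Receive/SendReply pair finishes, the branch that is still waiting gets cancelled.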

    But that might not be what you want. If you truly want the user to do MethodA, then MethodB, you will need to do something in your client to ensure that.

    Jim

    Thursday, May 23, 2013 10:28 PM
    Moderator
  • Jim,

    In my case, I have two operations that should be called in order (OperationA, followed by OperationB). OperationA is CanCreateInstance = true, and OperationB is CanCreateInstance = false.

    I have the two operations in a simple sequence inside the workflow.

    If, for a particular correlated instance, a client calls OperationB first, I immediately get an error to the effect of "No such workflow instance", which is what I'd expect.

    If, for a particular correlated instance, a client calls OperationA, and another client calls OperationA (a scenario which may be quite likely to occur with content-based correlation), I immediately get an error to the effect of "That operation is not available at this time", which is also what I'd expect.

    So far, so good.

    Now, I change the workflow to make it more complicated. OperationB now no longer simply follows OperationA in a simple Sequence, but instead it's inside a Pick activity or a StateMachine activity.

    With this workflow:

    If, for a particular correlated instance, a client calls OperationB first, I immediately get an error to the effect of "No such workflow instance", which is what I'd expect.

    If, for a particular correlated instance, a client calls OperationA, and another client calls OperationA (a scenario which may be quite likely to occur with content-based correlation), I get a TimeoutException after one minute (or whatever my client binding is configured for), which is certainly not what I'd expect. I'd expect the same behaviour as with a simple Sequence (i.e. an immediate error indicating an out-of-order operation), and I'm not sure why I would expect anything else. It's likely that the OP has a workflow that's more complex than a simple Sequence, as in my second case.

    I can send a standalone project that highlights the issue, if you like.


    • Edited by wbradney Friday, May 24, 2013 12:05 AM
    Friday, May 24, 2013 12:03 AM
  • I am trying to reproduce your issue and have a question about your "more complicated" workflow.

    For your Pick activity, does it have the Receive activities for both OperationA and OperationB in it? Or is OperationA first in a sequence, followed by a Pick with OperationB in it? If it is the latter case, what is in the other branches of the Pick, besides OperationB's Receive?

    I probably don't need the entire project, just the XAML file for the service.

    Thanks.

    Jim

    Friday, May 24, 2013 5:11 PM
    Moderator
  • Jim,

    It's:

    Sequence
        Sequence
            Receive(OperationA)
            SendReplyToReceive
        Pick
            PickBranch
                Sequence
                    Receive(OperationB)
                    SendReplyToReceive

    Just remove the Pick and the PickBranch to get to the simpler workflow that works as I'd expect.

    This workflow is obviously a pathological case I concocted to exhibit the problem. It still fails if there's a second PickBranch that triggers on a third operation (OperationC).

    I've also tried it with a StateMachine (which is my actual real-world use-case) in place of the Pick, where OperationB is the trigger for a transition from the initial state (and there are other operations for other transitions from that state). That gives a TimeoutException also.

    <WorkflowService mc:Ignorable="sap sap2010 sads" p1:TextExpression.Namespaces="{x:Reference __ReferenceID2}" p1:TextExpression.References="{x:Reference __ReferenceID3}" ConfigurationName="Service2" sap2010:ExpressionActivityEditor.ExpressionActivityEditor="C#" sap2010:WorkflowViewState.IdRef="WorkflowService_1" Name="Service2"
     xmlns="http://schemas.microsoft.com/netfx/2009/xaml/servicemodel"
     xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
     xmlns:p="http://tempuri.org/"
     xmlns:p1="http://schemas.microsoft.com/netfx/2009/xaml/activities"
     xmlns:sads="http://schemas.microsoft.com/netfx/2010/xaml/activities/debugger"
     xmlns:sap="http://schemas.microsoft.com/netfx/2009/xaml/activities/presentation"
     xmlns:sap2010="http://schemas.microsoft.com/netfx/2010/xaml/activities/presentation"
     xmlns:scg="clr-namespace:System.Collections.Generic;assembly=mscorlib"
     xmlns:sco="clr-namespace:System.Collections.ObjectModel;assembly=mscorlib"
     xmlns:ssx="clr-namespace:System.ServiceModel.XamlIntegration;assembly=System.ServiceModel"
     xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">
      <p1:Sequence DisplayName="PickWorkflow" sap2010:WorkflowViewState.IdRef="Sequence_17">
        <p1:TextExpression.Namespaces>
          <sco:Collection x:TypeArguments="x:String" x:Name="__ReferenceID2">
            <x:String>System</x:String>
            <x:String>System.Collections.Generic</x:String>
            <x:String>System.Data</x:String>
            <x:String>System.Linq</x:String>
            <x:String>System.Text</x:String>
          </sco:Collection>
        </p1:TextExpression.Namespaces>
        <p1:TextExpression.References>
          <sco:Collection x:TypeArguments="p1:AssemblyReference" x:Name="__ReferenceID3">
            <p1:AssemblyReference>Microsoft.CSharp</p1:AssemblyReference>
            <p1:AssemblyReference>System.Web.DynamicData</p1:AssemblyReference>
            <p1:AssemblyReference>System.Drawing</p1:AssemblyReference>
            <p1:AssemblyReference>System.Web.Entity</p1:AssemblyReference>
            <p1:AssemblyReference>System.Web.ApplicationServices</p1:AssemblyReference>
            <p1:AssemblyReference>System</p1:AssemblyReference>
            <p1:AssemblyReference>System.Activities</p1:AssemblyReference>
            <p1:AssemblyReference>System.Core</p1:AssemblyReference>
            <p1:AssemblyReference>System.Data</p1:AssemblyReference>
            <p1:AssemblyReference>System.Data.Entity</p1:AssemblyReference>
            <p1:AssemblyReference>System.Runtime.Serialization</p1:AssemblyReference>
            <p1:AssemblyReference>System.ServiceModel</p1:AssemblyReference>
            <p1:AssemblyReference>System.ServiceModel.Activities</p1:AssemblyReference>
            <p1:AssemblyReference>System.ServiceModel.Channels</p1:AssemblyReference>
            <p1:AssemblyReference>System.Web</p1:AssemblyReference>
            <p1:AssemblyReference>System.Xaml</p1:AssemblyReference>
            <p1:AssemblyReference>System.Xml</p1:AssemblyReference>
            <p1:AssemblyReference>System.Xml.Linq</p1:AssemblyReference>
            <p1:AssemblyReference>mscorlib</p1:AssemblyReference>
            <p1:AssemblyReference>CorrelationTestService</p1:AssemblyReference>
          </sco:Collection>
        </p1:TextExpression.References>
        <p1:Sequence.Variables>
          <p1:Variable x:TypeArguments="CorrelationHandle" Name="__handle1" />
        </p1:Sequence.Variables>
        <p1:Sequence sap2010:WorkflowViewState.IdRef="Sequence_18">
          <p1:TextExpression.Namespaces>
            <sco:Collection x:TypeArguments="x:String">
              <x:String>System</x:String>
              <x:String>System.Collections.Generic</x:String>
              <x:String>System.Data</x:String>
              <x:String>System.Linq</x:String>
              <x:String>System.Text</x:String>
            </sco:Collection>
          </p1:TextExpression.Namespaces>
          <p1:TextExpression.References>
            <sco:Collection x:TypeArguments="p1:AssemblyReference">
              <p1:AssemblyReference>Microsoft.CSharp</p1:AssemblyReference>
              <p1:AssemblyReference>System.Web.DynamicData</p1:AssemblyReference>
              <p1:AssemblyReference>System.Drawing</p1:AssemblyReference>
              <p1:AssemblyReference>System.Web.Entity</p1:AssemblyReference>
              <p1:AssemblyReference>System.Web.ApplicationServices</p1:AssemblyReference>
              <p1:AssemblyReference>System</p1:AssemblyReference>
              <p1:AssemblyReference>System.Activities</p1:AssemblyReference>
              <p1:AssemblyReference>System.Core</p1:AssemblyReference>
              <p1:AssemblyReference>System.Data</p1:AssemblyReference>
              <p1:AssemblyReference>System.Data.Entity</p1:AssemblyReference>
              <p1:AssemblyReference>System.Runtime.Serialization</p1:AssemblyReference>
              <p1:AssemblyReference>System.ServiceModel</p1:AssemblyReference>
              <p1:AssemblyReference>System.ServiceModel.Activities</p1:AssemblyReference>
              <p1:AssemblyReference>System.ServiceModel.Channels</p1:AssemblyReference>
              <p1:AssemblyReference>System.Web</p1:AssemblyReference>
              <p1:AssemblyReference>System.Xaml</p1:AssemblyReference>
              <p1:AssemblyReference>System.Xml</p1:AssemblyReference>
              <p1:AssemblyReference>System.Xml.Linq</p1:AssemblyReference>
              <p1:AssemblyReference>mscorlib</p1:AssemblyReference>
              <p1:AssemblyReference>CorrelationTestService</p1:AssemblyReference>
            </sco:Collection>
          </p1:TextExpression.References>
          <Receive x:Name="__ReferenceID0" CanCreateInstance="True" sap2010:WorkflowViewState.IdRef="Receive_14" OperationName="OperationA" ServiceContractName="p:IService">
            <Receive.CorrelatesOn>
              <XPathMessageQuery x:Key="key1">
                <XPathMessageQuery.Namespaces>
                  <ssx:XPathMessageContextMarkup>
                    <x:String x:Key="xgSc">http://tempuri.org/</x:String>
                  </ssx:XPathMessageContextMarkup>
                </XPathMessageQuery.Namespaces>sm:body()/xgSc:OperationA/xgSc:id</XPathMessageQuery>
            </Receive.CorrelatesOn>
            <ReceiveParametersContent>
              <p1:OutArgument x:TypeArguments="x:Int32" x:Key="id" />
              <p1:OutArgument x:TypeArguments="x:String" x:Key="value" />
            </ReceiveParametersContent>
          </Receive>
          <SendReply Request="{x:Reference __ReferenceID0}" DisplayName="SendReplyToReceive" sap2010:WorkflowViewState.IdRef="SendReply_14" />
        </p1:Sequence>
        <p1:Pick sap2010:WorkflowViewState.IdRef="Pick_6">
          <p1:PickBranch DisplayName="Branch1" sap2010:WorkflowViewState.IdRef="PickBranch_11">
            <p1:PickBranch.Trigger>
              <p1:Sequence sap2010:WorkflowViewState.IdRef="Sequence_19">
                <Receive x:Name="__ReferenceID1" sap2010:WorkflowViewState.IdRef="Receive_15" OperationName="OperationB" ServiceContractName="p:IService">
                  <Receive.CorrelatesOn>
                    <XPathMessageQuery x:Key="key1">
                      <XPathMessageQuery.Namespaces>
                        <ssx:XPathMessageContextMarkup>
                          <x:String x:Key="xgSc">http://tempuri.org/</x:String>
                        </ssx:XPathMessageContextMarkup>
                      </XPathMessageQuery.Namespaces>sm:body()/xgSc:OperationB/xgSc:id</XPathMessageQuery>
                  </Receive.CorrelatesOn>
                  <ReceiveParametersContent>
                    <p1:OutArgument x:TypeArguments="x:Int32" x:Key="id" />
                  </ReceiveParametersContent>
                </Receive>
                <SendReply Request="{x:Reference __ReferenceID1}" DisplayName="SendReplyToReceive" sap2010:WorkflowViewState.IdRef="SendReply_15" />
              </p1:Sequence>
            </p1:PickBranch.Trigger>
          </p1:PickBranch>
        </p1:Pick>
        <sads:DebugSymbol.Symbol>d21DOlxVc2Vyc1x3YnJhZG5leVxEb2N1bWVudHNcVmlzdWFsIFN0dWRpbyAyMDEyXFByb2plY3RzXENvcnJlbGF0aW9uVGVzdFxDb3JyZWxhdGlvblRlc3RTZXJ2aWNlXFNlcnZpY2UyLnhhbWx4CQ0DexEBATIFYxMBB2QFeQ8BAlQHYREBCWIHYosBAQhlB3gXAQNnC3YZAQRoDXQXAQZ1DXWRAQEF</sads:DebugSymbol.Symbol>
      </p1:Sequence>
      <sap2010:WorkflowViewState.ViewStateManager>
        <sap2010:ViewStateManager>
          <sap2010:ViewStateData Id="Receive_14" sap:VirtualizedContainerService.HintSize="255,90" />
          <sap2010:ViewStateData Id="SendReply_14" sap:VirtualizedContainerService.HintSize="255,90" />
          <sap2010:ViewStateData Id="Sequence_18" sap:VirtualizedContainerService.HintSize="421,344">
            <sap:WorkflowViewStateService.ViewState>
              <scg:Dictionary x:TypeArguments="x:String, x:Object">
                <x:Boolean x:Key="IsExpanded">True</x:Boolean>
              </scg:Dictionary>
            </sap:WorkflowViewStateService.ViewState>
          </sap2010:ViewStateData>
          <sap2010:ViewStateData Id="Receive_15" sap:VirtualizedContainerService.HintSize="255,90" />
          <sap2010:ViewStateData Id="SendReply_15" sap:VirtualizedContainerService.HintSize="255,90" />
          <sap2010:ViewStateData Id="Sequence_19" sap:VirtualizedContainerService.HintSize="277,344">
            <sap:WorkflowViewStateService.ViewState>
              <scg:Dictionary x:TypeArguments="x:String, x:Object">
                <x:Boolean x:Key="IsExpanded">True</x:Boolean>
              </scg:Dictionary>
            </sap:WorkflowViewStateService.ViewState>
          </sap2010:ViewStateData>
          <sap2010:ViewStateData Id="PickBranch_11" sap:VirtualizedContainerService.HintSize="307,602" />
          <sap2010:ViewStateData Id="Pick_6" sap:VirtualizedContainerService.HintSize="421,648" />
          <sap2010:ViewStateData Id="Sequence_17" sap:VirtualizedContainerService.HintSize="443,1156">
            <sap:WorkflowViewStateService.ViewState>
              <scg:Dictionary x:TypeArguments="x:String, x:Object">
                <x:Boolean x:Key="IsExpanded">True</x:Boolean>
              </scg:Dictionary>
            </sap:WorkflowViewStateService.ViewState>
          </sap2010:ViewStateData>
          <sap2010:ViewStateData Id="WorkflowService_1" sap:VirtualizedContainerService.HintSize="473,1226" />
        </sap2010:ViewStateManager>
      </sap2010:WorkflowViewState.ViewStateManager>
    </WorkflowService>

    Thanks for looking into this - I'm getting ready to make a recommendation on using WF 4.5 for a new project and I'd like to know I'm not missing something fundamental in my assumptions about how the engine works.




    • Edited by wbradney Friday, May 24, 2013 7:26 PM
    Friday, May 24, 2013 7:10 PM
  • Your correlations are indeed not correct.

    In fact, I am a bit confused by the XAML you included. I don't see any CorrelationInitializers in it. There should be at least two, one for each Receive/SendReply combination. Each should have a CorrelationHandle that gets initialized by a RequestReplyCorrelationInitializer.

    Also, in order to create a correlation between the two operations, there needs to be another CorrelationHandle variable that you define at the outermost Sequence (the sequence that encompasses both Receive activities). Let's name that variable myCorrelationHandle. The Receive for OperationA needs a QueryCorrelationInitializer that initializes myCorrelationHandle with an XPath query for the "id" parameter.

    The Receive for OperationB should set CorrelatesWith to myCorrelationHandle and "Correlates On" an XPath query on its "id" parameter. This allows an OperationB request with a matching "id" value to be directed to the same workflow instance that was started by an OperationA request with that "id" value.

    The Receive for OperationA should NOT have a "CorrelatesOn" property, as it does now. This is what is causing the "timeout" on the request from the second client. Since OperationA is the "CanCreateInstance" Receive in the workflow, it does not need to have "Correlates On" set. With CorrelatesOn removed from OperationA, when a second client tries to invoke OperationA with a matching "id" value while a workflow instance with that same "id" value exists (waiting for OperationB), you will get an exception immediately.

    Here is my XAML. The correlation-related parts are the CorrelationInitializers, CorrelatesWith and CorrelatesOn settings:

    <WorkflowService mc:Ignorable="sads sap" ConfigurationName="Service1" sap:VirtualizedContainerService.HintSize="495,1250" Name="Service1" mva:VisualBasic.Settings="Assembly references and imported namespaces serialized as XML namespaces"
     xmlns="http://schemas.microsoft.com/netfx/2009/xaml/servicemodel"
     xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
     xmlns:mv="clr-namespace:Microsoft.VisualBasic;assembly=System"
     xmlns:mva="clr-namespace:Microsoft.VisualBasic.Activities;assembly=System.Activities"
     xmlns:p="http://tempuri.org/"
     xmlns:p1="http://schemas.microsoft.com/netfx/2009/xaml/activities"
     xmlns:s="clr-namespace:System;assembly=mscorlib"
     xmlns:s1="clr-namespace:System;assembly=System"
     xmlns:s2="clr-namespace:System;assembly=System.Xml"
     xmlns:s3="clr-namespace:System;assembly=System.Core"
     xmlns:s4="clr-namespace:System;assembly=System.ServiceModel"
     xmlns:sa="clr-namespace:System.Activities;assembly=System.Activities"
     xmlns:sad="clr-namespace:System.Activities.Debugger;assembly=System.Activities"
     xmlns:sads="http://schemas.microsoft.com/netfx/2010/xaml/activities/debugger"
     xmlns:sap="http://schemas.microsoft.com/netfx/2009/xaml/activities/presentation"
     xmlns:scg="clr-namespace:System.Collections.Generic;assembly=System"
     xmlns:scg1="clr-namespace:System.Collections.Generic;assembly=System.ServiceModel"
     xmlns:scg2="clr-namespace:System.Collections.Generic;assembly=System.Core"
     xmlns:scg3="clr-namespace:System.Collections.Generic;assembly=mscorlib"
     xmlns:sd="clr-namespace:System.Data;assembly=System.Data"
     xmlns:sl="clr-namespace:System.Linq;assembly=System.Core"
     xmlns:ssa="clr-namespace:System.ServiceModel.Activities;assembly=System.ServiceModel.Activities"
     xmlns:ssx="clr-namespace:System.ServiceModel.XamlIntegration;assembly=System.ServiceModel"
     xmlns:st="clr-namespace:System.Text;assembly=mscorlib"
     xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml">
      <p1:Sequence DisplayName="Sequential Service" sad:XamlDebuggerXmlReader.FileName="e:\tempprojects\WFServiceWithPick\WFServiceWithPick\Service1.xamlx" sap:VirtualizedContainerService.HintSize="465,1220" mva:VisualBasic.Settings="Assembly references and imported namespaces serialized as XML namespaces">
        <p1:Sequence.Variables>
          <p1:Variable x:TypeArguments="CorrelationHandle" Name="myCorrelationHandle" />
        </p1:Sequence.Variables>
        <sap:WorkflowViewStateService.ViewState>
          <scg3:Dictionary x:TypeArguments="x:String, x:Object">
            <x:Boolean x:Key="IsExpanded">True</x:Boolean>
          </scg3:Dictionary>
        </sap:WorkflowViewStateService.ViewState>
        <p1:Sequence sap:VirtualizedContainerService.HintSize="443,336">
          <p1:Sequence.Variables>
            <x:Reference>__ReferenceID1</x:Reference>
          </p1:Sequence.Variables>
          <sap:WorkflowViewStateService.ViewState>
            <scg3:Dictionary x:TypeArguments="x:String, x:Object">
              <x:Boolean x:Key="IsExpanded">True</x:Boolean>
            </scg3:Dictionary>
          </sap:WorkflowViewStateService.ViewState>
          <Receive x:Name="__ReferenceID0" CanCreateInstance="True" sap:VirtualizedContainerService.HintSize="255,86" OperationName="OperationA" ServiceContractName="p:IService">
            <Receive.CorrelationInitializers>
              <RequestReplyCorrelationInitializer>
                <RequestReplyCorrelationInitializer.CorrelationHandle>
                  <p1:InArgument x:TypeArguments="CorrelationHandle">
                    <p1:VariableValue x:TypeArguments="CorrelationHandle">
                      <p1:VariableValue.Variable>
                        <p1:Variable x:TypeArguments="CorrelationHandle" x:Name="__ReferenceID1" Name="__handle1" />
                      </p1:VariableValue.Variable>
                    </p1:VariableValue>
                  </p1:InArgument>
                </RequestReplyCorrelationInitializer.CorrelationHandle>
              </RequestReplyCorrelationInitializer>
              <QueryCorrelationInitializer CorrelationHandle="[myCorrelationHandle]">
                <XPathMessageQuery x:Key="key1">
                  <XPathMessageQuery.Namespaces>
                    <ssx:XPathMessageContextMarkup>
                      <x:String x:Key="xgSc">http://tempuri.org/</x:String>
                    </ssx:XPathMessageContextMarkup>
                  </XPathMessageQuery.Namespaces>sm:body()/xgSc:OperationA/xgSc:id</XPathMessageQuery>
              </QueryCorrelationInitializer>
            </Receive.CorrelationInitializers>
            <ReceiveParametersContent>
              <p1:OutArgument x:TypeArguments="x:Int32" x:Key="id" />
              <p1:OutArgument x:TypeArguments="x:String" x:Key="value" />
            </ReceiveParametersContent>
          </Receive>
          <SendReply Request="{x:Reference __ReferenceID0}" DisplayName="SendReplyToReceive" sap:VirtualizedContainerService.HintSize="255,86" />
        </p1:Sequence>
        <p1:Pick sap:VirtualizedContainerService.HintSize="443,720">
          <p1:PickBranch DisplayName="Branch1" sap:VirtualizedContainerService.HintSize="329,674">
            <p1:PickBranch.Trigger>
              <p1:Sequence sap:VirtualizedContainerService.HintSize="299,460">
                <sap:WorkflowViewStateService.ViewState>
                  <scg3:Dictionary x:TypeArguments="x:String, x:Object">
                    <x:Boolean x:Key="IsExpanded">True</x:Boolean>
                  </scg3:Dictionary>
                </sap:WorkflowViewStateService.ViewState>
                <p1:Sequence sap:VirtualizedContainerService.HintSize="277,336">
                  <p1:Sequence.Variables>
                    <x:Reference>__ReferenceID3</x:Reference>
                  </p1:Sequence.Variables>
                  <sap:WorkflowViewStateService.ViewState>
                    <scg3:Dictionary x:TypeArguments="x:String, x:Object">
                      <x:Boolean x:Key="IsExpanded">True</x:Boolean>
                    </scg3:Dictionary>
                  </sap:WorkflowViewStateService.ViewState>
                  <Receive x:Name="__ReferenceID2" CorrelatesWith="[myCorrelationHandle]" sap:VirtualizedContainerService.HintSize="255,86" OperationName="OperationB" ServiceContractName="p:IService">
                    <Receive.CorrelatesOn>
                      <XPathMessageQuery x:Key="key1">
                        <XPathMessageQuery.Namespaces>
                          <ssx:XPathMessageContextMarkup>
                            <x:String x:Key="xgSc">http://tempuri.org/</x:String>
                          </ssx:XPathMessageContextMarkup>
                        </XPathMessageQuery.Namespaces>sm:body()/xgSc:OperationB/xgSc:id</XPathMessageQuery>
                    </Receive.CorrelatesOn>
                    <Receive.CorrelationInitializers>
                      <RequestReplyCorrelationInitializer>
                        <RequestReplyCorrelationInitializer.CorrelationHandle>
                          <p1:InArgument x:TypeArguments="CorrelationHandle">
                            <p1:VariableValue x:TypeArguments="CorrelationHandle">
                              <p1:VariableValue.Variable>
                                <p1:Variable x:TypeArguments="CorrelationHandle" x:Name="__ReferenceID3" Name="__handle1" />
                              </p1:VariableValue.Variable>
                            </p1:VariableValue>
                          </p1:InArgument>
                        </RequestReplyCorrelationInitializer.CorrelationHandle>
                      </RequestReplyCorrelationInitializer>
                    </Receive.CorrelationInitializers>
                    <ReceiveParametersContent>
                      <p1:OutArgument x:TypeArguments="x:Int32" x:Key="id" />
                    </ReceiveParametersContent>
                  </Receive>
                  <SendReply Request="{x:Reference __ReferenceID2}" DisplayName="SendReplyToReceive" sap:VirtualizedContainerService.HintSize="255,86" />
                </p1:Sequence>
              </p1:Sequence>
            </p1:PickBranch.Trigger>
          </p1:PickBranch>
        </p1:Pick>
      </p1:Sequence>
    </WorkflowService>

    Friday, May 24, 2013 10:58 PM
    Moderator
  • Jim,

    Thanks. I had a correlation configuration similar to that in my first attempts to get this workflow working.

    See my other question here: http://social.msdn.microsoft.com/Forums/en-US/wfprerelease/thread/c3b274d1-a1ab-4ec6-b66a-a2bbc8ac5150

    With those correlations, I don't get a TimeoutException any more. The second call to OperationA fails immediately, but not with an error message that indicates any kind of out-of-order attempt. In fact, the second call also seems to create a second workflow in the persistence store, and then immediately aborts both that workflow and the one created by the first successful call correlated to the same id. Any subsequent call to any operation on that correlated workflow causes: "an error processing the current work item has caused the workflow to abort". The only resolution is to manually delete the broken workflow instances from the persistent store.

    I've been playing around with the correlation settings for a while now (admittedly not fully understanding them), and haven't come close to a robust solution that works for out-of-order calls - except when I removed the CorrelationInitializers and set only CorrelatesOn.

    Note that with these settings the simple sequence works _exactly_ as I would expect.

    Maybe my expectations are wrong -- can I send you my test project and you can tell me where I'm going wrong?


    • Edited by wbradney Saturday, May 25, 2013 1:35 AM
    Saturday, May 25, 2013 12:06 AM
  • Because OperationA is marked as "CanCreateInstance = true", the system attempts to create a second instance of the workflow when it receives the second invocation of OperationA. But because the correlation is being initialized with the same data, namely the data passed by the client, the system encounters a key collision - the attempt to create a row in the KeysTable fails due to the duplicate.

    You are correct that this is causing the first workflow instance to abort, leaving a bad entry in the instance store database.

    What you need to do is ensure that the data you are using to initialize your correlation is unique for each instance. If it is possible for two clients to provide the same data to OperationA, you need to have the reply to OperationA include some sort of data that is unique to that instance, like a generated Guid, or maybe the instance id. Use this unique identifier to initialize the correlation handle in the SendReply, rather than using the client input for the correlation initialization in the Receive for OperationA.

    The subsequent request to OperationB will need to include the unique value returned from OperationA, in addition to any other data the client needs to pass. The "CorrelatesOn" for OperationB needs to specify the unique data that was returned from OperationA.
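
    As a sketch of what that could look like, the reply to OperationA might carry a generated value and initialize the correlation handle from it. The instanceKey variable (assumed here to be a Guid workflow variable assigned before the reply is sent) and the OperationAResponse wrapper element name are assumptions for illustration; the prefixes follow the XAML posted earlier in this thread:

    ```xml
    <SendReply Request="{x:Reference __ReferenceID0}" DisplayName="SendReplyToReceive">
      <SendReply.CorrelationInitializers>
        <!-- Initialize the handle from the reply body instead of from the client's request -->
        <QueryCorrelationInitializer CorrelationHandle="[myCorrelationHandle]">
          <XPathMessageQuery x:Key="key1">
            <XPathMessageQuery.Namespaces>
              <ssx:XPathMessageContextMarkup>
                <x:String x:Key="xgSc">http://tempuri.org/</x:String>
              </ssx:XPathMessageContextMarkup>
            </XPathMessageQuery.Namespaces>sm:body()/xgSc:OperationAResponse/xgSc:instanceKey</XPathMessageQuery>
        </QueryCorrelationInitializer>
      </SendReply.CorrelationInitializers>
      <SendParametersContent>
        <!-- instanceKey is a hypothetical Guid variable holding the per-instance value -->
        <p1:InArgument x:TypeArguments="s:Guid" x:Key="instanceKey">[instanceKey]</p1:InArgument>
      </SendParametersContent>
    </SendReply>
    ```

    The CorrelatesOn query for OperationB would then select the same value from its request, for example sm:body()/xgSc:OperationB/xgSc:instanceKey, with the Receive keeping CorrelatesWith="[myCorrelationHandle]".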

    I hope this helps.

    Jim

    Tuesday, May 28, 2013 5:03 PM
    Moderator
  • Jim,

    Thanks - I appreciate the help.

    I think I must be misunderstanding the intent of content-based workflow correlation.

    But first:

    Because OperationA is marked as "CanCreateInstance = true", the system attempts to create a second instance of the workflow when it receives the second invocation of OperationA

    With my configuration, that doesn't appear to be the case with a simple sequential workflow. On the second attempt the engine appears to understand that the workflow already created by the first attempt is the appropriate workflow, and, having already accepted OperationA for that workflow, an "operation not available at this time" error is returned (as I would expect), leaving the workflow perfectly intact and ready to accept a call to OperationB (also as I would expect).

    As to your suggestion for generating a GUID in the reply to OperationA, I'll give that a try, but I'm skeptical:

    I have existing entities that are stored in a relational database and have a guaranteed unique id (let's call it EntityId). I want to allow each of these entities to have at most one correlated workflow. I don't want to ever allow there to be more than one active workflow for a given EntityId.

    I offer an ASP.NET application that allows a user to choose an entity, and start a workflow for it. I cannot guarantee that two users will not hit the "OperationA" button at around the same time. All I have with which to correlate a workflow is the EntityId. I had assumed that content-based correlation was the way to go.

    If I must generate a GUID as an additional "unique" correlating identifier, how would a second client invoking OperationA (at or near the same time as the first client) for a particular EntityId ever know that GUID in order to ensure that the right correlated workflow instance was used? Isn't it more likely that a second workflow with a second GUID would be created for the second call (thereby breaking my constraint that there can be only one workflow per EntityId)?

    I suppose I could extend my entity model to capture the GUID in my existing table, update the entity when the first call to OperationA creates a workflow, and do pessimistic checking before calling OperationA to make sure no other client started a workflow before I got the chance, but it feels like the workflow engine (with content-based correlation) should be handling all of that stuff for me. Maybe I'm wrong about that.

    I feel like WF was designed to be able to handle my scenario (I don't think it's particularly complex), and notwithstanding the fact that I'm a) apparently configuring correlation incorrectly, b) I'm getting a TimeoutException where I'd expect a "not available at this time" error and c) this happens ONLY with a slightly more complex, non-sequential workflow, my scenario does indeed work the way I'd expect. But I don't have confidence right now that I can sell it for a mission-critical project, and I think it must be me rather than the technology that's missing something.

    Tuesday, May 28, 2013 9:45 PM
  • Sorry it took me a couple of days to get back to you.

    Well, I am still trying to work thru how to go about doing what you want to do.

    But I do have an explanation for the "first" workflow instance not being able to process OperationB after the second OperationA gets an error in my case where OperationA does NOT have the CorrelatesOn property set.

    The default "TimeToUnload" value for the SqlWorkflowInstanceStoreBehavior is 1 minute. So a workflow instance will be persisted and unloaded 1 minute after it goes idle (i.e. one minute after it starts waiting for OperationB). There is a database row created for the instance right after it gets created, but if you'll notice, there are a number of NULL columns and the "IsInitialized" column is 0. This row is created by SqlWorkflowInstanceStore. The workflow instance itself has not yet persisted any data.

    If you send the second OperationA, which creates a second instance of the workflow, before the first instance has persisted and unloaded, the first instance aborts. Aborting an instance that has not yet done its first persist essentially kills off the instance. If you wait until after the first instance persists and unloads before sending the second OperationA, then OperationB to the first instance would actually be successful because it will find the persisted/unloaded first instance via the correlation key, load it, and invoke OperationB on it.

    I am still trying to work out why you see different behaviors between the simple Sequence of OperationA and OperationB and a sequence with OperationA followed by a Pick with OperationB. Because you have CorrelatesOn and CanCreateInstance set on OperationA, the second invocation of OperationA should try to create a second instance of the workflow and then fail when it tries to create a correlation key because the key already exists. The creation of the correlation key will fail because you are using the same data for the two instances - the data that came from the client.

    Jim

    Monday, June 03, 2013 9:30 PM
    Moderator
  • Oops. Sorry. It is the WorkflowIdleBehavior that has the TimeToUnload property. But I am sure you already know that. You can make TimeToUnload longer and TimeToPersist shorter to get the workflow persisted sooner, but not unloaded/unlocked until later.

    Jim

    Monday, June 03, 2013 9:35 PM
    Moderator
  • I have:

              <workflowIdle timeToPersist="00:00:05"
                            timeToUnload="00:00:10" />

    But I'm not sure these settings are directly relevant to the discussion, because even if I disable all SQL persistence, all my workflow tests seem to behave exactly the same way (i.e. with CorrelatesOn: perfectly for Sequence, almost perfectly for Pick, and with CorrelationInitializers: fail badly). In other words, I think the issues are with the implementation of content-based correlation in the workflow engine rather than with how the workflows are being persisted.

    I actually opened a connect issue on this, with an attached sample test project that shows what I'm expecting for my scenario, and tests with both CorrelatesOn and CorrelationInitializers. Maybe you could take a look and let me know where my expectations are invalid, and/or rectify my workflows where I've configured them incorrectly:

    https://connect.microsoft.com/VisualStudio/feedback/details/788815/timeoutexception-in-wf-4-5-with-content-based-correlation-and-non-sequential-operations#details

    Thanks


    • Edited by wbradney Tuesday, June 04, 2013 12:29 AM
    Monday, June 03, 2013 10:55 PM
  • "Because you have CorrelatesOn and CanCreateInstance set on OperationA, the second invocation of OperationA should try to create a second instance of the workflow and then fail when it tries to create a correlation key because the key already exists. The creation of the correlation key will fail because you are using the same data for the two instances - the data that came from the client."

    This statement makes me really nervous that content-based correlation works nothing like I imagine it should, and I can think of no situation where the above behavior would be desired or even useful. It seems to me that the whole point of content-based correlation should be that a second instance should NOT be created for the same correlated input argument, but rather that an error should be returned indicating that the operation is not available at this time (as indeed it does for the simple sequential case).



    • Edited by wbradney Tuesday, June 04, 2013 12:27 AM
    Tuesday, June 04, 2013 12:20 AM
  • The discussion about when the persist happens was related to whether or not OperationB would be successful after the workflow abort caused by the second OperationA in my usage of CorrelationInitializer.

    Upon further reflection, your usage of CorrelatesOn for the two Receive activities is just fine.

    Now we need to get back to why you see differing behavior between a simple Sequence (OperationA, OperationB) and a workflow with OperationB being inside of a Pick or State.

    I have been working thru the project you sent in with your connect issue and am boiling down the differences in behavior. I will get back to you soon on that.

    Jim

    Tuesday, June 04, 2013 11:37 PM
    Moderator
  • The root cause of the difference in behavior (timeout) when using a Pick or State activity to encapsulate the Receive for OperationB is that Pick and State activities create internal bookmarks that are unrelated to the bookmarks created by Receive activities.

    The bookmarks created by Receive activities (let’s call them “protocol bookmarks”) are treated in a special way in order to preserve the messaging protocol implemented by the workflow service.

    Imagine a scenario where the service has the messaging “protocol” of Operation1 followed by Operation2:

    Receive(Operation1)

    SendReply(Operation1)

    DoSomeOtherWork

    Receive(Operation2)

    SendReply(Operation2)

    If DoSomeOtherWork has asynchronous operations that may cause the workflow instance to go idle, they are going to create non-protocol bookmarks. These bookmarks will get resumed thru some other means, outside of the messaging protocol for the workflow service.

    So it is possible for Operation1 to complete and for the client to send Operation2 before DoSomeOtherWork is complete. We don’t want to reject Operation2 immediately. Instead, we hang onto the Operation2 message in hopes that the work being done by DoSomeOtherWork completes and the bookmark associated with that work gets resumed. Once DoSomeOtherWork completes, the protocol bookmark for Receive(Operation2) will get created and now the message from the client can successfully get processed.

    Suppose the message for Operation2 is received while DoSomeOtherWork is still outstanding. The runtime discovers that there is no protocol bookmark for Operation2. We then check to see if there are any non-protocol bookmarks outstanding for the instance. If there are (as is the case when DoSomeOtherWork is still outstanding), we hang on to the message. But if there are no non-protocol bookmarks, we immediately reject the message as being out of order.

    The Pick activity internally creates a non-protocol bookmark that gets resumed when the Trigger completes.

    In your repro scenario, OperationA gets replied to and the Pick starts executing and creates the non-protocol bookmark for the Trigger completion. Then the Receive for OperationB starts executing and creates the protocol bookmark for OperationB. All is well.

    Then the client sends OperationA again with the same parameter. The CorrelatesOn causes us to find the same workflow instance and we try to find a protocol bookmark for OperationA. But there isn’t one. We then discover that there is a non-protocol bookmark (created by the Pick) outstanding, so we hang on to the second OperationA message. This causes the timeout.
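
    The rule described above can be sketched as a small decision function. This is a toy model in Python and not the actual runtime code; route_message, Outcome, and the bookmark names are made up for illustration.

```python
from enum import Enum


class Outcome(Enum):
    DISPATCH = "dispatch"  # a matching protocol bookmark exists
    HOLD = "hold"          # keep the message; the client may eventually time out
    REJECT = "reject"      # fail immediately as out of order


def route_message(operation, protocol_bookmarks, non_protocol_bookmarks):
    """Toy version of the rule described above; not the actual runtime code."""
    if operation in protocol_bookmarks:
        return Outcome.DISPATCH
    if non_protocol_bookmarks:
        # Other work is outstanding, so a Receive for this operation
        # might still appear later. The message is held.
        return Outcome.HOLD
    return Outcome.REJECT


# Sequence(OperationA, OperationB), after OperationA has been replied to:
sequence_result = route_message("OperationA", {"OperationB"}, set())

# OperationA followed by a Pick whose trigger is Receive(OperationB):
pick_result = route_message("OperationA", {"OperationB"}, {"pick-trigger"})

# The expected next message is dispatched even with the Pick bookmark present:
expected_result = route_message("OperationB", {"OperationB"}, {"pick-trigger"})
```

    In this model the repeated OperationA is rejected immediately in the plain Sequence, but is held when the Pick bookmark is present, which matches the timeout seen in the repro.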

    So let’s see if you can avoid using the Pick activity. You might be able to utilize Parallel with CompletionCondition = true if the functionality you desire is to have only one branch complete. Like if you want to receive one of 3 different messages (OperationB, OperationC, or OperationD), you can use a Parallel with each branch beginning with a Receive of these operations. When you set CompletionCondition=true, you are telling the Parallel to cancel the other branches once one of the branches completes. So once one of the OperationB, OperationC, or OperationD is received, the Parallel would cancel the other two branches and then the Parallel would complete. But that might not be the functionality you are looking for.
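
    As a rough illustration of that Parallel arrangement, here is a toy Python model. ToyParallel is a hypothetical class and not the WF Parallel activity, and it assumes that the branch that receives a message runs to completion without going idle.

```python
class ToyParallel:
    """Toy model of Parallel with CompletionCondition = true; not the WF activity."""

    def __init__(self, operations):
        # One branch per operation, each starting with a pending Receive.
        self.pending = set(operations)
        self.canceled = set()
        self.completed_by = None

    def receive(self, operation):
        if operation not in self.pending:
            raise RuntimeError(f"{operation} is not available at this time")
        # Assumption: the branch runs to completion without going idle.
        self.pending.remove(operation)
        self.completed_by = operation
        # CompletionCondition is true, so the remaining branches are canceled.
        self.canceled = self.pending
        self.pending = set()


choice = ToyParallel(["OperationB", "OperationC", "OperationD"])
choice.receive("OperationC")
```

    After the call, the branches for OperationB and OperationD are canceled in the model, so a later attempt to send either of them fails instead of being accepted.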

    The activities you are going to want to avoid if you want to avoid the Timeout exception for out-of-order messages are Pick, State (because the Transition class has the concept of “trigger”, too), CompensableActivity (and the related CompensationExtension), Confirm, and Delay. These all create internal bookmarks.

    In the future, we may be re-evaluating the behavior of the workflow runtime when dealing with this scenario of a mixture of protocol and non-protocol bookmarks.

    Jim

    • Proposed as answer by wbradney Thursday, June 06, 2013 1:09 AM
    Wednesday, June 05, 2013 11:46 PM
    Moderator
  • Jim,

    Thanks for the analysis - that clears up a lot for me. [It should be in the documentation!]

    Let's restate to see if I've fully understood the limitation:

    The concept of two types of bookmarks makes sense. I can see why the engine would want to hang onto a Receive for OperationB if we do not have a protocol bookmark for OperationB, but we do have at least one non-protocol bookmark. I guess the problem with this approach is that the engine is not smart enough (yet) to determine that OperationB in this case is actually valid according to the protocol of the workflow -- it's just ruling that OperationB might be valid, but we won't know until there are no non-protocol bookmarks waiting to be resumed. In this case the rule works and the decision is the correct one.

    In my scenario, clearly OperationA cannot be valid according to the protocol of the workflow, since we already received it and it's no longer reachable (i.e. there's no path in the workflow that leads us back there). However, the engine is not currently capable of (heuristically or otherwise) determining that - it's just following the rule that "no OperationA protocol bookmark, but one non-protocol bookmark; therefore we wait". In this case the rule leads to the wrong decision.

    I guess it could be quite complicated to establish the validity of a particular protocol operation in the presence of non-protocol bookmarks, but that's probably what would be necessary in order to resolve this.

    Alternatively, perhaps the behavior of waiting if there are non-protocol bookmarks could be made optional in a future version? I imagine that in my case I'd be quite happy to have OperationB rejected as out-of-order if DoSomeOtherWork was still in progress.

    Unfortunately, my real-world scenario involves a State Machine of arbitrary complexity, with state transitions triggered by protocol messages being received, so I think I'm probably hosed. Having said that, it may not be so bad to live with the operation timeout (perhaps making it shorter) in that one particular scenario, because other than that my workflow is actually working quite nicely.

    Once again, thanks for all the effort you put in to clear this up for me.



    • Edited by wbradney Thursday, June 06, 2013 1:17 AM
    Thursday, June 06, 2013 1:07 AM
  • Jim,

    Revisiting this with the hope of finding a better solution. Your suggestion to use a Parallel instead of a Pick would not (quite) work for me, I think, because it would potentially allow two operations to be invoked simultaneously (which is exactly what I'm trying to prevent).

    Correct me if I'm wrong, but with Pick, branches containing the Receive activities that were NOT called CANNOT execute, and their bookmarks would no longer be active, resulting in a "not available at this time" exception to any client that tried.

    Conversely, a Parallel would only complete at the END of the first branch to complete. So, it would be possible for several service operations to be invoked (and start processing) and for all but one to later be cancelled. For my scenario this might be problematic, since the "partially executed" branches might have performed actions that are inconsistent with actions taken by the branch that first triggered. (In other words, they really can't execute in parallel -- you have to pick one).

    Is there a way to use a Parallel, but somehow "lock" a resource in the workflow such that the first Receive branch could prevent the other Receive branches from actually making progress (or better still throw) after receiving the message but before starting to execute the activities after the Receive?

    Alternatively, is there a way to craft a simpler, "CustomPick" activity that doesn't use internal bookmarks?


    • Edited by wbradney Tuesday, February 04, 2014 4:01 PM
    Tuesday, February 04, 2014 3:10 AM
  • The CompletionCondition property on the Parallel activity can help here. If you specify "true" as the value of the CompletionCondition property, then as soon as ONE of the branches in the Parallel activity completes execution, the other branches are canceled.

    If you can arrange for each branch of the Parallel to NOT go idle after its Receive activity, the CompletionCondition of true should do the trick. To be safe, maybe you could define a Variable<bool> that is checked immediately after each Receive and if it is false, set it to true. If it is true, then you know that two Receive activities were able to execute and you don't want to do any more processing with that "second" Receive, so reply with an error.

    Remember that only one thread will execute within a workflow instance at a time. So a second message coming in will queue until the workflow goes idle from the first message.
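
    A minimal sketch of that flag check, again as a toy Python model and not WF code. GatedChoice and BranchAlreadyTaken are hypothetical names; the taken attribute plays the role of the Variable<bool>, and raising the exception stands in for replying with an error. No locking is shown because the model relies on the point above that only one message is processed at a time.

```python
class BranchAlreadyTaken(Exception):
    """Stands in for the error reply sent to the 'second' caller."""


class GatedChoice:
    """Toy gate shared by all branches; not a WF activity."""

    def __init__(self):
        self.taken = False  # plays the role of the Variable<bool>
        self.winner = None

    def on_receive(self, operation):
        # Checked immediately after each Receive, before any other work.
        if self.taken:
            raise BranchAlreadyTaken(
                f"{operation} rejected: {self.winner} was already received"
            )
        self.taken = True
        self.winner = operation
        return f"{operation} accepted"


gate = GatedChoice()
first = gate.on_receive("OperationB")
try:
    gate.on_receive("OperationC")
    second_error = None
except BranchAlreadyTaken as error:
    second_error = str(error)
```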

    Friday, February 14, 2014 10:23 PM
    Moderator
  • Thanks, Jim. I had actually implemented a test case that basically does what you suggest (with a Parallel having a completion condition). My test scenario is a typical "four-eyes" approval process for a business process. There is a "Start" operation (CanCreateInstance) representing some kind of request for approval, followed by a Parallel "Approve" or "Reject" choice. The intent is that only one of these operations can execute. My test simulates the Approve being called by one client, and taking around 10 seconds to complete. During this time, another client calls Reject. I'm hoping (ideally) for the Approve to always succeed, and the Reject to always fail with an exception indicating that it is "out of order". I run the test under a range of permutations of Pick vs Parallel, Delay vs Sleep vs Spin for the 10 second "processing time", etc.

    This works as expected as long as the Approve does NOT idle (i.e. as long as it does not use a Delay in my test suite).

    If the Approve idles then obviously the Reject can now be executed. This is bad, so I also implemented a "gate" on each branch such that if the Approve happened to idle, Reject could still be called but would not actually execute any activities, instead throwing an exception.

    This solution is actually (infuriatingly) close to what we need. Unfortunately, while the Approve runs to completion (including the Delay period), and Reject does not run at all (thanks to the gate), APPROVE also appears to throw with a "REJECT unavailable", and actually ends up suspending the workflow (presumably because Reject's exception is now regarded as unhandled).

    I'm sure we could live without causing a Delay in one of the parallel branches when using this pattern, but it's causing quite a bit of head-scratching and trepidation about our approach to implementing workflows when we can't quite figure out how the engine is going to behave, or at least it seems to behave differently depending on what is placed between a Receive and a SendReply.

    I can send you my self-contained test solution if it would help to clarify what we're going for - just let me know how to get it to you.



    • Edited by wbradney Saturday, February 15, 2014 2:12 AM
    Saturday, February 15, 2014 2:01 AM