WF4 Workflow: Durable Delay Not resuming after hour delay

  • Question

  • I am running the durable delay service and see that the workflow doesn't resume if the delay is set to an hour.

    It seems to work for short delays of 5-10 minutes.

    If I fire off another workflow either right before or after the delay timer is set to expire, the earlier workflows resume and get processed.

    Is that a config issue of some sort, or a bug?

    According to the documentation:

    Activation of Workflow Instances with Expired Timers

    The WMS automatically activates durable workflow instances when the durable timers associated with them expire. In other words, the WMS loads a workflow service instance into memory when the timer associated with the instance expires. The activated instance executes the activity (for example, the Delay activity) that caused the workflow instance to unload and continues with the execution.

    The WMS polls the databases specified in the WorkflowManagementService.exe.config file for expiring timers at regular intervals. If the timer is about to expire within the next polling interval, the WMS instructs a workflow service host to load the workflow instance into memory. To do so, the WMS adds a Run command to the command queue for that workflow instance. The WMS retrieves this command as part of the regular command execution cycle and runs the workflow instance.


    Friday, May 25, 2012 6:53 PM

All replies

  • You could check the tracking and tracing events for more clues:

    http://msdn.microsoft.com/en-us/library/ee677384(v=azure.10)

    Monday, May 28, 2012 11:28 AM
  • How are you hosting the workflow? Is it in AppFabric?
    Tim
    Tuesday, May 29, 2012 7:27 AM
  • Yes, it's AppFabric. Auto-Start is enabled and the app pool is set to always running.

    So far, delays of less than one hour seem to work, and with longer delays (1 hour or greater) the workflow doesn't resume on its own.

    I even took Samples\WF\Basic\Services\DurableDelay and played around with delays of 5 min, 15, 30, 45, and 1 hr: same behavior.

    In production we'll be using delays that vary from hours to days, and we need to be certain that workflows resume when they are supposed to.

    • Edited by tedka Friday, June 1, 2012 2:37 PM
    Friday, June 1, 2012 2:25 PM
  • If WMS is not monitoring the instance store, then the workflow host is not restarted once the workflow host is unloaded or the app domain has been recycled. Can you make sure that the WMS configuration includes the instance store you are using to persist your workflow instances?

    Thanks, Ruppert

    Wednesday, June 6, 2012 6:41 PM
  • We're using property promotion as well, so here is the service behavior configuration:

      <behaviors>
       <serviceBehaviors>
        <behavior name="TempAccessWorkflowBehavior">
         <sqlWorkflowInstanceStorePromotion connectionStringName="WorkflowInstanceStoreConnString" instanceEncodingOption="None" instanceCompletionAction="DeleteAll" instanceLockedExceptionAction="BasicRetry" hostLockRenewalPeriod="00:00:30" runnableInstancesDetectionPeriod="00:00:02">
          <promotionSets>
           <promotionSet name="TempAccessService">
            <promotedValue propertyName="UserName" />
            <promotedValue propertyName="DateBegin" />
           </promotionSet>
          </promotionSets>
         </sqlWorkflowInstanceStorePromotion>
         <workflowUnhandledException action="AbandonAndSuspend" />
         <workflowIdle timeToPersist="00:00:00" timeToUnload="10675199.02:48:05.4775807" />
         <!-- To avoid disclosing metadata information, set the value below to false and remove the metadata endpoint above before deployment -->
         <serviceMetadata httpGetEnabled="true" />
         <!-- To receive exception details in faults for debugging purposes, set the value below to true.  Set to false before deployment to avoid disclosing exception information -->
         <serviceDebug includeExceptionDetailInFaults="true" />
         <workflowInstanceManagement authorizedWindowsGroup="Domain\AS_Administrators" />
        </behavior>
       </serviceBehaviors>
      </behaviors>

    Thursday, June 7, 2012 3:01 AM
  • You are missing a <sqlWorkflowInstanceStore> service behavior, which takes all the attributes that now appear in your <sqlWorkflowInstanceStorePromotion> service behavior on the fourth line of your configuration. See http://social.msdn.microsoft.com/Forums/en-US/wfprerelease/thread/ce4d3869-fac5-4292-99c0-c438baabaf0d for an example of the config.
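
    As a rough sketch of that suggestion, the fragment below adds the standard <sqlWorkflowInstanceStore> behavior to the same named behavior. The attribute values are simply copied from the configuration posted above. Which attributes the custom <sqlWorkflowInstanceStorePromotion> element still requires once the store behavior is present is not confirmed here, so it is left out of the sketch and should be checked against the linked thread.

    ```xml
    <behaviors>
     <serviceBehaviors>
      <behavior name="TempAccessWorkflowBehavior">
       <!-- Standard SQL instance store behavior; values copied from the posted promotion element -->
       <sqlWorkflowInstanceStore connectionStringName="WorkflowInstanceStoreConnString"
                                 instanceEncodingOption="None"
                                 instanceCompletionAction="DeleteAll"
                                 instanceLockedExceptionAction="BasicRetry"
                                 hostLockRenewalPeriod="00:00:30"
                                 runnableInstancesDetectionPeriod="00:00:02" />
       <!-- Assumption: the promotion sets and the remaining behaviors from the original post follow here unchanged -->
      </behavior>
     </serviceBehaviors>
    </behaviors>
    ```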

    Thanks, Ruppert

    • Proposed as answer by Tim Lovell-Smith Sunday, June 10, 2012 11:00 PM
    • Unproposed as answer by tedka Tuesday, June 12, 2012 2:20 AM
    Thursday, June 7, 2012 5:20 PM
  • The <sqlWorkflowInstanceStore> service behavior doesn't matter in this case. It's already registered in the global web.config, and the behavior is the same either with or without it.

    I believe it's something related to permissions for workflowInstanceManagement and the persistence db; I'm testing that right now.

    Tuesday, June 12, 2012 2:30 AM
  • The next thing to check is whether the Workflow Management Service (WMS) is running. It is a Windows service that is responsible for activating the service host when a durable timer is about to expire. (See http://msdn.microsoft.com/en-us/library/ff383397(v=azure.10).aspx.) If it is not running, please start it. It needs to run under an account that is part of the AS_Administrators security group. If the service is already running, please check the event log for any error events indicating that the WMS does not have rights to access your instance store database.

    Thanks, Ruppert

    • Proposed as answer by Tim Lovell-Smith Friday, June 15, 2012 9:30 PM
    • Unproposed as answer by tedka Friday, June 15, 2012 9:51 PM
    Tuesday, June 12, 2012 4:27 AM
  • Just curious whether you were able to resolve this issue. I ran into the same issue yesterday and am still puzzled why it only works for delays of less than an hour.

    Thanks, Dinesh

     
    Sunday, August 19, 2012 8:48 AM
  • I got my issue resolved, but since I tried a variety of things, it's unclear which one fixed it.

    Overall, troubleshooting this wasn't trivial, but I’m convinced that the permissions config played a role.

    If I had to do it over, I would just re-provision everything from scratch instead of spending days troubleshooting.

    Monday, August 27, 2012 6:08 PM