About persistence, IEventActivity, Transactions and correlations
Hi,
I have a bunch of questions.
1. Persistence
The activity can be adorned with the [PersistOnClose] attribute. What happens in the following scenario:
- we have a sequential workflow with 2 activities: A1 followed by A2
- activity A1 is adorned with the [PersistOnClose] attribute
- A1 finishes executing and moves to Closed state
- the runtime persists the activity A1
- the computer crashes
- after the computer is restarted, the workflow host is started again. The persistence provider automatically (am I right?) loads active workflows and continues the execution at activity A2. Right?
How does the runtime know which activity should be executed next after it deserializes the workflow? What data is persisted by the persistence provider: all public and private properties of all activities? Anything else? Do WorkflowQueues get persisted? What about correlation tokens etc.?
Will there be a Microsoft provided OracleWorkflowPersistenceService?
2. Custom EventActivity
I need the activity with the following behavior:
- When the activity starts executing, it writes a record to the database. The record contains the workflowInstanceId, the activity Id and some other data. (A Guid could be used for the activity Id, but I would rather use a runtime provided activity instance identifier. Does it exist?)
- An external application processes the record in the database and needs to notify the workflow when the record is processed. The communication between the external application and the workflow host is already implemented (let’s say that we have a custom web service exposed by the workflow host and we do not want to use WebServiceInputActivity).
- The host needs to notify the activity when the external application has processed the record. I would like to use only one event (OnRecordProcessed) on my service’s data exchange interface.
- There can be many instances of this activity in the workflow (either static or dynamically generated by the ReplicatorActivity).
Question:
What is the easiest way to implement this (i.e. deliver the same event to the right instance of the activity)? Do I need to use correlations (see questions below)? Should I implement my own event-driven activity and have each instance of the activity subscribe to its own queue?
3. Transactions
Can the following steps participate in the same database transaction:
- external application notifies workflow host, that the record has been processed
- event is delivered to my custom activity
- BEGIN TRANSACTION
- My custom activity A1 updates the database and finishes executing.
- workflow engine schedules execution of A2, which follows A1
- A2 inserts a new record in the database and returns from the Execute method with status Executing
- Workflow goes idle (waiting for the external event)
- Runtime persists the workflow
- END TRANSACTION
4. CorrelatedService sample
I have already looked at the CorrelatedServiceSample and I have the following questions:
Could I implement the same sample without the CreateTaskActivity? The workflow would look like this:
- Parallel1
- Sequence1
- Taskcompleted1
- Parallel2
- Sequence2
- Taskcompleted2
(I have removed the CreateTaskActivity from the original sample)
If I understand the sample correctly, I would need to apply the [CorrelationInitializer] to the TaskCompleted event. Right? Sample:
[ExternalDataExchange]
[CorrelationParameter("taskId")]
public interface ITaskService
{
    [CorrelationAlias("taskId", "e.Id")]
    [CorrelationInitializer]
    event EventHandler<TaskEventArgs> TaskCompleted;
}
Who would initialize the correlation tokens (c1, c2) in this scenario?
What is the purpose of the CorrelationToken? Is it just a container for correlation values, used to link together a CallExternalMethodActivity and a HandleExternalEventActivity?
What would happen if I removed the CorrelationXXXXAttributes from the original ITaskService interface? Which of the TaskCompleted activities would get the TaskCompleted events? My guess: both TaskCompleted activities would listen on the same WorkflowQueue. Each of them would get exactly one message. However, there is no guarantee that TaskCompleted1 would get the event with correlationId=001 and TaskCompleted2 would get the event with correlationId=002. Right?
I am looking forward to the answers ;-)
Matej
Answers
I'm going to answer your questions one by one in separate replies. So, for the first one:
Which activity executes next and how does the runtime know?
In your scenario the workflow will be loaded into memory and execution will continue with Activity2. Our persistence service does a binary serialization of the workflow instance which includes a scheduler. All actions taken in the workflow (execution of an activity, closing of an activity, raising of the QueueItemAvailable event) are run through the scheduler. Therefore, when we persist we also persist the list of work which is "pending".
What data is persisted by the persistence provider?
Our default ActivitySurrogate, which is used for serialization, persists all the private and public members of the type that are not explicitly marked NonSerialized. In addition, we persist the scheduler associated with the instance, any queues associated with the instance, etc. CorrelationTokens are manifest as private state of the activities which reference them and therefore would be saved. In short, when you load a persisted workflow it should be in a state which is no different than if it hadn’t been unloaded in the first place.
Will there be a Microsoft provided OracleWorkflowPersistenceService?
Windows Workflow Foundation (WF) V1 will not include such a service. You can check the community site to see if anyone has uploaded a custom solution.
More answers coming later.
To answer (2):
We do not have a built in unique identifier for an activity instance. Therefore, if you are in a WhileActivity and some activity “foo” executes three times we do not generate a Guid on “foo” which you can just borrow. We do, however, generate a Guid for the direct child of the WhileActivity (the root of the new context) and you can access that value by walking up the Parent tree until you find an activity with a non-Empty return value for the Activity.ActivityContextGuidProperty.
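A minimal sketch of that walk up the Parent chain, assuming it is called from inside a running activity. The class and method names (ActivityContextHelper, FindContextGuid) are made up for illustration; only Activity.Parent, GetValue and Activity.ActivityContextGuidProperty come from the framework.

```csharp
using System;
using System.Workflow.ComponentModel;

public static class ActivityContextHelper
{
    // Walks from the given activity towards the root and returns the first
    // non-empty context Guid it finds, or Guid.Empty if there is none.
    public static Guid FindContextGuid(Activity activity)
    {
        for (Activity current = activity; current != null; current = current.Parent)
        {
            // GetValue returns object, so check the type before unboxing.
            object value = current.GetValue(Activity.ActivityContextGuidProperty);
            if (value is Guid && (Guid)value != Guid.Empty)
            {
                return (Guid)value;
            }
        }
        return Guid.Empty;
    }
}
```

Keep in mind that this identifies the execution context (for example one iteration of the WhileActivity), not the individual activity, so combine it with the activity's Name if you need something closer to a per-instance key.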
The easiest way to implement what you want is probably through our ExternalDataExchangeService. Your scenario is exactly the reason we created this service and our correlation will allow you to do things “nicely”. For example, let’s say your activity is a composite activity that looks like:
CustomDoSomethingActivity
    CallExternalMethodActivity (Method = “WriteToDatabase”)
    HandleExternalEventActivity (Event = “WorkCompleted”)
WriteToDatabase could take a Guid (the made-up correlation value) and the data that needs to go into the database, and it would perform the write. When the work is done your host would simply raise WorkCompleted, passing in the same Guid (the correlation value) along with any “result” that you wanted to get back into the workflow.
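As a rough sketch, the data exchange contract for that composite activity could look something like the following. The names IRecordService, WorkCompletedEventArgs, recordId and Result are placeholders chosen for this example, not part of any SDK sample; the attributes are the same ones used by ITaskService.

```csharp
using System;
using System.Workflow.Activities;

// Event payload. It must be serializable and derive from ExternalDataEventArgs
// so that the runtime can route it to the right workflow instance.
[Serializable]
public class WorkCompletedEventArgs : ExternalDataEventArgs
{
    private readonly Guid recordId;
    private readonly string result;

    public WorkCompletedEventArgs(Guid instanceId, Guid recordId, string result)
        : base(instanceId)
    {
        this.recordId = recordId;
        this.result = result;
    }

    public Guid RecordId { get { return recordId; } }
    public string Result { get { return result; } }
}

[ExternalDataExchange]
[CorrelationParameter("recordId")]
public interface IRecordService
{
    // The outbound call carries the Guid and therefore initializes the correlation.
    [CorrelationInitializer]
    void WriteToDatabase(Guid recordId, string data);

    // The inbound event finds the same value inside the event args.
    [CorrelationAlias("recordId", "e.RecordId")]
    event EventHandler<WorkCompletedEventArgs> WorkCompleted;
}
```

Inside CustomDoSomethingActivity both child activities would then reference the same CorrelationToken, so each instance of the composite only sees the WorkCompleted event raised with its own Guid.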
This is nice because (1) it leverages processes that we have in place which have been thoroughly tested, (2) you don’t have to write the correlation logic, and (3) you will simply work with your CustomDoSomethingActivity without worrying about these details once it’s been constructed.
You could very well write this using a custom IEventActivity and utilizing queues. The thing you would gain would be the ability to model this as a single activity that does both jobs as opposed to a composite activity which starts with a CallExternalMethodActivity. Note that the above CustomDoSomethingActivity would NOT implement IEventActivity and therefore could not be used to start an EventDriven and, again, could not be used in StateMachineWorkflow.
For topic (3):
The answer is yes and no. This depends on the hosting environment. Essentially, if your hosting environment arbitrarily Unloads workflow instances then you cannot guarantee that ALL of those things happen in the same transaction. Picture ASP.NET recycling the process causing the WorkflowRuntime to Unload all workflow instances and you’ll see a host with “random” unloading.
You can, however, guarantee that the workflow and your database remain consistent. You can guarantee that the record is only removed (A1’s job) if the workflow is persisted PAST the point of receiving the message. You can also guarantee that the record is only written (A2’s job) if the workflow successfully commits. We are just saying that there is the slightest chance that in some hosts A1 and A2 might not do their work in the same transaction. If you really must get this to work, then we can talk about a complete redesign of your approach which will do the right thing.
So, to do what I said in the last paragraph (Guarantee consistency between A1 and the workflow as well as A2 and the workflow), you should use batching. Batching is a mechanism by which we allow you to put a callback and a work item into a list. When we are about to persist the workflow, we will create a transaction (through the WorkflowCommitWorkBatchService) and call your delegate with your work item. This essentially means that we are allowing you to partake in our persistence transaction. So, your scenario would go a little something like this:
1) Deliver the event to the workflow along with an implementation of IPendingWork and a work item. For you this is most likely an implementation which will delete a row from a database and a work item which is the primary key for the row. Note that for ExternalDataExchangeService the IPendingWork and object work item can be specified in the constructor for ExternalDataEventArgs and they are the last two parameters for WorkflowInstance.EnqueueItem.
2) Have A2 add its work to the work batch. Note that A1’s whole purpose has been swallowed by the IPendingWork implementation. Alternately you could have A1’s handling of the inbound message add the IPendingWork to the batch through the static WorkflowEnvironment.WorkBatch.Add().
3) By setting UnloadOnIdle to true you will cause the workflow to persist when it is idle. This will cause the IPendingWork from the event delivery as well as the IPendingWork from A2’s execution to be called back with the persistence transaction.
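The three steps above can be sketched roughly as follows. This is only an outline under a few assumptions: DeleteRecordWork, BatchingSketch and DeleteRow are hypothetical names, the work item is taken to be the row's Guid key, and the actual data access is left as a placeholder.

```csharp
using System;
using System.Collections;
using System.Transactions;
using System.Workflow.Activities;
using System.Workflow.Runtime;
using System.Workflow.Runtime.Hosting;

// Hypothetical handler that removes processed rows as part of the
// transaction the runtime creates when it persists the workflow.
[Serializable]
public class DeleteRecordWork : IPendingWork
{
    // false: the work can wait for the next persistence point.
    public bool MustCommit(ICollection items) { return false; }

    public void Commit(Transaction transaction, ICollection items)
    {
        foreach (object item in items)
        {
            DeleteRow(transaction, (Guid)item);
        }
    }

    public void Complete(bool succeeded, ICollection items) { }

    private static void DeleteRow(Transaction transaction, Guid recordId)
    {
        // Placeholder (assumption): your own data access code goes here and
        // must enlist its connection in 'transaction'.
    }
}

public static class BatchingSketch
{
    // Step 1: the host attaches the handler and the work item to the event args.
    public static ExternalDataEventArgs ForProcessedRecord(Guid instanceId, Guid recordId)
    {
        return new ExternalDataEventArgs(instanceId, new DeleteRecordWork(), recordId);
    }

    // Step 2: called on the workflow thread, e.g. from A2's Execute method.
    public static void AddToBatch(IPendingWork handler, object workItem)
    {
        WorkflowEnvironment.WorkBatch.Add(handler, workItem);
    }

    // Step 3: unloadOnIdle = true makes the workflow persist when it goes idle.
    // The two TimeSpan values are arbitrary example settings.
    public static void AddPersistence(WorkflowRuntime runtime, string connectionString)
    {
        runtime.AddService(new SqlWorkflowPersistenceService(
            connectionString, true, TimeSpan.FromMinutes(1), TimeSpan.FromSeconds(5)));
    }
}
```

In a real contract you would derive your own event args class and pass the handler and work item through to the base constructor in the same way.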
Answer to (4):
You cannot remove the CreateTask call from that sample. The reason is that we do not support what is referred to as static correlation initialization. Just as you asked, who would initialize the correlation tokens?
The logic is that the outside world would not know about the correlation value if some data wasn’t passed from the workflow (let’s ignore inbound convoys) at some point. There are certainly scenarios where static initialization can be used successfully, but you’ll often find that there is a CallExternalMethodActivity lurking in there which is either being ignored or being replaced with some other logic. For example, adding something to a database from the activity instead of having it occur in the local service … the addition to the database was really the action which initializes the correlation.
The CorrelationToken object is exactly as you suspect. It is a container for the correlation values which is used to link together two or more activities into the same conversation.
If you removed the various correlation related attributes from that sample then you would get a runtime error in the worst case scenario. Our HandleExternalEventActivity implementation does NOT allow two activities to be listening for the same event at the same time (including correlation data). By removing correlation, you would end up with exactly that setup when the workflow went idle. We would post the event, notify BOTH activities, and the second activity would throw an exception because the message it was expecting wasn’t there (already picked up by the first guy). Note that this is not the behavior you expected.
I hope that helps. Let me know if there are any more questions.
All Replies
Nate,
Thank you very much for the answers about the persistence. When you are talking about the scheduler you probably mean DefaultWorkflowSchedulerService and ManualWorkflowSchedulerService. Right?
BTW: I have tried to use the XML serializer to inspect the serialized state. That failed, because the XML serializer does not support serialization of generic types (and as far as I know it also does not handle private members, so it is completely useless in this scenario).
Could there be any versioning problems when using binary serialization in an environment where there are many long-running workflow instances and one has to upgrade the workflow or an activity by adding a new field? Is version tolerant serialization used by default (see http://msdn2.microsoft.com/en-us/library/ms229752.aspx)?
I am looking forward to more answers and thanks again.
Matra
When talking about the scheduler I'm actually referring to an internal "queue" we keep of the workflow actions that are pending (execute method of an activity, QueueItemArrived notification, etc) and NOT the WorkflowSchedulerService implementations. This internal scheduler's Run method is actually the delegate which is passed to WorkflowSchedulerService.Schedule().
We do not support XML serialization of workflow instances.
I don’t think that version tolerant serialization would have the effect that you are looking for in our case. In order to not require that activities be marked Serializable we have created an internal ActivitySurrogate (ISerializationSurrogate implementation) which is serialized instead. I believe that surrogate is responsible for determining the type that is actually created during deserialization and therefore it is the code that would have to be version tolerant. I do not believe that it is.
If you absolutely must get version tolerant behavior then you could look into writing your own ISerializationSurrogate for activities.
More on the other original questions later. I just got back from TechEd so I’m a little pressed for time right now.
Thank you very much for the detailed answer. It was very helpful. I'll do some more research and after that I might have some more questions :-) I have already been looking at IPendingWork and transactions, but was unable to get the SDK sample to work. See http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=468066&SiteID=1 for details.
Matra
Hi,
Regarding correlations, I'm still struggling with using a correlation parameter that is the result of the call.
For instance, I'd like to use a DB generated ID as a correlation parameter, but that ID is not yet known when calling the initializer.
For example, the initializer is createTask(Task t) and the CorrelationParameter is "t.id", but t.id will only be set as a result of the call to createTask!
It didn't seem to work the naive way. Then I tried to play around with ref and out parameters, but without any luck.
Is there a way to do that, or is the workaround to use pre-generated IDs?
Thanks
Blaise
Blaise, you are going to have to use pre-generated IDs. The correlation value cannot be set AFTER the initializer ... the message that is doing the initializing must contain the correct correlation value because we intercept that message and extract the value before handing it to the runtime (in the case of HandleExternalEventActivity initializers) or handing it to the local service (in the case of CallExternalMethodActivity initializers).
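One way to pre-generate the ID, sketched under the assumption that the Task type is your own class and that the database can accept a client-generated Guid key instead of an identity column (the Task, Id, Title and TaskFactory names are hypothetical):

```csharp
using System;

[Serializable]
public class Task
{
    private Guid id;
    private string title;

    public Guid Id { get { return id; } set { id = value; } }
    public string Title { get { return title; } set { title = value; } }
}

public static class TaskFactory
{
    // Assigns the correlation value BEFORE the task reaches the
    // correlation initializer, so the value can be extracted from the message.
    public static Task NewTask(string title)
    {
        Task task = new Task();
        task.Id = Guid.NewGuid();
        task.Title = title;
        return task;
    }
}
```

The workflow would build the task this way (for example in the CallExternalMethodActivity's MethodInvoking handler) and the local service would store task.Id as the key rather than asking the database to generate one.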
Nate


