I'm struggling with a peculiar issue. I am hosting a workflow service in AppFabric, configured with IdleToPersist = 1 second and IdleToUnload = 2 seconds. The workflow uses a Send and ReceiveReplyForSend pair to communicate with other services. When the workflow service sends a request to another service and waits more than 2 seconds for the response, it is unloaded and never resumed, even if the response arrives later.
I investigated the tracked events and found that the workflow instance is not persisted before it is unloaded. I understand there is a no-persist zone between the Send and the ReceiveReplyForSend, but the unload behavior doesn't make sense. How can a workflow instance be unloaded when it has never been persisted?
I tried putting the Send and ReceiveReplyForSend pair inside a TransactionScope, because I know TransactionScope also creates a no-persist zone, but the issue is still there.
Is this a defect from Microsoft?
Does anyone know the cause of the issue and how to fix it?
Thanks for your answer in advance.
- Edited by Leo Yao Thursday, May 31, 2012 4:56 AM
I know the TCP interface better than Workflow. Since the underlying transport is TCP, I suspect a few things that are worth checking.
1) Run netstat -a from a command prompt. This lists the status of the ports that TCP connections are using, so you can check the state of your connection. I suspect the connection never completed.
2) Run ping from a command prompt to make sure the two computers can reach each other.
3) Try pinging "localhost". If localhost returns the loopback address 127.0.0.1, the 2-second timeout may occur because the TCP connection used the loopback address instead of a real address like 192.X.X.X. A connection between two machines will fail if one of the endpoints is configured with the loopback address. If any of your config files use localhost and localhost resolves to 127.0.0.1, this may be what is causing the problem. Some versions of Windows resolve localhost to the machine's host name, and others resolve it to the loopback address 127.0.0.1.
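The three checks above can also be scripted. Below is a minimal Python sketch, not part of the original suggestion, that resolves a host name to its IPv4 addresses and attempts a TCP connection with a short timeout. The function names `resolve_ipv4` and `can_connect` are made up for illustration, and any host name or port you pass in is a placeholder for your own service endpoint.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses that a host name resolves to."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `resolve_ipv4("localhost")` shows whether the name maps to 127.0.0.1 on a given machine, and `can_connect(host, port)` with the address from your config file shows whether the other service's endpoint is reachable at all. A successful connect only proves the port is open; it says nothing about whether the workflow reply is being correlated correctly.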
I tested as you suggested, and I am sure that the connection is established. I am also sure that the workflow in the counterpart service completed, because I can see its completed status in the AppFabric Dashboard. I think that means it did send a response to the problematic workflow service.
I am also attaching the two workflow service diagrams below, along with the last part of the information retrieved from the ASWfEvents view. I hope these will be helpful.
The last part of the information retrieved from the ASWfEvents view: