after restore the service broker process
-
Monday, November 19, 2012 1:10 PM
I have an (as yet hypothetical) order entry system. Order data is entered into DB1 and processed in DB2. SSB is used to send the order (order header and detail) via a conversation group. DB1 goes down due to a HDD failure, and is restored from backup. What options do I have for re-processing order dialogs/conversations that were ended just before DB1 went down but do not exist in the restored version of the DB? Or lets say DB2 went down and was restored instead. Order dialogs that are not yet ended in DB2 may have already been ended before it went down, but now do not exist in DB2 after the restore.
Is there a way to restart the conversation from a certain point in time to recover order data that is out of sync due to the failure/restore? A similar scenario might be when restoring production dumps into a test environment; if the dumps are slightly out of sync I'd need to reprocess any messages that were not in sync then the dumps were created.
All Replies
-
Tuesday, November 20, 2012 8:02 AMModerator
Hi Jason,
When the two endpoints of a conversation become out of sync (e.g. because of a back in time restore), the most appropiate action is to end with error the given conversation. The problem described by you is inherent in any distributed application and there is no silver bullet that solves all the issues.
SSB does ofer some tools to help you in this situation:
- ALTER DATABASE ... SET ERROR_BROKER_CONVERSATIONS. Running this statement will error all conversations in a database. An error message is sent to the conversation's peer endpoint (e.g. when erroring DB2, each conversation will send en error message to it's DB1 peer)
- ALTER DATABASE ... SET NEW_BROKER. Running this statement will wipe out all conversation in a database. No error message is sent to the peer.
- END CONVERSATION ... WITH CLEANUP. Individual conversations can be cleaned up with this statement. No erorr message is sent to the peer.
- dialog lifetime. Specifying a limited lifetime allows conversation to have a defined error semantic: they either succeed within the specified lifetime, or they will be errored by the system.
- Queue retention. When retention is enabled on a queue, each conversation's sent and received messages are kept in the the queue until the conversation is ended. The intent of retention is to allow an application to use these messages to perform compensation operations that undo the conversation effects, if the conversation is errored.The best thing you can do is to write the application in a manner that is prepared to deal with error messages. This way you can error the out-of-sync conversations and let the application deal with it like it would deal with any other error message. The tricky problem is when to end a conversation. Because after a conversation endpoint was ended, the conversation cannot be errored anymore and will not receive any error message from the peer endpoint (since is already ended!). The proper moment of ending a conversation depends entirely on the business semantics of the conversation. The application should end a conversation only when is really no longer interested in the conversation. That is, if the business transaction (e.g. order processing) has reached a state when even if DB2 is lost and restored back in time DB1 is no longer interested in it, then DB1 can end it's conversation endpoint.
Iric Wen
TechNet Community Support
- Proposed As Answer by Sanil Mhatre Tuesday, November 20, 2012 1:25 PM
- Marked As Answer by Iric WenModerator Tuesday, November 27, 2012 9:18 AM
-
Tuesday, November 20, 2012 8:26 AM
Being able to restore your order(s) depends on many factors. If you employ SSB most of these are your own design decisions: how much data do you keep available redundantly on both databases, how much information do you specify in the messages, what does the processing in DB2 do (to/with) the order, what information do you return to DB1 when the processing is finished, etc.
If you need your order entry/processing system to be as robust as you present it here, SSB can help you achieve it but you'll still have to design and program it yourself. All SSB guarantees is that the messages you hand over to it, will either be delivered at the destination or, if a message can not be delivered (in time), you'll be notified by an error message. Service broker takes care of all queuing, routing and the transporting itself. You have to employ these tools for your purposes. That said, if you properly record each order's processing steps history, you should be able to replay the actions in case of calamities. You can however not restart the same 'running' conversation if either of the databases was restored to a point where the other end doesn't know about the conversation. The end that does know about the conversation will receive an error message from service broker because the conversation is now 'broken'. It is up to your solution how to respond to this error. One possible response can be to start a new conversation and replay whatever is needed to make sure the both databases are in sync again with the order.
You will only get an error message from SSB for orders that were still 'in progress', i.e. for which a conversation was still open. In other words, the above approach doesn't guarantee that the orders are in sync between the both databases that were already completed before the calamity. It is up to your solution to reconcile any orders that were completed before the calamity but 'undone' at either end by the restore. My suggestion is to make your entry system a 'client' of the processing system, making that the 'target'. The client initiates a conversation between the two systems, by sending an initialization message containing some meta data data on the status of the order(s) in its database (f.e. latest order number?). The processing database will check that status against it's own status and if the orders in both systems are not in sync, the target will respond by requesting the client to reconcile the orders between the systems before any further processing can take place. Only when the orders are in sync (again) the target will send a message saying the client can send new orders. This way you can make sure the systems will automatically get back in sync after a disaster.
SQL expert for JF Hillebrand IT BV - The Netherlands.


