locked
The Best Way to Handle Poison Message RRS feed

  • Question

  • Hi,

    I'm going to use service broker in a very sensitive situation but poison messages make me worried. The situation is as follows:
    There are around 30 applications in our company and each application uses a separate DB server. The DB schema of each application is different. We need a part of each application data on a central server, so I'm going to audit the required tables of each DB server from the central server and make a same table from it.
    Each application is used by around 1000 users simultaneously.
    The thing, I'm worried about is poison messages. Since when a poison message is created, the queue is disabled and the rest of tables will not be audited till this message is manually removed, I'm looking for the best way to handle them in such situation and always having correct data.
    Besides, could you provide me of the possible causes of poison messages for my situation?

    Any idea, comment, etc is appreciated
    Sunday, September 7, 2008 6:50 AM

All replies

  • I just want to highlight a couple of points here:

    • First when a queue is disabled audit doesn't stop. The audit continues and audit messages are being held in the sender's transmission_queue. Once the queue is re-enabled the processing of audit resumes without loss. After all this is the whole purpose of asynchronous, queued based processing. So the real question then is how long a delay can you afford between the audited event happening (ie. message is sent) and the processing of the audit record (ie. message is processed).
    • Second is that when a poison message is 'handled', then it really is not a poison message anymore. I believe that you should focus on proper error handling first, specially in your activated procedures. I have touched this subject in my blog, see http://rusanu.com/2007/10/31/error-handling-in-service-broker-procedures/ and http://rusanu.com/2008/08/13/error-handling-and-activation/.

    Whatever can happen even after you do all the proper error handling and take all cautions on processing a message to succeed yet you still fail it will be a true poison message. I wouldn't recommend automatic removal of a poison message. First because I believe you should analyze the poison message in order to take proper measures in your ordinary handling of messages to avoid the problem from now on. Second because it can very well be that the message is not 'poison' in itself, but some other factors are making it 'poisonous', eg. data purity issues in your application table are causing a primary key duplicate violation when this particular message is processed.
    So I always recommend that poison message not to be handled, but instead to be detected. Set up a notification on the QUEUE_DISABLED event and send a mail to the administrator when this happens. The SSB team has a blog entry describing how to do this: http://blogs.msdn.com/sql_service_broker/archive/2008/06/30/poison-message-handling.aspx

    Some other comments:
    1. SSB reacts poorly to the situations when it has many (hundreds of thousands) of conversations active all trying to send messages (eg. if the target is disabled or down for maitenance and the application servers are accumulating messages over a few hours). Try too keep the number of conversations low.
    2. Monitor transmission queues on your application servers. Set up an automated infrastructure to do this. I've seen very fancy deployments ignoring this and not realising that IPSec was blocking traffic on one of the servers for a whole week, while the transmission_qeueu accumulated few million messages.
    3. You're gonna probably have to implement a message resending method in order to handle various conversation errors. This is not an easy trick to pull off, see http://rusanu.com/2007/12/03/resending-messages/ for some starting ideas.


    Sunday, September 7, 2008 1:03 PM
  • Thank you for the detailed and complete reply 
    I'll go through it and will come back to you in short
    Tuesday, September 9, 2008 5:50 AM