none
Closed conversation endpoints not being purged RRS feed

  • Question

  • We have been having a problem with service broker for quite a while now and searching on these forums and more generally on the web has not yielded any kind of answer...

    Our application utilises service broker within a single database (there is no cross database, cross instance or cross server communication).  There are approximately 12 queues which are used to varying degrees.  A few of the queues have a throughput rate in the order of up to 100s per minute at peak periods.

    In some scenarios we are able to make use of persistent conversations but the majority of messages are sent on their own conversation or in small batches on their own conversation.

    Some time ago we found that the database was growing in size more than expected due to a build up of data in the service broker meta data internal tables, exposed via the sys.conversation_endpoints system view.

    We identified a problem in the application that was preventing some conversations from being closed and have now fixed this.

    However, we are still experiencing a build up in the service broker tables and sys.conversation_endpoints shows this is now due to a very large number of conversations in the "CLOSED" state.

    I know that conversations are kept around to prevent replay attacks but thought they were supposed to be cleared after about 30 minutes.  This is certainly not happening as we currently have conversations that were opened on the 18th September, a full week ago.  We currently have about 350,000 closed conversations and this figure is increasing!

    We have tried restarting the SQL Server instance with no effect.

    I have been using a script to loop through all closed conversations and get rid of them using the "WITH CLEANUP" clause but I'm loath to create a scheduled task that does this in the background when service broker should be doing the job itself.

    Has anybody experience this problem and, even better, have any idea how to solve it?

    Daniel

     

    Monday, September 25, 2006 8:10 AM

Answers

  •  Daniel Renshaw wrote:

    I'm not sure this tells us much but I am suprised about the location of the DISCONNECTED_OUTBOUND messages as I thought our code typcically ended the conversation immediately after sending the message(s) without waiting for a response.  Is this because I traced the service broker activity events rather than the actuall T-SQL statements issuing the service broker commands ("END CONVERSATION" etc)?

    I hop I'll g et this right, I'm doing it from memory:

    The transition CONVERSING->DISCONNECTED_OUTBOUND is when the initiator issues END CONVERSATION.
    The transition CONVERSING->DISCONNECTED_INBOUND is when the EndDialog message sent by initiator arrives at the target
    The transition DISCONNECTED_OUTBOUND->CLOSED is when the acknowledgement for the initiators EndDialog message comes back to it.
    The transition DISCONNECTED_INBOUND->CLOSED is when the target issues END CONVERSATION in response to the EndDialog message received from initiator.

    The pattern shown here is one of the pattern that leaves the target endpoint stranded. Because it never sent any message to the initiator, it never received an acknowledgement. An ack from the initiator sent back to the target serves as positive proof that the initiator will not retry to send message 0.  W/o this proof the target cannot delete itself, it has to stay around indefinetly to prevent a replay of message 0, resulting in a stranded target endpoint. That's another reason why the fire-and-forget pattern is ... unrecommended.

    BTW, you should also add the EventSubclass column to the profiler view, it details what kind of conversation event we're seeing.

    HTH,
    ~ Remus

    Wednesday, September 27, 2006 4:21 PM
    Moderator

All replies

  • Hi Daniel!

    Sounds very interesting. Do you close the conversation on both ends of the dialog with an END CONVERSATION statement?
    Normally the conversations should go away from sys.conversation_endpoints within 30 minutes as you have wrote...

    Thanks

    Klaus Aschenbrenner
    www.sqljunkies.com/weblog/klaus.aschenbrenner
    www.csharp.at

    Monday, September 25, 2006 3:36 PM
  • Hi Daniel!

    Sounds very interesting. Do you close the conversation on both ends of the dialog with an END CONVERSATION statement?
    Normally the conversations should go away from sys.conversation_endpoints within 30 minutes as you have wrote...

    Thanks

    Klaus Aschenbrenner
    www.sqljunkies.com/weblog/klaus.aschenbrenner
    www.csharp.at

    Monday, September 25, 2006 3:38 PM
  • Yes, we should be closing at both ends.  However, I've not yet confirmed this 100%.  I was working on the assumption that if the endpoint was in the "closed" state, we must have closed it at both ends (which is what the documentation suggests).  Given that SB doesn't seem to be working as expected, perhaps the endpoint state is also suspect?  I'll profile the system and see if I can trace the end conversations.
    Tuesday, September 26, 2006 6:27 AM
  • The endpoint states meaning is described here: http://msdn2.microsoft.com/en-us/library/ms176082.aspx. CLOSED indeed means both endpoints had issued END CONVERSATION.

    What is you message exchange pattern? Who is issuing END CONVERSATION first, the initiator or the target? Is the target ever sending any replies back to the initiator?

    Thanks,
    ~ Remus

    Tuesday, September 26, 2006 5:22 PM
    Moderator
  • Thanks for considering this Remus.  I'm a little busy and haven't yet had a chance to look into this to keep comprehensive answers; maybe I can spend more time on this later today.

    We have a mixture of message exchange patterns.  Most are fire and forget (no response from target to initiator) and in these situations the initiator starts a conversation sends a message and ends its side of the conversation with no delay; the target then picks up the message at some point in the future, processes it and then ends the other side of the conversation.  We also have a few request/response conversations; generally we have seperate processes responsible for sending requests and processing responses; there are no pseudo-synchronous conversations.

    I may have some more detailed information once I've analysed a trace of service broker activity in this database.

    Wednesday, September 27, 2006 8:25 AM
  • Here are some examples of the service broker activity:

    546BC38E-B300-40BC-9A54-2C6F281E928D 2006-09-27 09:35:50.920 Initiator DISCONNECTED_OUTBOUND
    546BC38E-B300-40BC-9A54-2C6F281E928D 2006-09-27 09:35:50.920 Target DISCONNECTED_INBOUND
    546BC38E-B300-40BC-9A54-2C6F281E928D 2006-09-27 09:35:50.920 Initiator CLOSED

    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.083 Initiator STARTED_OUTBOUND
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.100 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.100 Target STARTED_INBOUND
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.100 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.113 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.113 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.113 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.113 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.130 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.147 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.160 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.160 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.177 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.177 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.177 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.177 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.193 Initiator CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.193 Target CONVERSING
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.207 Initiator DISCONNECTED_OUTBOUND
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.207 Target DISCONNECTED_INBOUND
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:48:22.207 Initiator CLOSED
    221E2875-FEFD-41B2-B6EF-2C75F5885E32 2006-09-27 09:58:38.330 Target CLOSED

    C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:12.977 Initiator STARTED_OUTBOUND
    C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:12.993 Initiator CONVERSING
    C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:12.993 Target STARTED_INBOUND
    C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:12.993 Target CONVERSING
    C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:13.180 Initiator CONVERSING
    C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:38:13.180 Target CONVERSING
    C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:54:17.837 Initiator DISCONNECTED_OUTBOUND
    C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:54:17.837 Target DISCONNECTED_INBOUND
    C1E9F3C6-B10C-4605-8E6A-2CB03D95EE2C 2006-09-27 09:54:17.853 Initiator CLOSED

    I'm not sure this tells us much but I am suprised about the location of the DISCONNECTED_OUTBOUND messages as I thought our code typcically ended the conversation immediately after sending the message(s) without waiting for a response.  Is this because I traced the service broker activity events rather than the actuall T-SQL statements issuing the service broker commands ("END CONVERSATION" etc)?

    Daniel

    Wednesday, September 27, 2006 10:23 AM
  • The pattern that leaves target endpoints stranded is the one where the target never sends any message (including EndDialog message). So try not to do fire and forget patterns, they have other faults too. If you don't want/can't change this, then you should set up a task to periodically clean them up (using ENd CONVERSATION ... WITH CLEANUP). The criteria is conversation endpoints that are CLOSED and last activity is more than 30 minutes in the past.

    HTH,
    ~ Remus

    Wednesday, September 27, 2006 4:08 PM
    Moderator
  •  Daniel Renshaw wrote:

    I'm not sure this tells us much but I am suprised about the location of the DISCONNECTED_OUTBOUND messages as I thought our code typcically ended the conversation immediately after sending the message(s) without waiting for a response.  Is this because I traced the service broker activity events rather than the actuall T-SQL statements issuing the service broker commands ("END CONVERSATION" etc)?

    I hop I'll g et this right, I'm doing it from memory:

    The transition CONVERSING->DISCONNECTED_OUTBOUND is when the initiator issues END CONVERSATION.
    The transition CONVERSING->DISCONNECTED_INBOUND is when the EndDialog message sent by initiator arrives at the target
    The transition DISCONNECTED_OUTBOUND->CLOSED is when the acknowledgement for the initiators EndDialog message comes back to it.
    The transition DISCONNECTED_INBOUND->CLOSED is when the target issues END CONVERSATION in response to the EndDialog message received from initiator.

    The pattern shown here is one of the pattern that leaves the target endpoint stranded. Because it never sent any message to the initiator, it never received an acknowledgement. An ack from the initiator sent back to the target serves as positive proof that the initiator will not retry to send message 0.  W/o this proof the target cannot delete itself, it has to stay around indefinetly to prevent a replay of message 0, resulting in a stranded target endpoint. That's another reason why the fire-and-forget pattern is ... unrecommended.

    BTW, you should also add the EventSubclass column to the profiler view, it details what kind of conversation event we're seeing.

    HTH,
    ~ Remus

    Wednesday, September 27, 2006 4:21 PM
    Moderator
  • Thanks Remus, I'll implement the background cleanup for now.  We may be revisting our core principles at some point in the future based on the lessons learnt over the past few months.  We've found that tables may have been a more appropriate solution than the many queues we've ended up with.

    For example, a message arrives and we put a record in a table as an audit/status information.  This continues through the system, each message/step typically generates further records being inserted/updated in regular tables so that we can keep track of what's going on.  This has caused tremendous bloating/duplication of data and massive transaction logs so we're now asking ourselves: why not put the message in that same table and forget the queues.

    When we come to reconsider all that, we'll ensure fire-and-forget is removed from any service broker elements that remain.

    Daniel

    Wednesday, September 27, 2006 4:33 PM
  • This saved me from pulling my hair out... I've added it to my blog.

    http://www.nootz.net/index.php/notification-services-conversation-cleanup.html

    Thanks,
    Garick
    Friday, October 13, 2006 3:52 PM
  • I have an application that runs on 3-4 computers per workgroup.

    There are 170 workgroups.

    I have been using SQL Dependencies for about 1 1/2 years in this software.

     

    I have had to write a cleanup routine that deletes all conversations when the business day has ended.

    I have seen as many as 750,000 records in the conversation table.  This was choking the system down to dragging.

    There would only be 5 to 12 conversations at any given time but, here is an example from 1 computer at the end of the day:

    Service Broker Cleanup
    Started: 7:15:52 PM
    There are 21 rows in 'CLOSED' state.
    There are 859 rows in 'DISCONNECTED_INBOUND' state.
    There are 7 rows in 'STARTED_OUTBOUND' state.

    I have some stores that have NO Disconnected_Inbound and some that have thousands.

    This is a single business day at a moderate store.

     

    Another really big issue with the conversations is, sometimes they just don't fire.

    I have had to start logging when the notification fires (in our app) because of this issue.

     

    I have read that SQL Server 2008 no longer supports  SQL Dependencies.

    This will cause a re-write on my part.  I was very excited when I could stop polling the database every 5 seconds in favor of this technology.

     

    Any thoughts on this?

     

     

    Saturday, July 12, 2008 1:03 PM