none
Service Broker Activation Deadlocking RRS feed

  • Question

  • I have implemented a (slightly) modified version of conversation recycling using conversation timers and stored procedure activation from http://rusanu.com/2007/05/03/recycling-conversations/ . However it appears that, occasionally, deadlocks  occur between the send and activated procedures on the conversation group/table. The main modification is that instead of having a column to represent the SPID in the table I am using an IdentifierType and Identifier value to identify the conversation. However I am only using the defaults (@@SPID) so I don't think that should matter in this case.

    For the send side I have:

    CREATE PROCEDURE [dbo].[usp_SendMessage]
    	(
    	@endpointCode VARCHAR(255) = NULL,
    	@endpointGroup VARCHAR(255) = NULL,
    	@xmlPayload XML=NULL,
    	@binaryPayload VARBINARY(MAX)=NULL,
    	@varcharPayload VARCHAR(MAX)=NULL,
    	@identifier VARCHAR(50) = @@SPID,
    	@identifierType VARCHAR(50) = '@@SPID'
    	)
    AS BEGIN
    
    	SET NOCOUNT ON
    
    	DECLARE @fromService SYSNAME,
    		@toService SYSNAME,
    		@onContract SYSNAME,
    		@messageType SYSNAME,
    		@conversationTimeout INT
    
    	SELECT @fromService = FromService
    		, @toService = ToService
    		, @onContract = OnContract
    		, @messageType = MessageType
    		, @conversationTimeout = ConversationTimeout
    	FROM dbo.ServiceBrokerEndpointConfig
    		WHERE GroupCode = @endpointGroup
    
    	IF @fromService IS NULL OR @toService IS NULL OR @onContract IS NULL OR @messageType IS NULL OR @conversationTimeout IS NULL
    	BEGIN
    		RAISERROR (
    			N'Failed to get endpoint config for GroupCode ''%s''.'
    			, 16, 1, @endpointGroup) WITH LOG;
    		RETURN;
    	END
    
    	DECLARE @SBDialog UNIQUEIDENTIFIER
    	DECLARE @Message XML
    	DECLARE @counter INT
    	DECLARE @error INT
    	DECLARE @handle UNIQUEIDENTIFIER;
    
    	DECLARE @NotNullCount INT = ((CASE WHEN @xmlPayload IS NULL THEN 0 ELSE 1 END)
    									+ (CASE WHEN @binaryPayload IS NULL THEN 0 ELSE 1 END)
    									+ (CASE WHEN @varcharPayload IS NULL THEN 0 ELSE 1 END))
    	IF @NotNullCount > 1
    	BEGIN
    		RAISERROR (
    				N'Failed to SEND because %i payload fields are filled in when no more than 1 is expected'
    				, 16, 1, @NotNullCount) WITH LOG;
    		RETURN;
    	END
    
    	SET @counter = 1
    
    	WHILE (1=1)
    	BEGIN
    
    		SET @handle = NULL
    
    		-- Seek an eligible conversation in [ServiceBrokerConversations]
    		-- We will hold an UPDLOCK on the composite primary key 
    		SELECT @handle = Handle
    			FROM [ServiceBrokerConversations] WITH (UPDLOCK)
    			WHERE Identifier = @identifier
    					AND IdentifierType = @identifierType
    					AND FromService = @fromService
    					AND ToService = @toService
    					AND OnContract = @onContract;
    
    		IF @handle IS NULL
    		BEGIN
    
    			-- Need to start a new conversation for the current @Id
    			BEGIN DIALOG CONVERSATION @handle
    				FROM SERVICE @fromService
    				TO SERVICE @toService
    				ON CONTRACT @onContract
    				WITH ENCRYPTION = OFF;
    
    			-- Then the sender must listen on the
    			-- send queue for the http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer message type and
    			-- cleanup appropriately.
    			IF @conversationTimeout IS NOT NULL
    			BEGIN
    				BEGIN CONVERSATION TIMER (@handle) TIMEOUT = @conversationTimeout;
    			END
    
    			INSERT INTO [ServiceBrokerConversations]
    				(Identifier, IdentifierType, FromService, ToService, OnContract, Handle)
    				VALUES
    				(@identifier, @identifierType, @fromService, @toService, @onContract, @handle);
    
    		END;
    
    		IF @xmlPayload IS NOT NULL
    		BEGIN
    			-- Attempt to SEND on the associated conversation
    			;SEND ON CONVERSATION @handle 
    				MESSAGE TYPE @messageType (@xmlPayload);
    		END ELSE IF @binaryPayload IS NOT NULL
    		BEGIN
    			;SEND ON CONVERSATION @handle 
    				MESSAGE TYPE @messageType (@binaryPayload);
    		END ELSE BEGIN
    			;SEND ON CONVERSATION @handle 
    				MESSAGE TYPE @messageType (@varcharPayload);
    		END
    
    		SELECT @error = @@ERROR;
    
    		IF @error = 0
    		BEGIN
    			-- Successful send, just exit the loop
    			BREAK;
    		END
    
    		SELECT @counter = @counter+1;
    
    		IF @counter > 10
    		BEGIN
    			-- We failed 10 times in a row, something must be broken
    			RAISERROR (
    				N'Failed to SEND on a conversation for more than 10 times. Error %i.'
    				, 16, 1, @error) WITH LOG;
    			BREAK;
    		END
    
    		-- Delete the associated conversation from the table and try again
    		DELETE FROM [ServiceBrokerConversations]
    			WHERE Handle = @handle;
    
    		SET @handle = NULL;
    
    	END
    
    END


    And for the activation on the initiator queue I have:

    CREATE PROCEDURE [dbo].[usp_InitiatorQueueHandler]
    AS
    BEGIN
    
    	SET NOCOUNT ON
    
    	DECLARE @handle UNIQUEIDENTIFIER;
    	DECLARE @messageTypeName SYSNAME;
    	DECLARE @messageBody VARBINARY(MAX);
    
    	WHILE (1=1)
    	BEGIN
    
    		BEGIN TRAN;
    
    		;WAITFOR (RECEIVE TOP(1)
    			@handle = conversation_handle,
    			@messageTypeName = message_type_name,
    			@messageBody = message_body
    		FROM [InitiatorQueue]), TIMEOUT 5000;
    
    		IF (@@ROWCOUNT = 0)
    		BEGIN
    			COMMIT TRAN;
    			BREAK;
    		END
    
    		-- Call the base stored procedure to handle ending the conversation
    		EXEC dbo.usp_BrokerHandleInitiator @handle, @messageTypeName, @messageBody
    
    		COMMIT TRAN;
    
    	END
    
    END
    
    GO
    
    ALTER QUEUE [InitiatorQueue]
    WITH ACTIVATION (
    	STATUS=ON,
    	PROCEDURE_NAME=dbo.usp_InitiatorQueueHandler,
        EXECUTE AS OWNER,
        MAX_QUEUE_READERS=10
    )
    
    GO
    
    CREATE PROCEDURE [dbo].[usp_BrokerHandleInitiator]
    	(
    	@handle UNIQUEIDENTIFIER,
    	@messageTypeName SYSNAME,
    	@messageBody VARBINARY(MAX)
    	)
    AS
    BEGIN
    
    	SET NOCOUNT ON
    
    	IF @handle IS NOT NULL
    	BEGIN
    
    		-- Delete the message from the [ServiceBrokerConversations] table
    		-- before sending the [EndOfStream] message. The order is
    		-- important to avoid deadlocks.
    		IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
    				OR @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
    		BEGIN
    			DELETE FROM [ServiceBrokerConversations]
    				WHERE [Handle] = @handle;
    		END
    
    		IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/DialogTimer'
    		BEGIN
    			;SEND ON CONVERSATION @handle
    				MESSAGE TYPE [EndOfStream];
    
    		END
    		ELSE IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
    		BEGIN
    			END CONVERSATION @handle;
    		END
    		ELSE IF @messageTypeName = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error'
    		BEGIN
    			END CONVERSATION @handle;
    
    			-- We could send a notification or store the error in a table for further inspection
    			DECLARE @error INT;
    			DECLARE @description NVARCHAR(4000);
    			WITH XMLNAMESPACES (N'http://schemas.microsoft.com/SQL/ServiceBroker/Error' AS ssb)
    
    			SELECT
    
    				@error = CAST(@messageBody AS XML).value(
    					'(//ssb:Error/ssb:Code)[1]', 'INT'),
    				@description = CAST(@messageBody AS XML).value(
    					'(//ssb:Error/ssb:Description)[1]', 'NVARCHAR(4000)')
    
    			-- Maybe log to audit log instead?
    			RAISERROR(N'Received error Code:%i Description:"%s"',
    				 16, 1, @error, @description) WITH LOG;
    
    		END;
    	END
    END

    The deadlock XML is:

    <deadlock>
     <victim-list>
      <victimProcess id="process807dbd0c8" />
     </victim-list>
     <process-list>
      <process id="process807dbd0c8" taskpriority="0" logused="0" waitresource="METADATA: database_id = 21 CONVERSATION_GROUP($hash = 0xff26c7e1:0x478840de:0xd403bb)" waittime="2600" ownerId="8333217736" transactionname="GetDialogByHandle" lasttranstarted="2015-03-23T10:53:58.683" XDES="0x87f251c90" lockMode="X" schedulerid="2" kpid="7220" status="suspended" spid="110" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2015-03-23T10:53:58.683" lastbatchcompleted="2015-03-23T10:53:58.683" lastattention="1900-01-01T00:00:00.683" clientapp=".Net SqlClient Data Provider" hostname="COLFOQA2" hostpid="1436" loginname="dev" isolationlevel="read committed (2)" xactid="8333217704" currentdb="21" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
       <executionStack>
        <frame procname="MYDB.dbo.usp_SendMessage" line="116" stmtstart="7540" stmtend="7696" sqlhandle="0x03001500aada77428391a0005da4000001000000000000000000000000000000000000000000000000000000">
    SEND ON CONVERSATION @handle 
    				MESSAGE TYPE @messageType (@xmlPayload);    </frame>
       </executionStack>
       <inputbuf>
    Proc [Database Id = 21 Object Id = 1115151018]   </inputbuf>
      </process>
      <process id="process869a5e558" taskpriority="0" logused="588" waitresource="KEY: 21:72057594039959552 (1f1ae6770d1b)" waittime="2600" ownerId="8333217730" transactionname="user_transaction" lasttranstarted="2015-03-23T10:53:58.683" XDES="0x3e28456a8" lockMode="U" schedulerid="4" kpid="6720" status="background" spid="22" sbid="0" ecid="0" priority="0" trancount="2">
       <executionStack>
        <frame procname="MYDB.dbo.usp_BrokerHandleInitiator" line="28" stmtstart="1996" stmtend="2144" sqlhandle="0x03001500f704cd06e691a0005da4000001000000000000000000000000000000000000000000000000000000">
    DELETE FROM [ServiceBrokerConversations]
    				WHERE [Handle] = @handle;    </frame>
        <frame procname="MYDB.dbo.usp_InitiatorQueueHandler" line="29" stmtstart="1014" stmtend="1172" sqlhandle="0x03001500316f56101694a0005da4000001000000000000000000000000000000000000000000000000000000">
    EXEC dbo.usp_BrokerHandleInitiator @handle, @messageTypeName, @messageBody    </frame>
       </executionStack>
       <inputbuf>
       </inputbuf>
      </process>
     </process-list>
     <resource-list>
      <metadatalock subresource="CONVERSATION_GROUP" classid="$hash = 0xff26c7e1:0x478840de:0xd403bb" dbid="21" id="lock54fdb1800" mode="X">
       <owner-list>
        <owner id="process869a5e558" mode="X" />
       </owner-list>
       <waiter-list>
        <waiter id="process807dbd0c8" mode="X" requestType="wait" />
       </waiter-list>
      </metadatalock>
      <keylock hobtid="72057594039959552" dbid="21" objectname="MYDB.dbo.ServiceBrokerConversations" indexname="PK__ServiceB__877FDFD18DF079BD" id="lock6c65b1a00" mode="U" associatedObjectId="72057594039959552">
       <owner-list>
        <owner id="process807dbd0c8" mode="U" />
       </owner-list>
       <waiter-list>
        <waiter id="process869a5e558" mode="U" requestType="wait" />
       </waiter-list>
      </keylock>
     </resource-list>
    </deadlock>

    I have a clustered index on the fields I am SELECTing by and a UNIQUE index on the Handle (for the DELETE). When running the SELECT/DELETE statements against the table the query plan reports index seeks are being used:

    CREATE TABLE [dbo].[ServiceBrokerConversations] (
        [Identifier]     VARCHAR (50)     NOT NULL,
        [IdentifierType] VARCHAR (50)     NOT NULL,
        [FromService]    [sysname]        NOT NULL,
        [ToService]      [sysname]        NOT NULL,
        [OnContract]     [sysname]        NOT NULL,
        [Handle]         UNIQUEIDENTIFIER NOT NULL,
        [CreateDate]     DATETIME2 (7)    NULL,
        PRIMARY KEY CLUSTERED ([Identifier] ASC, [IdentifierType] ASC, [FromService] ASC, [ToService] ASC, [OnContract] ASC) ON [PRIMARY],
        UNIQUE NONCLUSTERED ([Handle] ASC) ON [PRIMARY]
    ) ON [PRIMARY];

    What appears to be happening is the DELETE is somehow deadlocking with the SEND but I am not sure how since I am using them in the same order in both the send procedure and the activated procedure. Also, RCSI is enabled on the database I am receiving the deadlocks on.

    EDIT:

    I think I have found the culprit with lock acquisition order:

    - In the usp_SendMessage proc:

    • The SELECT locks the conversation record
    • The SEND locks the conversation group

    - In the timer activated proc on the initiator queue:

    • The RECEIVE locks the conversation group
    • The DELETE locks the conversation record

    Given that I think there may be a few solutions:

    1. There is some subtle difference between my code and the code from the article that I am not noticing that when fixed will resolve the deadlocking. I am hoping this is the case since it seems that others have used this pattern without issues as far as I know.
    2. Or...The deadlocking is inherent to the pattern the code is using and I can either:
    • Deal with the deadlocking by adjusting the deadlock priority on the activated stored procedure so that it becomes the victim, and I can implement retry logic.
    • Remove conversation timers and activation all together and resort to some sort of job that expires the conversation by polling it, where I can control the ordering.

    My ultimate goal is to eliminate any deadlocking on usp_SendMessage so that it "never" fails.

    I appreciate any feedback!

    Thanks




    • Edited by ryanmwright Tuesday, March 24, 2015 2:42 PM
    Tuesday, March 24, 2015 12:47 AM

Answers

  • I can understand why the deadlock happens. As you point out the activation procedure and the send SP acquire locks on the resources in reverse order.

    Really why Remus does not consider this in his blog post, I don't know. But may I ask, since you have replaced @@spid as a key with two other columns, does this also mean that multiple processes can use the same conversation? I'm not so sure that this is a good idea. I worked with an SB implementation which reuses conversations some months ago, and I recall that considered channing the pattern, but that I decided against it the end although I don't remember the exact details.

    But so much is clear, if multiple processes can use the same handle, they will serialise on the SELECT with UPDLOCK. That will not happen if you change to REPEATABLEREAD, but I guess they will serialiase on the SEND instead.

    The best way to address the problem appears to use SET LOCK_TIMEOUT 0 in the activation procedure and trap the the timeout error in a CATCH block, and let the message go back to the queue. This should be better than SET DEADLOCK_PRIORITY, since the there will never be a deadlock that upholds the the sender.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by ryanmwright Sunday, March 29, 2015 4:42 PM
    • Unmarked as answer by ryanmwright Sunday, March 29, 2015 4:46 PM
    • Marked as answer by ryanmwright Thursday, April 2, 2015 2:39 AM
    Wednesday, March 25, 2015 10:44 PM
  • I should add the caveat that I have not tested the solution with SET LOCK_TIMEOUT. I was not aware of this deadlock until I saw your post. (I will have to look into this situation for my project.)

    I don't want to venture in whether you should disable poison-message detection, but maybe you should run a stress-test to see how likely it is for the queue to be disabled. Then again, I can't see any problem with a disabling poison-message detection on a queue that only receives timer messages.

    I don't know how you use(d) IdentifierType and Identifier, but I get the feeling that if you share the conversation among processes, there is more likelihood for the deadlock to occur.

    An alternative to the timer is to age out the conversation in the send procedure. That is, if the handle found is too old, send end-of-stream on that conversation, and then get a new handle. But this could leave the conversation around for a long time, if the process goes away.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by ryanmwright Thursday, April 2, 2015 2:39 AM
    Sunday, March 29, 2015 5:34 PM

All replies

  • Hi ryanmwright,

    Thank you for your question.

    I am trying to involve someone more familiar with this topic for a further look at this issue. Sometime delay might be expected from the job transferring. Your patience is greatly appreciated.

    Thank you for your understanding and support.


    Eric Zhang
    TechNet Community Support


    Wednesday, March 25, 2015 4:46 AM
    Moderator
  • I can understand why the deadlock happens. As you point out the activation procedure and the send SP acquire locks on the resources in reverse order.

    Really why Remus does not consider this in his blog post, I don't know. But may I ask, since you have replaced @@spid as a key with two other columns, does this also mean that multiple processes can use the same conversation? I'm not so sure that this is a good idea. I worked with an SB implementation which reuses conversations some months ago, and I recall that considered channing the pattern, but that I decided against it the end although I don't remember the exact details.

    But so much is clear, if multiple processes can use the same handle, they will serialise on the SELECT with UPDLOCK. That will not happen if you change to REPEATABLEREAD, but I guess they will serialiase on the SEND instead.

    The best way to address the problem appears to use SET LOCK_TIMEOUT 0 in the activation procedure and trap the the timeout error in a CATCH block, and let the message go back to the queue. This should be better than SET DEADLOCK_PRIORITY, since the there will never be a deadlock that upholds the the sender.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by ryanmwright Sunday, March 29, 2015 4:42 PM
    • Unmarked as answer by ryanmwright Sunday, March 29, 2015 4:46 PM
    • Marked as answer by ryanmwright Thursday, April 2, 2015 2:39 AM
    Wednesday, March 25, 2015 10:44 PM
  • Hi Erland,

    Thank you for the reply.

    In terms of using something other than the @@spid I think you're right, it should be approached with caution. To keep in line with the pattern I think I will modify the proc so that only the @@spid can be used so that there will be only one conversation per process.

    I like the idea of setting the LOCK_TIMEOUT instead of setting the DEADLOCK_PRIORITY to prevent blocking. If I do I think the transaction will probably be more likely to rollback than with the DEADLOCK_PRIORITY option because it would rollback if that @@spid holds any locks, even in non deadlock scenarios. Since a rollback may be more likely perhaps I should disable the built in poison detection, to prevent the queue from being disabled if it exceeds 5 lock timeouts? (the message sending will be done through triggers so there will be a fairly high volume of them, and a long running process could lock the conversation for enough time for 5 failures to occur)

    If I may ask, when you reused conversations in your other project did you implement something similar when cleaning up conversations?




    Sunday, March 29, 2015 3:52 PM
  • I should add the caveat that I have not tested the solution with SET LOCK_TIMEOUT. I was not aware of this deadlock until I saw your post. (I will have to look into this situation for my project.)

    I don't want to venture in whether you should disable poison-message detection, but maybe you should run a stress-test to see how likely it is for the queue to be disabled. Then again, I can't see any problem with a disabling poison-message detection on a queue that only receives timer messages.

    I don't know how you use(d) IdentifierType and Identifier, but I get the feeling that if you share the conversation among processes, there is more likelihood for the deadlock to occur.

    An alternative to the timer is to age out the conversation in the send procedure. That is, if the handle found is too old, send end-of-stream on that conversation, and then get a new handle. But this could leave the conversation around for a long time, if the process goes away.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by ryanmwright Thursday, April 2, 2015 2:39 AM
    Sunday, March 29, 2015 5:34 PM
  • Hi Erland,

    I have decided to go with your last suggestion which is to age the conversation in the send procedure. After doing this it appears that all deadlocking has stopped.

    Currently my process looks like this:

    • Cleanup the conversation in the usp_SendMessage procedure if it should be aged (DELETE it from the table and send the EndOfStream message, then create a new conversation). I removed the conversation timer from the process since it is not needed anymore.
    • The activated procedure's only job is now to end its side of the conversation when it receives the EndDialog message.
    • Run a nightly job to clean up all the conversations that are left around because the process did not send any new messages (DELETE it from the table and send the EndOfStream message).

    Thank you for the help on this!!



    • Edited by ryanmwright Thursday, April 2, 2015 2:41 AM
    Thursday, April 2, 2015 2:39 AM
  • Ryan,

    Would you be willing to post your solution here? I am working with a solution that was based on Remus Rusanu's posts, and have been searching for a solution to this very problem. Having a working model to compare to would be a huge help!

    Warm regards,

    Clayton

    Thursday, November 9, 2017 3:11 PM