I have a service broker process that's in run-away mode and I cannot figure out how to stop it.
Here's the situation:
The stored procedure named in the activation queue is opening a transaction but not closing it. The receive loop has a timeout of 3 seconds, with MAX_QUEUE_READERS set to 5 so every 3 seconds I'm getting 15 new records in
The server is at 99% CPU with no users on it.
The problem is I can't stop it. Here's what I have tried:
Reboot the box Clears the lock table and runs away again
ALTER QUEUE <Queue Name> WITH STATUS=OFF Just sits there forever
Drop the Service, Drop the Queue Just sits there forever
Dropped the stored procedure itself (!) It's still running like crazy
So my immediate question is how do I stop this crazy thing? And the follow up question is when a programmer makes a mistake, do we have to reformat our DEV box to recover from it (that's a joke) -- seriously how do you stop a run away service broker process?
I've discovered if I kill the SPID that is holding the locks and immediately stop the queue I can get control of the server back.
Here's what it seems to come down to:
On SQL Server Developer edition where the service broker code was developed it works fantastic.
On the development SERVER which is Enterprise Edition the activation procedure generates thousands of locks per minute, and consumes 100% of the CPU.
Both instances are Service Pack 2 Hot Fix 3175.
Here's more information:
Set the MAX_QUEUE_READERS to 1 in order to slow down the runaway condition.
Setup a trace table in the database (A simple table we can write to from the Activation Sproc to see what is happening)
Start the queue (ALTER QUEUE <QueueName> WITH STATUS=ON)
Within five seconds we have 28,000+ locks in dm_tran_locks
Using the profiler on the SPID we see thousands of SQL transactions that begin & commit at the exact same time with an object name of TVQuery
The Activation sproc trace output indicates that it has been run three times and has received no messages.
The Activation sproc is never run again -- there is no trace output... But the sys.dm_tran_locks table grows by about 20 locks a second until you kill the spid and stop the queue.
A few rounds of stopping / starting the queue(s) and the Activation sproc doesn't run at all, but the lock table still explodes when it starts, so we reboot the box and we are back at square one.
During this time we are not initiating any conversations or attempting to exchange any messages.
On SQL Server Developer Edition the Service Broker works Exactly As Advertised, no locks are created.
Our activation sproc looks like all the examples out there, very very simple and basic.
Any ideas/help/clues would be greatly appreciated as we cannot figure out what on earth we are doing that is generating 28,000 locks within seconds of enabling the queue.
So lets say you write a procedure that begins a transaction, updates a table then enters an infinite loop. Your practical joke friend connects with his favorite SSMS tool and execute the procedure. You will see that it grabs some locks then 'runaways' with, say, 90% CPU consumption. So what you do:- restart the box. before you go and look, your friend connects back and executes the procedure again, so you find the same situation- try to drop the table being updated. Of course, the uncommitted updates have placed a schema stability lock and some intent-update locks on the table and your DROP requires a schema modification lock, so they are in conflict so your DROP 'just sits there forever'.- you can drop the procedure being executed, but that doesn't prevent the execution of the procedure in the connection to continue.So all that is the normal expected behavior with SQL Server, T-SQL and poorly written procedures. Now you can throw in some esoteric and say the table is a 'queue' and replace your evil friend with internal activation and then blame the new mysterious Service Broker, but that doesn't change a thing. What you describe is exactly the expected behavior.
SQL Server Developer Edition and Enterprise Editions are identical. That is unless you have different versions/service packs/cumulative updates applied on development environment vs. production environment.The TVQuery object is some internal mechanism related to the use of table variables.
So what you're telling me is that if I write this program on computer "A":
for(int inx = 0; inx < 10; inx++) Console.WriteLine(Convert.ToString(inx));
And it prints the numbers one through 10, and then I run it on computer "B" and it prints nothing...
THAT MY CODE IS POORLY WRITTEN??? Are you pulling my leg?
Do you think the problem could be in the CONFIGURATION OF COMPUTER "B"???????
... Now IMHO if I stop a queue it should stop. That's what happens with a spooler queue, I don't have to kill the process to stop the printer from printing. It shouldn't go away forever if it can't stop the queue, it should come back and tell me WHY the queue cannot be stopped. That is poor programming. If you have a deadlock, you don't lock up and go away, you TRY to enter a critical section, and if you can't get in you throw an error.
But that's not the point. The point is this runs on one computer and not on another, and they are on the same patch level, one is XP with Dev Edition one is Windows 2003 Server with Ent Edition.
Here's the activation sproc. This seems very, very, very simple to me. On machine "A" this runs perfectly. On machine "B" you get three records in the trace table I setup followed by an endlessly growing lock table. Now I'm usually not one to complain but the documentation on this is not all that extensive and I have absolutely no idea where to LOOK for what is wrong. I don't think it has anything to do with the code at all.
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
-- This routine accepts an XML fragment from two different triggers
-- and updates the NameSearch table, that is used for fuzzy lookup
ALTER PROCEDURE [dbo].[MessageProcessor_NameChange] AS
DECLARE @MsgHwnd UniqueIdentifier,
WHILE (1 = 1)
INSERT INTO dbo.TraceTable (TraceInfo, Stamp) VALUES ('Hello here we are', GETDATE());
RECEIVE TOP (1) @MessageType = MESSAGE_TYPE_ID
,@Message = MESSAGE_BODY
,@MsgHwnd = CONVERSATION_HANDLE
), TIMEOUT 3000;
-- Didn't get anything in three seconds, exit
IF (@@ROWCOUNT = 0)
*** When I enable the queues I get this message three times and then never again, but only
*** on machine "B" not on machine "A"
INSERT INTO dbo.TraceTable (TraceInfo, Stamp) VALUES ('Didnt get nuttin', GETDATE());
-- If there's an error send the message back to the caller
-- this keeps it out of the dead letter queue
-- and serves as the NAK for this message. We also log
-- this in the error log on our side
SET @Error = @@ERROR;
IF (@Error <> 0)
END CONVERSATION @MsgHwnd WITH ERROR = @Error DESCRIPTION = ' Sproc Error!';
INSERT INTO dbo.SystemErrors(Operator
,CAST(@Error AS VarChar(5)) + @ErrorText);
ELSE -- An error did not occur we have something to do
IF (@MessageType <> 2) -- If NOT the End Dialog Message
DECLARE @AckMsg XML;
SET @AckMsg = N'<Ack>GotMessage</Ack>';
SEND ON CONVERSATION @MsgHwnd MESSAGE TYPE AckMsg (@AckMsg);
END CONVERSATION @MsgHwnd;
-- if the message looks like it's formatted ok process it
IF (@Message.exist('//Names') = 1)
-- Transmogrify the message XML document into a local table
DECLARE @Actions TABLE (ID BigInt
INSERT INTO @Actions (ID
SELECT Action.value('@ID', 'BigInt')
FROM @Message.nodes('//Name') node([Action]);
-- Iterate over it and process the actions
DECLARE ActionsCur CURSOR LOCAL FORWARD_ONLY READ_ONLY FOR SELECT ID
DECLARE @ID BigInt;
DECLARE @Name NVarChar(256);
DECLARE @Source NVarChar(64);
DECLARE @Action VarChar(3);
FETCH NEXT FROM ActionsCur INTO @ID
WHILE (@@FETCH_STATUS <> -1)
IF (@Action = 'ADD')
INSERT INTO dbo.NameSearch (Name_ID
IF (@Action = 'UPD')
SET [Text] = @Name
WHERE Name_ID = @ID
AND Source = @Source;
IF (@Action = 'DEL')
DELETE FROM dbo.NameSearch
WHERE Name_ID = @ID
AND Source = @Source
AND [Text] = @Name;
FETCH NEXT FROM ActionsCur INTO @ID
END -- Next Name Change Record
END -- It's the Name change message
END -- MessageType = 2
END CONVERSATION @MsgHwnd;
END -- While 1 = 1
So you have a procedure that never commits until it hits the end of the queue and wonder why it behaves differently in production than in development. Could it be that in production it has a pretty big queue to drain, a queue that keeps growing and never shrinks?Fix your code to commit from time to time.Also your code is has other problems, the COMMIT for case 'MessageType = 2' will cause the loop to continue but there is no BEGIN TRANSACTION so you'll have misbalanced commit/begin. Also you need to take care of the error message types. Using message type codes is not guaranteed between releases, use message type names instead.As for your wish for DDL statements to ignore locks and succeed despite other transactions interacting with the object being altered/dropped, luckily the server implements the correct semantics.
I copied the framework for this off the examples on the internet of which there are many -- they are all basically the same example. That doesn't mean they are RIGHT they are just all we have to go on.
Perhaps you could point me to the extensive and detailed examples in the HELP supplied with the product that I cannot find. Otherwise I would not use the Internet for help at all.
This is not in PROD this is on a DEV box. It was developed on a developer's machine and then moved to the development server to test it. No transactions have yet to be put through it at all. Not a single one. We installed this
in the DEV environment and it went crazy. We had a heck of a time stopping it. End of story.
I still have no idea how to stop a service broker activation procedure that is not behaving well. How do I pause the thing? Stop the thing? Is my only option to KILL the SPID that's running? That does not seem like good advice to anyone. What if I want to temporarily suspend processing in order to perform some maintenance activity? How do I do that?
Is there a GOOD EXAMPLE you can reccomend that we use as a template? The one in BOL didn't teach us very much at all, and the ones out there on the Internet are apparently not very well written either. And we still don't understand why this runs on the developers box.
It's not the technologies fault, I'm sure it works great if it's done right.