none
Stopping a run-away service broker process

    Question

  •  

    I have a service broker process that's in run-away mode and I cannot figure out how to stop it.

     

    Here's the situation:

     

    The stored procedure named in the activation queue is opening a transaction but not closing it.  The receive loop has a timeout of 3 seconds, with MAX_QUEUE_READERS set to 5 so every 3 seconds I'm getting 15 new records in

    sys.dm_tran_locks.

     

    The server is at 99% CPU with no users on it.

     

     

    The problem is I can't stop it.  Here's what I have tried:

     

    Reboot the box                                                                          Clears the lock table and runs away again

    ALTER QUEUE <Queue Name> WITH STATUS=OFF                  Just sits there forever

    Drop the Service, Drop the Queue                                               Just sits there forever

    Dropped the stored procedure itself (!)                                         It's still running like crazy

     

    So my immediate question is how do I stop this crazy thing?  And the follow up question is when a programmer makes a mistake, do we have to reformat our DEV box to recover from it (that's a joke) -- seriously how do you stop a run away service broker process?

     

     

     

    Wednesday, March 26, 2008 1:46 PM

All replies

  • I've discovered if I kill the SPID that is holding the locks and immediately stop the queue I can get control of the server back. 

     

    Here's what it seems to come down to:

     

    On SQL Server Developer edition where the service broker code was developed it works fantastic.

     

    On the development SERVER which is Enterprise Edition the activation procedure generates thousands of locks per minute, and consumes 100% of the CPU.

     

    Both instances are Service Pack 2 Hot Fix 3175.

     

     

     

     

    Wednesday, March 26, 2008 2:44 PM
  • Here's more information:

     

    Set the MAX_QUEUE_READERS to 1 in order to slow down the runaway condition.

    Setup a trace table in the database (A simple table we can write to from the Activation Sproc to see what is happening)

     

    Start the queue (ALTER QUEUE <QueueName> WITH STATUS=ON)

     

    Within five seconds we have 28,000+ locks in dm_tran_locks

     

    Using the profiler on the SPID we see thousands of SQL transactions that begin & commit at the exact same time with an object name of TVQuery

     

    The Activation sproc trace output indicates that it has been run three times and has received no messages.

     

    The Activation sproc is never run again -- there is no trace output... But the sys.dm_tran_locks table grows by about 20 locks a second until you kill the spid and stop the queue.

     

    A few rounds of stopping / starting the queue(s) and the Activation sproc doesn't run at all, but the lock table still explodes when it starts, so we reboot the box and we are back at square one.

     

    During this time we are not initiating any conversations or attempting to exchange any messages.

     

    On SQL Server Developer Edition the Service Broker works Exactly As Advertised, no locks are created.

     

    Our activation sproc looks like all the examples out there, very very simple and basic.

     

    Any ideas/help/clues would be greatly appreciated as we cannot figure out what on earth we are doing that is generating 28,000 locks within seconds of enabling the queue.

     

    Wednesday, March 26, 2008 3:49 PM
  • So lets say you write a procedure that begins a transaction, updates a table then enters an infinite loop. Your practical joke friend connects with his favorite SSMS tool and execute the procedure. You will see that it grabs some locks then 'runaways' with, say, 90% CPU consumption. So what you do:
    - restart the box. before you go and look, your friend connects back and executes the procedure again, so you find the same situation
    - try to drop the table being updated. Of course, the uncommitted updates have placed a schema stability lock and some intent-update locks on the table and your DROP requires a schema modification lock, so they are in conflict so your DROP 'just sits there forever'.
    - you can drop the procedure being executed, but that doesn't prevent the execution of the procedure in the connection to continue.

    So all that is the normal expected behavior with SQL Server, T-SQL and poorly written procedures. Now you can throw in some esoteric and say the table is a 'queue' and replace your evil friend with internal activation and then blame the new mysterious Service Broker, but that doesn't change a thing. What you describe is exactly the expected behavior.
    Thursday, March 27, 2008 9:40 AM
  • SQL Server Developer Edition and Enterprise Editions are identical. That is unless you have different versions/service packs/cumulative updates applied on development environment vs. production environment. 

    The TVQuery object is some internal mechanism related to the use of table variables.
    Thursday, March 27, 2008 9:50 AM
  • So what you're telling me is that if I write this program on computer "A":

     

    for(int inx = 0; inx < 10; inx++) Console.WriteLine(Convert.ToString(inx));

     

    And it prints the numbers one through 10, and then I run it on computer "B" and it prints nothing...

     

    THAT MY CODE IS POORLY WRITTEN???  Are you pulling my leg? 

     

    Do you think the problem could be in the CONFIGURATION OF COMPUTER "B"??????? 

     

    ... Now IMHO if I stop a queue it should stop.  That's what happens with a spooler queue, I don't have to kill the process to stop the printer from printing.  It shouldn't go away forever if it can't stop the queue, it should come back and tell me WHY the queue cannot be stopped.  That is poor programming.  If you have a deadlock, you don't lock up and go away, you TRY to enter a critical section, and if you can't get in you throw an error. 

     

    But that's not the point.  The point is this runs on one computer and not on another, and they are on the same patch level, one is XP with Dev Edition one is Windows 2003 Server with Ent Edition.

     

    Here's the activation sproc.  This seems very, very, very simple to me.  On machine "A" this runs perfectly.  On machine "B" you get three records in the trace table I setup followed by an endlessly growing lock table.  Now I'm usually not one to complain but the documentation on this is not all that extensive and I have absolutely no idea where to LOOK for what is wrong.  I don't think it has anything to do with the code at all.

     

    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    --
    -- MessageProcessor_NameChange
    --
    -- This routine accepts an XML fragment from two different triggers
    -- and updates the NameSearch table, that is used for fuzzy lookup
    --
    ALTER PROCEDURE [dbo].[MessageProcessor_NameChange] AS
    BEGIN
     DECLARE @MsgHwnd   UniqueIdentifier,
       @MessageType  int,
       @Error   int,
       @ErrorText  NVarChar(100),
       @Message  XML;


     WHILE (1 = 1)
       BEGIN
      INSERT INTO dbo.TraceTable (TraceInfo, Stamp) VALUES ('Hello here we are', GETDATE());

      BEGIN TRANSACTION;
     
      WAITFOR
      (
       RECEIVE TOP (1) @MessageType = MESSAGE_TYPE_ID
                                  ,@Message        = MESSAGE_BODY
                                  ,@MsgHwnd      = CONVERSATION_HANDLE
       FROM NameChangeQueue
      ), TIMEOUT 3000;
      -- Didn't get anything in three seconds, exit
      IF (@@ROWCOUNT = 0)
        BEGIN
    *** When I enable the queues I get this message three times and then never again, but only
    *** on machine "B" not on machine "A"
       COMMIT TRANSACTION;
       INSERT INTO dbo.TraceTable (TraceInfo, Stamp) VALUES ('Didnt get nuttin', GETDATE());
       RETURN;
               END
        --
        -- If there's an error send the message back to the caller
        -- this keeps it out of the dead letter queue
        -- and serves as the NAK for this message.  We also log
        -- this in the error log on our side
        --
        SET @Error = @@ERROR;
        IF (@Error <> 0)
          BEGIN
                END CONVERSATION @MsgHwnd WITH ERROR = @Error DESCRIPTION = ' Sproc Error!';
               COMMIT TRANSACTION;
               INSERT INTO dbo.SystemErrors(Operator
                                                             ,Source
                                                             ,Date
                                                            ,ErrorText)
                   VALUES ('MessageProcessor_NameChange'
                                 ,'MessageProcessor_NameChange'
                                ,GETDATE()
                               ,CAST(@Error AS VarChar(5)) + @ErrorText);
                  RETURN;
        END
       ELSE -- An error did not occur we have something to do
        IF (@MessageType <> 2)  -- If NOT the End Dialog Message
          BEGIN
                DECLARE @AckMsg XML;
                SET @AckMsg = N'<Ack>GotMessage</Ack>';
               SEND ON CONVERSATION @MsgHwnd MESSAGE TYPE AckMsg (@AckMsg);
               END CONVERSATION @MsgHwnd;
               --
               -- if the message looks like it's formatted ok process it
               --
              IF (@Message.exist('//Names') = 1)
               BEGIN
               --
               -- Transmogrify the message XML document into a local table
               --
              DECLARE @Actions TABLE (ID  BigInt
                                                        ,[Name]  NVarChar(256)
                                                        ,Source  NVarChar(64)
                                                        ,[Action] VarChar(3));

               INSERT INTO @Actions (ID
                                                   ,[Name]
                                                   ,Source
                                                    ,[Action])
                SELECT Action.value('@ID',             'BigInt')
                             Action.value('@Name',        'VarChar(256)')
                            ,Action.value('@Source',       'NVarChar(64)')
                            ,Action.value('@Action',        'VarChar(3)')
               FROM    @Message.nodes('//Name') node([Action]);
               --
               -- Iterate over it and process the actions
               --
              DECLARE ActionsCur CURSOR LOCAL FORWARD_ONLY READ_ONLY FOR SELECT ID
                                                                                                                                               ,[Name]
                                                                                                                                               ,Source
                                                                                                                                               ,[Action]
                                                                                                                                    FROM @Actions;
              DECLARE @ID  BigInt;
              DECLARE @Name NVarChar(256);
              DECLARE @Source NVarChar(64);
              DECLARE @Action VarChar(3);
             OPEN ActionsCur;
             FETCH NEXT FROM ActionsCur INTO @ID
                                                                    ,@Name
                                                                    ,@Source
                                                                    ,@Action;
              WHILE (@@FETCH_STATUS <> -1)
               BEGIN
                 IF (@Action = 'ADD')
                           INSERT INTO dbo.NameSearch (Name_ID
                                                                          ,[Text]
                                                                          ,Source
                                                                          ,Created_By)
                           VALUES       (@ID
                                               ,@Name
                                               ,@Source
                                               ,'MessageProcessor_NameChange');

                 IF (@Action = 'UPD')
                           UPDATE dbo.NameSearch
                                  SET [Text] = @Name
                           WHERE  Name_ID = @ID
                                AND  Source  = @Source;

     

                IF (@Action = 'DEL')
                          DELETE FROM dbo.NameSearch
                          WHERE  Name_ID = @ID
                               AND  Source  = @Source
                               AND  [Text]  = @Name;

     

                 FETCH NEXT FROM ActionsCur INTO  @ID
                                                                         ,@Name
                                                                         ,@Source
                                                                         ,@Action;
                         END -- Next Name Change Record
                  CLOSE  ActionsCur;
                  DEALLOCATE ActionsCur;
                 END -- It's the Name change message
          END -- MessageType = 2
        ELSE
           END CONVERSATION @MsgHwnd;
          COMMIT TRANSACTION;
      END -- While 1 = 1
    END --

    Thursday, March 27, 2008 1:21 PM
  • So you have a procedure that never commits until it hits the end of the queue and wonder why it behaves differently in production than in development. Could it be that in production it has a pretty big queue to drain, a queue that keeps growing and never shrinks?

    Fix your code to commit from time to time.

    Also your code is has other problems, the COMMIT for case 'MessageType = 2' will cause the loop to continue but there is no BEGIN TRANSACTION so you'll have misbalanced commit/begin. Also you need to take care of the error message types. Using message type codes is not guaranteed between releases, use message type names instead.

    As for your wish for DDL statements to ignore locks and succeed despite other transactions interacting with the object being altered/dropped, luckily the server implements the correct semantics.

    Thursday, March 27, 2008 2:48 PM
  • I copied the framework for this off the examples on the internet of which there are many -- they are all basically the same example.  That doesn't mean they are RIGHT they are just all we have to go on.

     

    Perhaps you could point me to the extensive and detailed examples in the HELP supplied with the product that I cannot find.  Otherwise I would not use the Internet for help at all. 

     

    This is not in PROD this is on a DEV box.  It was developed on a developer's machine and then moved to the development server to test it.  No transactions have yet to be put through it at all. Not a single one. We installed this

    in the DEV environment and it went crazy.  We had a heck of a time stopping it.  End of story.

     

    I still have no idea how to stop a service broker activation procedure that is not behaving well.  How do I pause the thing?  Stop the thing?  Is my only option to KILL the SPID that's running?  That does not seem like good advice to anyone.  What if I want to temporarily suspend processing in order to perform some maintenance activity?  How do I do that?

     

    Is there a GOOD EXAMPLE you can reccomend that we use as a template?  The one in BOL didn't teach us very much at all, and the ones out there on the Internet are apparently not very well written either.  And we still don't understand why this runs on the developers box.

     

    It's not the technologies fault, I'm sure it works great if it's done right.

     

     

    Thursday, March 27, 2008 3:17 PM