none
Cannot continue the execution because the session is in the kill state RRS feed

  • Question

  • Dear colleagues,

    Our database contains many tables that were created with ANSI_NULLS OFF. I want to change this to ANSI_NULLS ON. Our application runs at 600+ customer sites. A customer is allowed to add customizing tables. The customizing tables also need to be switched to ANSI_NULLS ON.

    At the moment I am testing a SQL script that does the migration for you. The script needs to be as safe as possible.

    I wrote a script that generates the script to do the switch. Basically, I first drop nonclustered indexes, FKs, unique constraints and check constraints, defaults, rename the old table, and the old PK, create a new table with the defaults, a new PK and a new clustered index with the same names as the old ones, use the SWITCH statement to transfer the data to the new table. If this worked, I drop the old table (PK and clustered index will be dropped as a consequence). Then I recreate the nonclustered indexes, FKs, unique constraints, check constraints and also the triggers. It is a bit more complicated, because I also take change tracking, replication, fulltext indexes into account.

    The way I do this is that I first generate the script and store it in a table with an identity column [id] and a varchar(max) column [statement] in a seperate ANSI_Migration database. Every row contains one statement. When the script is complete, I use a cursor (order by [id]) to retreive the statements one by one and execute them with "sp_executesql". The table that stores the script, also has columns [start_execution] and [end_execution]. These are updated before and after the "sp_executesql".

    This process works fine. Large databases are "migrated" to ANSI_NULLS ON for all tables in a couple of minutes. However, in approx. 25% of the cases, the whole procedure fails at a random place (sometimes in the generation phase, sometimes in the execution phase) with the error below.

    Msg 596, Level 21, State 1, Line 71
    Cannot continue the execution because the session is in the kill state.
    Msg 0, Level 20, State 0, Line 71
    A severe error occurred on the current command.  The results, if any, should be discarded.

    This is the only information I get. The line is always 71. Probably this is the line in some internal routine, because the real line where the script fails, is different every time. From the table storing the migration script, I can deduce the phase and the line in my mother script where things went wrong. There is no pattern. Also, the error log contains no information whatsoever. Nothing is logged. Also, the script itself doesn't contain any errors. Otherwise it wouldn't compile and wouldn't work 75% of the time. I have to mention that the script starts with loading a backup of a testing database, so the times it works and the times it fails, it does exactly the same thing on the same data structures and the same data.

    My testing PC uses Microsoft SQL Server 2017 (RTM-CU20) (KB4541283) - 14.0.3294.2 (X64)   Mar 13 2020 14:53:45  Copyright (C) 2017 Microsoft Corporation  Developer Edition (64-bit) on Windows 10 Enterprise 10.0 <X64> (Build 18363: )

    Does anyone have any idea what is the matter? I tried google and found all kinds of answers, most of them not relevant to me. The only one that seems to make sense is one mentioning a memory leak.

    Thanks for helping me (again),

    Chris

    Thursday, May 21, 2020 5:16 AM

Answers

  • Normally when you get that error message there is another message first telling about the terrible accident that happned (access violation, assertion error, or some other internal error in SQL Server). Due to timing that message may not display, but then would not get message 596 I think. (The last message, Msg 0, is not from SQL Server, but from SSMS, so you always get that one.) But you would always find something in the SQL Server errorlog.

    There is another situation when you get message 596, and in that case you get only that message and that is your process is killed with the KILL command. But in that case, too, you can find a message in the SQL Server errorlog. Then again, that is a single line and not equally visible as a stack dump.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    • Proposed as answer by Lily Lii Friday, May 22, 2020 7:41 AM
    • Marked as answer by Chris Sijtsma Saturday, May 23, 2020 8:22 AM
    Thursday, May 21, 2020 8:26 AM

All replies

  • Normally when you get that error message there is another message first telling about the terrible accident that happned (access violation, assertion error, or some other internal error in SQL Server). Due to timing that message may not display, but then would not get message 596 I think. (The last message, Msg 0, is not from SQL Server, but from SSMS, so you always get that one.) But you would always find something in the SQL Server errorlog.

    There is another situation when you get message 596, and in that case you get only that message and that is your process is killed with the KILL command. But in that case, too, you can find a message in the SQL Server errorlog. Then again, that is a single line and not equally visible as a stack dump.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    • Proposed as answer by Lily Lii Friday, May 22, 2020 7:41 AM
    • Marked as answer by Chris Sijtsma Saturday, May 23, 2020 8:22 AM
    Thursday, May 21, 2020 8:26 AM
  • Hi Chris,

    See if it helps: 

    Cannot continue the execution because the session is in the kill state.

    Cannot continue the execution because the session is in the kill state.  building clustered index.

    Best Regards,

    Lily


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com

    Thursday, May 21, 2020 9:42 AM
  • If you haven't already done so, try executing the script using SQLCMD (with the -I argument for QUOTED_IDENTIFERS ON) instead of SSMS. I've seen client tools sometimes behave differently when the unexpected happens and it could be a relevant TDS error message is getting dropped on the floor.

    An XE trace of error_reported might help too.


    Dan Guzman, Data Platform MVP, http://www.dbdelta.com

    Thursday, May 21, 2020 10:30 AM
  • Hi Erland,

    Thank you for your time, again. It is my local development server, so I am sure it isn't because a KILL command has been executed, so it must be one of the internal errors. However, there really is nothing in my error log. At 5:57 I see the messages that tells you the server is started. At 12:46, I see messages that the server has been shut down. This was because of a Windows update. I was just having lunch. Our sysadmin gives us 15 minutes to install a Windows update. If you didn't install the update within these 15 minutes, it is installed anyway. In between those two times, no messages at all. I had the problem multiple times, so there really isn't anything in the error log.

    Also, there isn't any other message before the one that tells something went terribly wrong. But I will try Dan's suggestion on Monday.

    Thanks anyway.


    Friday, May 22, 2020 9:04 AM
  • Hi Lily,

    Thank you for your time. I read both links. Unfortunately, I am sure that none of these reasons has something to do with my case.

    Thank you for trying to help.

    Friday, May 22, 2020 9:07 AM
  • Hi Dan,

    I will try this on Monday, May 25. We are having a long weekend off. 

    Friday, May 22, 2020 9:09 AM
  • I have been thinking what might cause this error. It can't be enough log-space or disk space. There is still 40 GB free disk. The entire generation and execution of the script costs 200 MB of data. Maybe more log space is needed, but all databases have the simple recovery model, so the log should be cleaned regularly without extra disk space being needed. Furthermore, unlimited log growth is permitted. I grow the log in portions of 1 GB to prevent many small interruptions. The log didn't grow beyond its initial 1 GB size. Also, the error log doesn't contain any messages about a log being full. Not in tempdb, neither in the database that stores the migration results nor the database that is migrated.

    The weird thing, in my opinion, is that the process sometimes stops during script generation and sometimes during script execution. During script generation, the only thing that happens is the execution of select statements on system views, and inserts in to the [script] table (with a column [id] bigint identity(1,1) not null primary key and a column [statement] nvarchar(max)).

    The only other thing that comes to mind is that the database that is migrated, has the broker enabled. I will try disabling the broker before migration (or stopping the agent), to see if this has something to do with my problem. Also, I will try Dan's suggestion and use SQLCMD to see if SSMS did sweep error messages under the carpet.


    Friday, May 22, 2020 9:22 AM
  • Thank you for your time, again. It is my local development server, so I am sure it isn't because a KILL command has been executed, so it must be one of the internal errors. However, there really is nothing in my error log. At 5:57 I see the messages that tell you the server is started. At 12:46, I see messages that the server has been shut down. This was because of a Windows update. I was just having lunch. Our sysadmin gives us 15 minutes to install a Windows update. If you didn't install the update within these 15 minutes, It is installed anyway. In between those two times, no messages at all. I had the problem multiple times, so there really isn't anything in the error log.

    Mysterious.

    I would set up a Profiler trace that captures Error:Exception and Error:UserMessage while this is running and see that appears. Maybe there will be some clues.

    And you are sure that you have not been playing with Resource Governor on this server?


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Friday, May 22, 2020 9:26 AM
  • I have been thinking what might cause this error. It can't be enough log-space or disk space. There is still 40 GB free disk. The entire generation and execution of the script costs 200 MB of data. Maybe more log space is needed, but all databases have the simple recovery model, so the log should be cleaned regularly without extra disk space being needed. Furthermore, unlimited log growth is permitted. I grow the log in portions of 1 GB to prevent many small interruptions. The log didn't grow beyond its initial 1 GB size. Also, the error log doesn't contain any messages about a log being full. Not in tempdb, neither in the database that stores the migration results nor the database that is migrated.

    And furthermore, running out of log space results in a level 17 error, if I remember correctly. It is certainly not an error that kills the connection.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Friday, May 22, 2020 11:22 AM
  • Hi Erland,

    You were right after all (again). I stand corrected. It was a KILL command. Moreover, it was a KILL command I wrote myself. Our databases should have the SQL Broker enabled. Some customers load a test database with a backup of the production database every night. I wrote a SQL Agent job that runs every 5 minutes. It checks if a database is an Isah database. If so, it checks if the broker is enabled. If not, it checks if the ID is unique. Depending on the uniqueness either the broker is just enabled or a new GUID is generated. In both cases a variation on the statement below is executed.

    ALTER DATABASE [<Name_of_the_DB>] SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE

    It is the ROLLBACK IMMEDIATE that kills the running process. My simple solution is stopping the agent before migration and restarting it after the migration is done. Migrating a database costs 1 to 2 minutes. Most of the time the migration started end ended before the agent job kicked in, but sometimes I wasn't so lucky.

    Also thanks to you Dan. I started the process with SQLCMD via a script and found out that no error was gobbled up by SSMS. Then I looked at the error log again and saw something I missed before.SQLCMD wasn't the answer to my problem, but it contributed to me taking a closer look. At the exact moment of my migration process stopping, there was the message below in the error log. 

    "Setting database option ENABLE_BROKER to ON for database '<Name_of_the_DB>'."

    This problem looked random, but as it is so often, after you find the cause, the solution is obvious.

    Thank you again


    Saturday, May 23, 2020 8:35 AM
  • Glad to hear that you found the reason! It was kind of evident that it had to be something that activtely evicted the process. But, you know, when you said when there was nothing in the errorlog, I had to trust you. But all these years in forums is that you should never trust statements like that.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Saturday, May 23, 2020 9:06 AM
  • Hi Erland, Dan, and Lily

    I really do not understand why I missed that message in the error log. I looked at the error logs of last Wednesday again, just now, and the ENABLE BROKER WITH ROLLBACK IMMEDIATE is there. I have overlooked it. I really am sorry to have send you all on this wild goose chase. It really is a stupid mistake. Before working with MS SQL Server, we used to work with Sybase. When working with Sybase, it is even more important to read the error logs, so I cannot plead ignorance. And although I am not such a SQL crack as you guys, I like to believe I know some things. I have been working with databases for over 20 years now. And I really did read the error log, but I just managed to miss the one line that was important. The only reason I can think up is that reading is hard. Silly excuse and silly me, I know.

    Sorry for wasting your time and thank you very much for helping.

    Chris

    Saturday, May 23, 2020 7:26 PM
  • Reading is hard? That has never happened to me! Or, well maybe once. Or twice. Or...

    This is what these forums are for. After all, none of us spent a whole lot of time on this, but we gave you some information to help you to narrow down the problem. We have all been there: we have had a problem we couldn't figure out, and we needed someone to look at it with fresh eyes. Or just to talk to someone else which required us to explain it a little more closer - which by itself led us to understand what the real problem was.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

    Saturday, May 23, 2020 9:15 PM
  • The "reading is hard" part was meant as a kind of lame joke.

    When I was still studying, I had to write an Algol 68 program that could differentiate a mathematical formula. You could pick up the printed compiler output + input data + output data from a set of pidgin holes at the outside of the gym like hall that was the main frame computer. The error messages were not as good as they are now, but still the problem should have been obvious.

    The procedure declaration that caused the error (with error) looked something like this

      differentiate (ref exp formula) formula
      ^
      syntax error

    I forgot to start my procedure with the keyword proc but I just didn't see it. It took somebody else to point out the obvious error to me.

    Sunday, May 24, 2020 11:09 AM