Bewildering Thread Abort

  • Question

  • I'm having great difficulty catching ThreadAbortException and don't know why; this is in SQL/CLR code.

    Please have a look at this thread.

    The MSDN docs also say that finally blocks are executed when a thread is aborted, but I see this failing too.

    Is this a bug in .NET itself?

     

    Thanks

    Cap'n

     

     

     

    Friday, November 18, 2011 9:53 PM

All replies

  • Given that code, my guess would be that after the ThreadAbortException was thrown, another exception was thrown in a catch block or finally block.
    Friday, November 18, 2011 10:12 PM
  • Do you mean the first or second snippet I posted, Jared?

    Also, I have a breakpoint set on that line in the catch block, and it doesn't get hit (second snippet).

    Also read this incredible post. None of this makes sense; it looks either buggy or, at the very least, poorly documented, since nothing explains any of this.

    Thx

    Cap'n

     


    Saturday, November 19, 2011 1:44 PM
  • The incredible post looks like the behavior I'm used to seeing. If an exception gets thrown in the finally block, that's the exception which "bubbles up", not the original exception. Personally I think there's a better solution to the problem, but that's the current behavior of .NET.
    Saturday, November 19, 2011 6:28 PM
  • But why is the ThreadAbortException not caught here:

    try {
        try { }
        finally {
            // critical logic
        }
    } catch(Exception ex) {
        // ThreadAbortException is not caught here, but exceptions thrown
        // from within the critical logic are
    }

     

    Saturday, November 19, 2011 7:54 PM
  • That's the way .NET works. Nothing special about ThreadAbortException here. If an exception gets thrown while another exception is unwinding the stack, the second exception takes over in unwinding the stack.
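    A minimal C# sketch of that rule, independent of SQL CLR (the class and method names are illustrative, not from the original thread): the exception thrown from the finally block is the one the caller's catch sees, and the original InvalidOperationException is lost.

```csharp
using System;

public static class FinallyOverride
{
    // Throws one exception from the try block and a second one from the
    // finally block while the first is still propagating.
    private static void ThrowTwice()
    {
        try
        {
            throw new InvalidOperationException("original");
        }
        finally
        {
            // This exception replaces the one that was unwinding the stack.
            throw new ApplicationException("thrown from finally");
        }
    }

    // Returns the type name of whichever exception actually reaches the caller.
    public static string ObservedExceptionType()
    {
        try
        {
            ThrowTwice();
            return "none";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }
}
```

    Calling FinallyOverride.ObservedExceptionType() returns "ApplicationException": the catch block never sees the original exception.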
    Saturday, November 19, 2011 8:48 PM
  • Not in a SQL/CLR proc or trigger, though, Jared; my original post is about that.

    If a user presses Cancel for a query in SQL Server Management Studio while the query is inside user code, and that code holds one or more sync block locks at that instant, then the exception is never caught: the thread is killed, period.

    No finally or catch blocks get executed, and the AppDomain itself is forcibly unloaded, possibly impacting other users. This is the issue: standard exception handling rules are not followed.

    Cap'n

     




    Sunday, November 20, 2011 3:34 AM
  • There's nothing bewildering about this. If your thread holds a lock while it is aborted, then SQL CLR considers that your thread is accessing shared state. In such a situation there's no way to know that the abort won't corrupt that shared state, so SQL CLR assumes the worst and throws the app domain away.
    Sunday, November 20, 2011 8:45 AM
    Moderator
  • I have to say this IS bewildering if it is not documented, and I can find no documentation that says what you say above.

    Here is what it says

    "When managed code in the .NET Framework APIs encounters critical exceptions, such as out-of-memory or stack overflow, it is not always possible to recover from such failures and ensure consistent and correct semantics for their implementation. These APIs raise a thread abort exception in response to these failures.

    When hosted in SQL Server, such thread aborts are handled as follows: the CLR detects any shared state in the application domain in which the thread abort occurs. The CLR does this by checking for the presence of synchronization objects. If there is shared state in the application domain, then the application domain itself is unloaded. The unloading of the application domain stops database transactions that are currently running in that application domain. Because the presence of shared state can widen the impact of such critical exceptions to user sessions other than the one triggering the exception, SQL Server and the CLR have taken steps to reduce the likelihood of shared state. For more information, see the .NET Framework documentation."

    This clearly says that a thread abort will be thrown and then a check is made for any locks. But it doesn't throw a thread abort if locks are held; it just unloads the AppDomain, and this prevents you from releasing a lock! One overload of Monitor.Enter provides a ref flag for this purpose.
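    For reference, a sketch of the ref-flag pattern mentioned here, assuming .NET Framework 4 or later (where Monitor.Enter(object, ref bool) was introduced; the CLR 2.0 hosted by SQL Server 2005 through 2008 R2 does not have it). GuardedCounter is a hypothetical class. The flag only tells the finally block whether the lock was really acquired; it does not change what the SQL Server host does if an abort arrives while the lock is held.

```csharp
using System.Threading;

public sealed class GuardedCounter
{
    private readonly object _gate = new object();
    private int _value;

    public int Increment()
    {
        bool lockTaken = false;
        try
        {
            // The ref flag becomes true only once the lock is actually acquired,
            // so the finally block never releases a lock this thread does not hold.
            Monitor.Enter(_gate, ref lockTaken);
            return ++_value;
        }
        finally
        {
            if (lockTaken)
            {
                Monitor.Exit(_gate);
            }
        }
    }
}
```

    Since C# 4, the lock statement compiles to essentially this same Enter(obj, ref lockTaken)/Exit pattern.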

    It also says nothing about a query cancellation leading to a thread abort or domain unload; the connection between a cancellation and this drastic behaviour is not in any documentation.

    So, IF a thread holds a lock AND the user cancels the operation (say in SSMS), then NO thread abort is thrown BUT the app domain is unloaded. This is verified by testing BUT is not documented anywhere, so surely it is bewildering?

    Cap'n

     


    Sunday, November 20, 2011 1:27 PM
  • "I have to say this IS bewildering if it is not documented and I can find no documentation that says what you do above."

    What do you mean by "it's not documented"? The documentation you point to says pretty much the same thing:

    "The CLR does this by checking for the presence of synchronization objects. If there is shared state in the application domain, then the application domain itself is unloaded. "

    Well, it contains a small error: the mere presence of synchronization objects doesn't mean anything. A synchronization object must be held by the thread for shared state access to be assumed. But this is common knowledge.

    " it just unloads the AppDomain - this prevents you from unlocking a lock! "

    Umm, unloading the app domain will release the locks anyway, that's part of the reason why the app domain is unloaded.

    "It also says nothing about a query cancellation leading to a thread abort or domain unload, the connection between a cancellation and this drastic behaviour is not in any documentation."

    Indeed, I haven't seen this explained anywhere, but maybe I'm missing something; I'm far from a SQL CLR expert. Still, I'm not very surprised by this behavior. Since SQL Server is running user code when the cancellation request occurs, there's not much it can do to fulfill the request.

    Anyway, my comment was about the abort that leads to app domain unload. The unload is required, it's the only way for SQL Server to preserve its own integrity in this situation.

     


    Sunday, November 20, 2011 3:36 PM
    Moderator
  • Hi Mike

    I do appreciate you taking the time to answer. We are also trying to make sense of this, as a simple user cancel can sometimes unload the app domain and screw up all the other users who are running stuff in that app domain. It is frankly poor design to allow something as simple as a cancel to lead to such a dramatic failure for multiple users.

    It could have a flag in SqlContext and just set it to true if cancel was requested; user code could then be given a grace period to handle this and exit cleanly, and only after that would more drastic action be taken. That's far better than what is done now.

    But the documentation also states that a thread abort will occur THEN an app domain unload, so from that we'd expect to always see a thread abort exception. Of course we don't, because the documentation isn't right so far as I can see.

    If a thread abort were always thrown, then we could always catch it, perhaps even cancel it, or do some basic cleanup. But there is no opportunity whatsoever for application code to do anything at all if Cancel is pressed and the cancelled thread happens to hold some simple lock.

    1. Issue thread abort.

    2. Give grace time of x secs or millisecs

    3. Check if thread still has locks, if so unload domain.
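    A rough C# sketch of steps 1 to 3, assuming a cooperative flag in the form of .NET 4's CancellationToken (the SqlContext flag proposed above does not exist; GracefulCancel and its methods are hypothetical). Step 3 is simplified to "has the worker exited within the grace period?", because user code cannot inspect which locks another thread holds; what the host would then do (abort, unload) is left to the caller.

```csharp
using System;
using System.Threading;

public static class GracefulCancel
{
    // Hypothetical worker loop: does small units of work and polls the token
    // between them, so it can return cleanly at a safe point.
    public static void Work(CancellationToken token, Action<int> doUnit)
    {
        for (int i = 0; !token.IsCancellationRequested; i++)
        {
            doUnit(i);
            Thread.Sleep(10); // stand-in for a short piece of real work
        }
    }

    // Steps 1-3: signal cancellation, allow a grace period, and report whether
    // the host would still need drastic action (e.g. an unload).
    public static bool NeedsEscalation(Thread worker, CancellationTokenSource cts, TimeSpan grace)
    {
        cts.Cancel();                     // 1. request cancellation, no abort
        bool exited = worker.Join(grace); // 2. wait up to the grace period
        return !exited;                   // 3. still running => escalate
    }
}
```

    The trade-off is the one discussed later in the thread: the worker only stops at its next check, so cancellation latency is bounded by the size of one unit of work.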

    So my big gripe is that the docs imply that a thread abort is always thrown, possibly followed by a domain unload, but in reality this isn't true.

     

    Cap'n

     

     


    Sunday, November 20, 2011 3:52 PM
  • "It could have a flag in SqlContext and just set that to true if cancel was requested, ..."

    Indeed, this looks like a potential improvement, but I'm not familiar enough with SQL to say why it's not done like this. I'm curious what the SQL CLR people will say about this. My guess is that SQL simply doesn't trust user code enough to wait for it to exit cleanly.

    "But also the documentation does state that a thread abort will occur THEN an app domain unload..."

    Indeed, the docs seem poorly worded.

    "If thread abort was always thrown then we could always catch it and perhaps cancel it even or do some basic cleanup..."

    Hmm, cleaning up after an abort is basically impossible to do right. And if an app domain unload follows, then there's really no reason to do it.

    In short, once an abort happens there's nothing you can do. The only real question here is whether abort should be the primary mechanism used to implement cancellation.

    Sunday, November 20, 2011 6:36 PM
    Moderator
  • Well, assuming cleanup is hard is not really valid. It can be, I agree, but in our case it is not. We know the thread is being killed and we accept that. Yes, we have a few locks, but mostly these are used inside a comms channel object that handles async TCP I/O to an external (non-SQL) server.

    So just releasing these and exiting is a decent approach even if we leave something corrupt (a list or queue), because we can simply destroy all of these resources before the thread exits. We have locks for stuff, but these are NEVER shared between different SQL/CLR connection threads, only between a connection thread and a few timer or I/O callback threads.

    The MS assumption is fatally flawed: they assume that using locks means we are sharing data between CLR connection threads, but we are not, and we never do. (We have a few simple static caches, but these are not a major worry; we can redo that bit.)
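    One way to "redo that bit" is a copy-on-write cache that never holds a Monitor lock, so the thread has no synchronization object to be caught holding. This is a sketch under assumptions: CopyOnWriteCache is a hypothetical name, Volatile.Read needs .NET 4.5 or later, and a writable static field like this is only permitted in an UNSAFE assembly (SAFE and EXTERNAL_ACCESS assemblies may only have readonly static fields). Whether it actually avoids the escalation depends on how the host detects shared state.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class CopyOnWriteCache
{
    // Readers only ever see a fully built dictionary that is never mutated afterwards.
    private static Dictionary<string, string> _snapshot = new Dictionary<string, string>();

    public static bool TryGet(string key, out string value)
    {
        return Volatile.Read(ref _snapshot).TryGetValue(key, out value);
    }

    public static void Set(string key, string value)
    {
        while (true)
        {
            Dictionary<string, string> current = Volatile.Read(ref _snapshot);
            var updated = new Dictionary<string, string>(current);
            updated[key] = value;
            // Publish the new copy only if no other thread replaced the snapshot
            // first; otherwise retry against the newer one. No Monitor lock is held.
            if (Interlocked.CompareExchange(ref _snapshot, updated, current) == current)
            {
                return;
            }
        }
    }
}
```

    Each writer copies the whole dictionary, so this suits small, rarely updated caches; readers never block.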

    Cap'n

     

     

    Sunday, November 20, 2011 10:11 PM
  • I would expect that, should the ThreadAbortException bubble up out of your application's code, the behavior of unloading the AppDomain would occur. I imagine that's where the logic for doing so is. But the code in the finally blocks in your examples is taking over the unwinding of the stack, so the ThreadAbortException never makes it to the handling code which would unload the AppDomain.

    Monday, November 21, 2011 1:39 AM
  • "The MS assumption is fatally flawed, they assume that using locks means we are sharing data between CLR connection threads, but we are not"

    It's not flawed; it's the best that can be done given the circumstances. It's not like the CLR can figure out what your code is doing. Sure, there may be cases like yours where cleanup is possible, but SQL Server can't detect such cases, and the alternative of letting user threads go wild inside the server is not exactly wise.

    Documentation related: this paints a clearer picture of what's going on during abort: http://msdn.microsoft.com/en-us/magazine/cc163716.aspx#S9

    Monday, November 21, 2011 11:29 AM
    Moderator
  • Come on Mike, step back a second here: a user sitting at SSMS on a production system gets bored, presses 'Cancel', and immediately disconnects and interrupts twenty other production customers, possibly aborting transactions and so on. Is this the 'best' that can be done in the circumstances?

    If I were to teach a class on software design and pose this problem to some tech students, I think we can all agree that they would have numerous good ideas about how to address it. No serious developer I know would ever suggest using Thread.Abort to brutally kill a thread when the user issues a cancel request; if they did, they'd get a fail.

    Cap'n

     



    Monday, November 21, 2011 2:28 PM
  • "Come on Mike, step back a second here"

    He he, I could step back many seconds but that doesn't change the abort -> unload thing.

    Now of course, the fact that cancel does an abort is probably questionable, but since I'm not familiar enough with SQL I can't say much about it. I dug up some SQL docs out of curiosity, and it appears that this cancel thing is more like a kill; it really means "stop right now, whatever you are doing".

    "they would have numerous good ideas about how to address this design problem"

    If you're talking about how to implement cancellation, then there aren't many options. In fact there are only two, and neither is ideal:

    1. abort
    2. "keep checking the cancel flag"

    "no serious developer I know would ever suggest we use Thread.Abort to brutally kill a thread"

    That's true in the normal CLR world, because there you can't clean up the mess that Abort tends to produce. SQL CLR is quite different: not only can it unload the domain without restarting the process, but it can also roll back any affected transaction to avoid data corruption.

     

    Monday, November 21, 2011 3:17 PM
    Moderator
  • Another problem is that we do not have an exhaustive list of exactly when an app domain will be unloaded. I am aware of three:

    1. An out of memory error arises in some thread.
    2. A stack overflow arises in some thread.
    3. Cancel is issued in SSMS and the targeted thread holds one or more sync block locks.

    Are there other possibilities? I have no idea; the docs are ambiguous about this and leave room for uncertainty.

    Is a cancel in SSMS actually implemented with SqlCommand.Cancel?

    Can innocent calls to other code (the .NET Framework, sockets, etc.) ever take locks internally, even briefly? If so, how can one prevent the possibility of an app domain unload when such code is cancelled?

    Do other threads (not the one being cancelled) see the ThreadAbortException or not if an app domain unload takes place?

    I am running two requests from inside SSMS and doing cancels. I get the occasional app domain unload (I now trap this by handling the domain unload event), and one thread is seeing the thread abort exception, BUT is this the target thread or the other thread?

    Am I even right to think that a thread abort on a thread holding a lock ALWAYS means that the thread doesn't get the thread abort exception, or is this unpredictable?

    Why didn't they bother to pass an info object to the thread abort to give the developer some idea of which thread is being aborted and what locks it holds, or some other helpful info?

    This is very, very difficult to deal with, and I would like to see someone from the SQL/CLR team take some interest in this discussion. I have asked numerous questions about this and similar issues in this forum over the past six to nine months, and I have never seen anyone from the Microsoft team for this technology take part.

    Once again, thanks for taking the time to reply. We may not agree on everything, but at least you are receptive and can see the challenges here.

    Cap'n

     



    Monday, November 21, 2011 4:54 PM
  • Trying to answer where I can:

    "Are there other possibilities?"

    I can only guess that abort is used in any case that prevents the current request from completing successfully: Cancel from SSMS, T-SQL's KILL, connection timeouts, deadlock detection that results in a process being killed, etc.

    "Can innocent calls to other code, .net framework, sockets etc ever internally get locks, even briefly?"

    It's possible. That's why HostProtectionAttribute exists.

    "Do other threads (not the one being cancelled) see the ThreadAbortException or not if an app domain unload takes place?"

    When a normal app domain unload occurs all threads (that have a stack inside the app domain) are aborted by means of ThreadAbortException. But the rude unloads that are sometimes done by SQL Server don't do this. In that case only the critical finalizers are run and then everything is thrown away out of memory. It's almost like Win32's TerminateProcess: handles are closed and the memory pages used by the process are freed.
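    For resources that must be released even on a rude unload, the usual .NET mechanism is a SafeHandle subclass, since SafeHandle derives from CriticalFinalizerObject. A minimal sketch; TrackedHandle and its counter are hypothetical, and a real subclass would close an OS handle (for example a socket) in ReleaseHandle.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical handle wrapper; a real one would close an OS handle
// in ReleaseHandle instead of bumping a counter.
public sealed class TrackedHandle : SafeHandle
{
    public static int ReleaseCount;

    public TrackedHandle(IntPtr value) : base(IntPtr.Zero, true)
    {
        SetHandle(value);
    }

    public override bool IsInvalid
    {
        get { return handle == IntPtr.Zero; }
    }

    // Called at most once (never for an invalid handle): from Dispose(), or from
    // the critical finalizer if the handle was never disposed.
    // Keep it short, allocation-free and lock-free.
    protected override bool ReleaseHandle()
    {
        Interlocked.Increment(ref ReleaseCount);
        return true;
    }
}
```

    Disposing twice still releases once; if Dispose is never reached, the critical finalizer is what runs ReleaseHandle.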

    "Am I even right to think that a thread abort on a thread with a lock ALWAYS means that that thread doesn't get the thread abort exceoption or is this unpredictable?"

    It could be that SQL Server waits a short time for the thread to exit the lock before it decides to do a rude abort/unload.

    "Why didn't they bother to pass an info object to thread abort to give the developer..."

    We've been here before, you know my opinion about cleaning up in such situations.

     

    Monday, November 21, 2011 6:06 PM
    Moderator
  • "We've been here before, you know my opinion about cleaning up in such situations."

    Well, I wasn't referring to any cleanup. I have trapped the app domain unload event and (tried to) trap the ThreadAbortException too. In neither case is any helpful info supplied, such as the actual reason it was decided to abort the thread and unload the domain. I'm referring to simple diagnostics support.

    The ThreadAbortException has an optional info field (ExceptionState) that can be set through the Abort(object info) overload; this is always null.

    It would help if I could log this and see stuff like:

    "Thread 23 was aborted because it holds 5 locks." or "This thread is being aborted because the AppDomain must be unloaded", etc.

    This has nothing to do with cleanup, Mike; it's just basic common sense.
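    Where the abort is actually delivered to user code (not the rude case described above), the info field can at least be logged. A sketch; AbortDiagnostics.Describe is a hypothetical helper, while ExceptionState is the real ThreadAbortException property that holds the object passed to Thread.Abort(object).

```csharp
using System;
using System.Threading;

public static class AbortDiagnostics
{
    // Builds a log line for an exception caught in user code. For a
    // ThreadAbortException it includes ExceptionState, i.e. the object passed to
    // Thread.Abort(object), which is null when the parameterless Abort() was used.
    public static string Describe(Exception ex)
    {
        ThreadAbortException abort = ex as ThreadAbortException;
        if (abort != null)
        {
            object state = abort.ExceptionState;
            return "Thread " + Thread.CurrentThread.ManagedThreadId
                + " aborted; state: " + (state == null ? "(null)" : state.ToString());
        }
        return ex.GetType().Name + ": " + ex.Message;
    }
}
```

    On the .NET Framework a caught ThreadAbortException is re-raised automatically at the end of the catch block, so logging here does not stop the abort.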

     

    Cap'n

     

     

     


    Hugh Moran - http://www.morantex.com
    Monday, November 21, 2011 6:30 PM
  • I'm not very familiar with this attribute, Mike. Sure, I've seen it mentioned and discussed, but I never fully understood it. How does this relate to my question?

    I mean if I call MS code and THAT code gets a lock - then we get a thread abort - how does host protection attribute come into play?

    So far as I can see, if MS code does grab locks then the app domain will be unloaded, and we are powerless to handle this, can you explain a little more about what your answer means?

    Thx

    Cap'n

     

    Monday, November 21, 2011 7:33 PM
  • Another problem I'm having is that I have seen several references to host escalation policy. For example, this statement can be found here: "Specifically, you can configure SQL Server to instruct the CLR to take a different action when SQL Server will take one action in an error condition."

    Yet I have searched all over the web for an example or info about this and I can't find anything. I have no idea whether one really can do this or not. This might be a way to override this behaviour and at least let us have an option after assessing any risks, but I can find nothing about configuring this for SQL Server.

     

    Cap'n

     

    Monday, November 21, 2011 7:35 PM
  • Your original question was whether .NET code may use locks. Well, there's no specific information in the documentation that says whether method x uses locks, but HostProtectionAttribute can be used as an indication.

    Methods/Types that have HostProtectionAttribute Synchronization = true and/or ExternalThreading = true might use locks.

    This attribute doesn't affect what happens during an abort; it's informative only. The SQL host uses it to prevent non-UNSAFE assemblies from accessing some types/methods. You can use it to get an idea of which .NET code may use locks.

    Monday, November 21, 2011 8:21 PM
    Moderator