Post-mortem debugging and AggregateException

    Question

  • Summary: if an exception thrown from async work is going to ultimately go unhandled (and crash the app), I’d like it to stay unhandled, rather than be caught and rethrown as an AggregateException.

    My company is currently shipping a WPF .NET 3.5 application that uses the ThreadPool to move work off the UI thread and keep it responsive. We’re planning to migrate to .NET 4.0, in part to take advantage of the new PFX/TPL features. (Rx is not currently redistributable, so that’s not an option; we also want the bug fixes in WPF 4.0.)

    It seems logical to replace our code that uses the ThreadPool with new code that uses the TPL types (to get the new scheduler, ContinueWith, unified cancellation, etc.). However, the little I’ve seen so far makes me concerned about our ability to continue to effectively use Windows Error Reporting and post-mortem debugging.

    We currently let unexpected exceptions go unhandled; Windows then displays the standard crash dialog and the user can submit the information to Microsoft. We can then log into Winqual and see all the crash dumps, grouped by crash location and sorted by frequency. (Note: the following discussion is speculative, based on how we’ve seen Winqual currently work; I haven’t distributed a .NET 4.0 application to test my hypotheses.)

    It seems that since all exceptions thrown during the running of a Task are batched up and rethrown as an AggregateException, the Winqual bucket will now be based on the location of the call to Task.Wait or Task.Result, instead of the location of the actual crash (and the different exception types will be masked). If the Task is doing anything remotely “interesting”, there could be many different reasons why it could have crashed; having a separate Winqual bucket for each “real” exception (as is the case right now) is much better.
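
    To make the concern concrete, here is a minimal sketch of the wrapping behaviour described above. The class and method names are made up for illustration; only Task.Factory.StartNew, Task.Wait and AggregateException.InnerExceptions come from the TPL itself. The exception that reaches the Wait call site is an AggregateException, and the original exception is only reachable through InnerExceptions.

    ```csharp
    using System;
    using System.Threading.Tasks;

    // Hypothetical demo class; not part of the TPL.
    public static class AggregateWrappingDemo
    {
        // Returns (type caught at the Wait call site, type originally thrown in the task).
        public static Tuple<Type, Type> Observe()
        {
            // Declared as Action explicitly so the StartNew(Action) overload is chosen.
            Action body = () => { throw new InvalidOperationException("real failure"); };
            Task task = Task.Factory.StartNew(body);

            try
            {
                // The task's exception surfaces here, not at the original throw site.
                task.Wait();
            }
            catch (AggregateException ae)
            {
                return Tuple.Create(ae.GetType(), ae.InnerExceptions[0].GetType());
            }

            return null; // not reached when the task faults
        }
    }
    ```

    If the process were allowed to crash at the Wait call, the crash location would presumably be the Wait, which is the bucketing problem described above.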

    Perhaps the Winqual team is aware of this, and will be improving the bucketing logic to break AggregateExceptions up by the InnerExceptions they contain?

    However, even if they do that, because the objects that the Task’s method was using will have gone out of scope and could have been GC’d by the time the AggregateException is rethrown, it seems possible that the minidump (or even a full dump!) won’t contain any of the data pertinent to understanding the crash.

    It seems like an impractical suggestion, but I wonder if there could be opt-in behaviour (e.g., a new TaskCreationOptions flag) that would eliminate the try/catch blocks (except, of course, for catch(TaskCanceledException)) around the task-running code. Or would it be possible to write a custom TaskScheduler that somehow lets exceptions go unhandled?

    Or does anyone have any experience with TPL post-mortem debugging and can tell me it’s not as bad as I’m assuming?

    Thanks,
    Bradley

    FWIW, I also posted the gist of this as a comment on Joe Duffy’s blog: http://www.bluebytesoftware.com/blog/CommentView,guid,652962f1-5073-49a4-b233-9ca24b494742.aspx#0bd19a68-5dad-4764-9318-94fc26e122f9

     

    Thursday, December 17, 2009 10:06 PM

Answers

  • Hi Bradley-

    Thanks for the great feedback. Your observations are correct.  As you read in Joe's blog post, we have multiple competing design issues to deal with, and those brought us to the current by-design behavior. 

    We did consider (several times, actually) adding a TaskCreationOptions value to specifically target these fire-and-forget use cases, but for v1 we didn't do so.  You can approximate this, however.  In your Task's body, you can provide your own try/catch block that calls Environment.FailFast in the catch block.  That will crash the app immediately, invoke WER, generate a dump, and so on.  For example:

    Task.Factory.StartNew(() =>
    {
        try { ... /* your fire and forget operation here */ }
        catch(Exception exc) { Environment.FailFast("Unhandled exception", exc); }
    });

    If this is something you'll want to commonly do, you can wrap this functionality up into helpers.  For example, you could have your own utility method:

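    public static class ExceptionHelper
    {
        // Wraps an action so that any exception it throws terminates the process immediately.
        public static Action FailOnException(this Action original)
        {
            return () =>
            {
                try { original(); }
                catch (Exception exc) { Environment.FailFast("Unhandled exception", exc); }
            };
        }
    }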

    Then instead of writing:

    Task.Factory.StartNew(action);

    you could write:

    Task.Factory.StartNew(action.FailOnException());

    or instead of writing:

    Task.Factory.StartNew(() =>
    {
        … // code here
    });

    you could write:

    Task.Factory.StartNew(ExceptionHelper.FailOnException(() =>
    {
        … // code here
    }));

    You could also write your own extension method for TaskFactory, like the following:

    // Extension methods must be declared in a static class, e.g. ExceptionHelper.
    public static Task FireAndForget(this TaskFactory factory, Action action)
    {
        return factory.StartNew(action.FailOnException());
    }

    which you could then use instead of StartNew, e.g.

    Task.Factory.FireAndForget(() =>
    {
        ... // code here
    });

    In all of those cases, then, the app would crash as soon as the exception went unhandled in the Task, and you would not have to wait for the task to be finalized.  You would also get the bucketing behavior you desire.
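
    The same pattern can be stretched to cover delegates that return a value, and to leave cancellation alone (the question mentioned wanting to keep catching cancellation exceptions). The following is only a sketch under those assumptions: the class name FailFastSketch and the generic overload are hypothetical, not something the TPL provides. It lets OperationCanceledException propagate to the Task so the TPL can treat it as it normally would, and fails fast on everything else.

    ```csharp
    using System;

    // Hypothetical helper class; the name and the generic overload are illustrative.
    public static class FailFastSketch
    {
        // Wraps a value-returning delegate so that unexpected exceptions terminate
        // the process, while cancellation exceptions still flow to the caller.
        public static Func<TResult> FailOnException<TResult>(this Func<TResult> original)
        {
            return () =>
            {
                try
                {
                    return original();
                }
                catch (OperationCanceledException)
                {
                    // Leave cancellation to the TPL's normal handling.
                    throw;
                }
                catch (Exception exc)
                {
                    Environment.FailFast("Unhandled exception", exc);
                    throw; // never reached; FailFast does not return, but the compiler needs an exit
                }
            };
        }
    }
    ```

    With this in place, Task.Factory.StartNew(func.FailOnException()) would still produce a Task<TResult> whose Result can be read as usual when the delegate succeeds.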

    Note, too, that by default in Visual Studio while debugging, the debugger will break in as soon as the exception goes unhandled within the Task; this assumes you haven't disabled "Just My Code", which is on by default.

    I hope that helps.
    Friday, January 08, 2010 5:07 PM
    Owner

All replies

  • It also appears (from experimentation and reading the documentation) that an unhandled exception in a Task launched in a "fire and forget" way will crash the application by rethrowing the exception at an arbitrary point in the future on the finalizer thread.

    Aside from the Winqual bucketing problems mentioned above, this seems like it could create non-deterministic user crash reports: "sometimes when I do X it crashes, but other times when I do X it works but crashes a minute later" (time delay obviously depending on GC pressure). The ContinueWith suggestion at http://social.msdn.microsoft.com/Forums/en-US/parallelextensions/thread/0ef2fc53-0545-4ff5-b8f1-a6fea3c5dedb helps, but in order to crash the application as close as possible to the exception occurring, one would have to marshal the exception to a thread that isn't running a Task, so that it can truly go unhandled and terminate the process.
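
    For illustration, a ContinueWith-based approach might look roughly like the sketch below. This is an assumption about the shape of that suggestion rather than a copy of it, and the names FaultObserverSketch, OnFault and CrashOnDedicatedThread are hypothetical. The continuation runs only when the antecedent task faults, and the sample handler rethrows on a plain thread so that nothing in the TPL catches it.

    ```csharp
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    // Hypothetical helper class; names are illustrative, not part of the TPL.
    public static class FaultObserverSketch
    {
        // Invokes onFault as soon as the task faults, rather than leaving the
        // exception unobserved until the task is finalized.
        public static Task OnFault(this Task task, Action<AggregateException> onFault)
        {
            return task.ContinueWith(
                t => onFault(t.Exception), // reading Exception marks it as observed
                TaskContinuationOptions.OnlyOnFaulted |
                TaskContinuationOptions.ExecuteSynchronously);
        }

        // One possible handler: rethrow on a dedicated thread so the exception
        // goes unhandled and terminates the process. The crash is reported at
        // this throw, not at the original failure; the original exceptions
        // remain available through InnerExceptions.
        public static void CrashOnDedicatedThread(AggregateException error)
        {
            var thread = new Thread(() => { throw error; });
            thread.Start();
        }
    }
    ```

    Usage would be something like task.OnFault(FaultObserverSketch.CrashOnDedicatedThread). Note that the objects the task was using may already be out of scope by the time the handler runs, so this shortens the delay without addressing the dump-contents concern.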

    Can anyone from the PFX team confirm that this analysis is accurate, and/or that the current behaviour is by design and won't be changed?
    Friday, January 08, 2010 4:13 PM
  • Thanks for the suggestions--I had forgotten about Environment.FailFast when I mentioned marshalling the exception to another thread; that API is a great way to terminate the application.
    Friday, January 08, 2010 9:45 PM