locked
debugging InvalidProgramException resulting from profiling a "function of death" RRS feed

  • Question

  • I'm writing a .NET profiler, running on .NET 4.0.30319 with VS 2010.

    I have successfully profiled 1000's of functions from a variety of applications, but have recently come across a small function-of-death (below) that causes an InvalidProgramException to be thrown sometime shortly after I call SetILFunctionBody with the new IL my profiler creates. The thrown exception doesn't tell me which of the ~40 checks have caused an InvalidProgramException.

    I do not see this exception when I run under the VS debugger, I only see it when run as a console exe.

    I see the thrown exception coming out of the CLR for this function-of-death regardless of other functions that have been previously and successfully profiled in the course of the same run.

    I see the thrown exception coming out of the CLR in both a debug and release build of the profiling code.

    I (think I) am exquisitely careful about checking base/bounds on the memory returned from GetILFunctionBodyAllocator(...)->Alloc(...).  I'm running with _CrtSetDebugFlag with (most) of the flags set, and that memory manager isn't complaining.

    If I do all the work to allocate and make the new function body but not call SetILFunctionBody, I do not get the InvalidProgramException, so I weakly tend to rule out random memory writes in the course of my profiler's work.

    If I call the ->Alloc() function and ask for more (or much more) memory than I need, the InvalidProgramException is still thrown.

    Here's the function minus the top level surrounding class decl:

    private static void SampleFromNServiceBus_01b(System.Collections.Generic.KeyValuePair<String, String> h) { }

    Changing this function to do any of the following still result in the InvalidProgramException being thrown: adding more formals in any order; returning a value; using local variables; using try/catch blocks in a larger function body; adding more "meat" to the body.

    However changing the type of the formal parameter h ..does.. change the observed behavior. Using <int, int> causes the program to reset(?!) and run again in a different thread from the top.  Using other generics from System.Collections.Generic makes the InvalidProgramException go away.

    I'm at a loss on how to proceed to debug this problem further. 

    Robert Henry, New Relic

    Saturday, November 12, 2011 7:54 PM

Answers

  • I have fixed bugs relating to the encoding of argument type signatures, as well as the way I am boxing some value types, and my problems now seem to have resolved themselves.  I'm not changing the signature of the profiled function, but am constructing a new signature for a function called by the profiled code, and that's where my bug was.
    • Proposed as answer by Shaun Wilde Tuesday, January 10, 2012 9:57 PM
    • Marked as answer by Paul Zhou Thursday, January 19, 2012 9:41 AM
    Saturday, January 7, 2012 2:16 AM

All replies

  • Could you show your IL before and after, including any changes to headers and exception blocks/clauses?
    Monday, November 14, 2011 2:08 AM
  • I have a CLR profiler that I am using to instrument ThreadPoolWorkQueue.Enqueue. My instrumentation works fine when I compile and run my app as a 64-bit application, but I get a "InvalidProgramException: Common Language Runtime detected an invalid program." when I run it as a 32-bit app on a 64-bit machine (i.e. WOW). My profiler works fine on 32-bit-on-64-bit (WOW) and I've profiled tons of applications, but it fails for this particular kind of instrumentation.

    According to this thread (http://social.msdn.microsoft.com/Forums/en/netfxtoolsdev/thread/db6e222a-225d-49a7-9479-7500e96548dc), there is no way for me to persist the fully modified IL + metadata to a file, so I can't use PeVerify to uncover what is wrong. I did persist the IL (no metadata) to a file a compared the versions generated by the 32 and 64-bit, and they are all identical except for opcode arguments like string locations, which will differ by program. The exception handlers are also identical between the two versions.

    Please share any tips for debugging this scenario. 

    -Murali

     

    • Merged by Paul Zhou Monday, January 9, 2012 9:03 AM
    Tuesday, January 3, 2012 6:49 PM
  • Shaun:  Sorry for the delay.  Virtual elapsed time for this was 0...

    Here's the pre-profiling blob of bytes, which includes the header:

      {  0x06 ,0x2a }

    Here's the complete blob of bytes resulting from our profiler; then I show this broken into the header; bytecode; SEH; the bytecode are shown with (my computed) overlay of try/catch and instruction groupings.

    {

            /*   0 */  0x0b ,0x30 ,0x29 ,0x00 ,0x43 ,0x00 ,0x00 ,0x00 ,0x0a ,0x00

            /*  10 */ ,0x00 ,0x11 ,0x72 ,0xa8 ,0x03 ,0x00 ,0x70 ,0x20 ,0x00 ,0x00

            /*  20 */ ,0x00 ,0x00 ,0x72 ,0x1c ,0x04 ,0x00 ,0x70 ,0x72 ,0x1e ,0x04

            /*  30 */ ,0x00 ,0x70 ,0x72 ,0x4c ,0x04 ,0x00 ,0x70 ,0x72 ,0x70 ,0x04

            /*  40 */ ,0x00 ,0x70 ,0x72 ,0x86 ,0x04 ,0x00 ,0x70 ,0x14 ,0x02 ,0x28

            /*  50 */ ,0x30 ,0x00 ,0x00 ,0x0a ,0x0b ,0x00 ,0x00 ,0x00 ,0xde ,0x04

            /*  60 */ ,0x0a ,0x00 ,0xfe ,0x1a ,0x00 ,0xde ,0x0a ,0x00 ,0x07 ,0x14

            /*  70 */ ,0x06 ,0x28 ,0x31 ,0x00 ,0x00 ,0x0a ,0xdc ,0x00 ,0x2a ,0x00

            /*  80 */ ,0x01 ,0x1c ,0x00 ,0x00 ,0x00 ,0x00 ,0x2b ,0x00 ,0x05 ,0x30

            /*  90 */ ,0x00 ,0x04 ,0x2b ,0x00 ,0x00 ,0x01 ,0x02 ,0x00 ,0x2b ,0x00

            /* 100 */ ,0x0c ,0x37 ,0x00 ,0x0a ,0x00 ,0x00 ,0x00 ,0x01

    }

    Header:{

            /*   0 */  0x0b ,0x30 ,0x29 ,0x00 ,0x43 ,0x00 ,0x00 ,0x00 ,0x0a ,0x00

            /*  10 */ ,0x00 ,0x11

    }

    Byte Code:
     {
    /* PrintFunctionBody tiny=0 size=67 sehClausecount=2 */
             0x72   /* | ldstr      */,  0xa8 ,0x03 ,0x00 ,0x70
            ,0x20   /* | ldc.i4     */,  0x00 ,0x00 ,0x00 ,0x00
            ,0x72   /* | ldstr      */,  0x1c ,0x04 ,0x00 ,0x70
            ,0x72   /* | ldstr      */,  0x1e ,0x04 ,0x00 ,0x70
            ,0x72   /* | ldstr      */,  0x4c ,0x04 ,0x00 ,0x70
            ,0x72   /* | ldstr      */,  0x70 ,0x04 ,0x00 ,0x70
            ,0x72   /* | ldstr      */,  0x86 ,0x04 ,0x00 ,0x70
            ,0x14   /* | ldnull     */
            ,0x02   /* | ldarg.0    */
            ,0x28   /* | call       */,  0x30 ,0x00 ,0x00 ,0x0a
            ,0x0b   /* | stloc.1    */
    /*   43 .try  0 */
    /*   43 .try  1 */
            ,0x00   /* | nop        */
            ,0x00   /* | nop        */
            ,0x00   /* | nop        */
            ,0xde   /* | leave.s    */,  0x04       /* IL_034 */
    /*   48 end .try  0 */
    /*   48 .catch  0 */
            ,0x0a   /* | stloc.0    */
            ,0x00   /* | nop        */
            ,0x1a   /* | rethrow    */
    /*   52 end .catch  0 */
    /* *   52 IL_034: */
            ,0x00   /* | nop        */
            ,0xde   /* | leave.s    */,  0x0a       /* IL_041 */
    /*   55 end .try  1 */
    /*   55 .finally  1 */
            ,0x00   /* | nop        */
            ,0x07   /* | ldloc.1    */
            ,0x14   /* | ldnull     */
            ,0x06   /* | ldloc.0    */
            ,0x28   /* | call       */,  0x31 ,0x00 ,0x00 ,0x0a
            ,0xdc   /* | endfinally */
    /*   65 end .finally  1 */
    /* *   65 IL_041: */
            ,0x00   /* | nop        */
            ,0x2a   /* | ret        */
    }

    SEH: {

            /*   0 */  0x00 ,0x01 ,0x1c ,0x00 ,0x00 ,0x00 ,0x00 ,0x2b ,0x00 ,0x05

            /*  10 */ ,0x30 ,0x00 ,0x04 ,0x2b ,0x00 ,0x00 ,0x01 ,0x02 ,0x00 ,0x2b

            /*  20 */ ,0x00 ,0x0c ,0x37 ,0x00 ,0x0a ,0x00 ,0x00 ,0x00 ,0x01

    }

     

    Tuesday, January 3, 2012 9:25 PM
  • Whenever I get an InvalidProgramException it is usually because I have really messed up the IL and it only comes to light when the runtime tries to call the modified method.

    Anyways I put the bytes into my parser and this is what I get

    IL_0000 ldstr (70)0003A8
    IL_0005 ldc.i4 00000000
    IL_000A ldstr (70)00041C
    IL_000F ldstr (70)00041E
    IL_0014 ldstr (70)00044C
    IL_0019 ldstr (70)000470
    IL_001E ldstr (70)000486
    IL_0023 ldnull
    IL_0024 ldarg.0
    IL_0025 call (0A)000030
    IL_002A stloc.1
    IL_002B nop
    IL_002C nop
    IL_002D nop
    IL_002E leave IL_0037
    IL_0033 stloc.0
    IL_0034 nop
    IL_0035 rethrow
    IL_0037 nop
    IL_0038 leave IL_0047
    IL_003D nop
    IL_003E ldloc.1
    IL_003F ldnull
    IL_0040 ldloc.0
    IL_0041 call (0A)000031
    IL_0046 endfinally
    IL_0047 nop
    IL_0048 ret

    Section 0: 0 002B 0033 0033 0037 0000 0100002B
    Section 1: 2 002B 003D 003D 0047 0000 01000000

    NOTE: My parser automatically makes everything long (i.e. leave.s => leave and tiny exceptions => fat exceptions)  so that I don't have to worry about short offsets when adding instrumentation. But since it looks like it took your IL and expanded it correctly all the jumps look good. Also I show the computed locations of the IL where the blocks begin and end to make it easier to read and they line up with your computed output.

    As I understand it what you are doing is adding 2 new variables, types unknown, but I can guess this would be an object/string/int (to store the result from the first function) and an exception and both of these are used to call a second function (in the finally block). The stack size you've added also looks big enough, 0x8 => 0x29

    Nothing looks 'wrong' (I am assuming the tokens are correctly generated) except that the try/finally exception handler has a value in the last block - I'd normally expect 00000000 rather than 01000000 (this is only filled in for try/catch or try/filter blocks as I understand it, but again I also believe it is ignored for the other types so probably isn't the problem).

    I can't see anything you are doing that would account for the generics issue you are reporting unless the issue is with the method you are calling (0A)000030/1 - I may try and repeat the issue if I get time.

     

    Wednesday, January 4, 2012 1:51 AM
  • Thank you for your efforts.
    My IL rewriter/profiler works fine for an otherwise identical function "Good" that takes an argument which is a reimplementation[sic] of KeyValuePair, which comes from the same assembly.  Both "Good" and "Bad" functions have identical header+bytes+SEH.  The CLR only craps out when given the "Bad" profiled function whose argument is System.Collections.Generic.KeyValuePair.
      private static void Good(MyKeyValuePair<StringString> h) { }
      private static void Ugly(System.Collections.Generic.KeyValuePair<StringString> h) { }
      private static void Atrocious(System.Collections.Generic.KeyValuePair<intint> h) { }  // causes CLR restarts

    I'm now starting to trace the interface from my profiler to/through IMetaDataEmit so I can do differentials between the "good" and "ugly".
    Wednesday, January 4, 2012 2:15 AM
  • I assume you are leaving the function signature untouched and only changing the body? How are you creating the local variables token, is it the same for each function?
    Wednesday, January 4, 2012 10:30 AM
  • Another data point (and a mini-duoh!):  This problem only seems to appear in my x64 build, and not in my x86 build.  I'll pursue the settings in the VS2010 Configuration Manager shortly.
    Wednesday, January 4, 2012 5:04 PM
  • Interestingly, I am facing a similar problem which I asked in the thread below -- there, my program works fine in 64-bit but fails in 32-on-64 bit (WOW):

    http://social.msdn.microsoft.com/Forums/en-US/netfxtoolsdev/thread/c4e58165-a54a-4be2-b034-d5c9464f9740

    And in that case, I am not even changing the method signature, though there are changes in local variables.

    It is unclear to me why the bitness would affect the validity of the modified code, since AFAIK MSIL instructions are bitness-agnostic. I am guessing something in the local vars signature might use structures that change size based on bitness, but a cursory look at my source code reveals no such thing.  Since such changes don't show in the IL itself but the function metadata, any tips for debugging would be appreciated.


    Murali Vajapeyam
    Wednesday, January 4, 2012 5:13 PM
  • I'm only changing the body; the function signature is unchanged.  The local variables token is the same for each function, and "works" for profiled functions that are structurally identical to the "Ugly" function of mine.
    Thursday, January 5, 2012 2:46 AM
  • Murali:  I'm monitoring this thread, in addition to the one I started here

      http://social.msdn.microsoft.com/Forums/en-PH/netfxtoolsdev/thread/c39c6897-1971-4e0f-8368-395321c3bf6d?prof=required

    My debugging strategy was to tediously search for the one (out of 1000's) profiled function that caused the CLR to topple over.  I then took that function and started throwing things out, until I determined that in my case the only thing that mattered was the type of the single formal parameter.  I still can't drill any deeper because of the opacity of the CLR at this point.  The next thing I'll try is to simplify the code that our profiler injects until the CLR no longer topples over, but I don't think this will inform me of anything I can do for our production purposes.

    Thursday, January 5, 2012 3:40 PM
  • An aha moment!  The System.Collections.Generic.KeyValuePair is a value type (eg, a struct) and not a reference type.  My profiler is assuming that all arguments to a profiled function are reference types, with 32-bit (or 64-bit?) value types working only by happenstance.  I suspect that the KeyValuePair is a 2-word struct, passed by value.  To make matters worse, my profiler arranges to call a function passing this struct to another function in varargs.  2 corner cases colliding, it seems.  I'm reading the ECMA standard now to see how to untangle this.
    Friday, January 6, 2012 10:41 PM
  • I have fixed bugs relating to the encoding of argument type signatures, as well as the way I am boxing some value types, and my problems now seem to have resolved themselves.  I'm not changing the signature of the profiled function, but am constructing a new signature for a function called by the profiled code, and that's where my bug was.
    • Proposed as answer by Shaun Wilde Tuesday, January 10, 2012 9:57 PM
    • Marked as answer by Paul Zhou Thursday, January 19, 2012 9:41 AM
    Saturday, January 7, 2012 2:16 AM
  • Excellent news, I was going to say boxing when I saw your earlier post via email, but I was on the beach :)
    Saturday, January 7, 2012 6:39 AM