Can .Net exe work without having MSIL inside?

  • Question

  • As far as I know, Ngen can produce native code from MSIL, and this code runs when the runtime loads it from the native image cache (*.ni.exe).

    The first version of Ngen did not keep the MSIL inside the exe.

    Also, this article shows that an exe and ni.exe pair can work even if the MSIL is removed:
    http://www.woodmann.com/forum/archive/index.php/t-11459.html

    So it seems possible to make a compilation tool that produces a native exe and removes the MSIL entirely.
    This could solve the problem with easy reverse engineering of .Net assemblies.

    Why does Microsoft not have such a tool yet?

    Why does ".Net Native" link parts of the framework into the application instead of simply using the logic of Ngen and removing the MSIL?
    Sunday, July 6, 2014 3:12 AM

All replies

  • "Can .Net exe work without having MSIL inside?"

    Not really, there's at least one case where the MSIL is required: generics. NGEN can't anticipate how a generic type/method will be used at runtime and therefore it can't produce code for generic methods in advance.

    "This could solve the problem with easy reverse engineering of .Net assemblies."

    Not really, because even if you could remove the MSIL from a .ni.exe image you still have to deliver the original .exe image and not the NGEN compiled one. NGEN images are machine specific, they may not work properly on machines other than the one which produced the image.

    "Why does ".Net Native" link parts of the framework into the application instead of simply using the logic of Ngen and removing the MSIL?"

    Because Ngen simply wasn't designed to do this. As mentioned above the MSIL might still be needed in an NGENed image. .Net Native is trying to avoid all the MSIL at the cost of eliminating or complicating some features (reflection in particular).

    Sunday, July 6, 2014 6:37 AM
  • - "Can .Net exe work without having MSIL inside?"

    - "Not really, there's at least one case where the MSIL is required: generics. NGEN can't anticipate how a generic type/method will be used at runtime and therefore it can't produce code for generic methods in advance."

    Yes, I see: Ngen 1 did not keep the MSIL in the exe, generics appeared in C# 2, and NGen 2 now keeps the MSIL.

    They are now trying to solve this in .Net Native. It was solved somehow for C++, as I understand.

    ------

    - "This could solve the problem with easy reverse engineering of .Net assemblies."

    - "Not really because even if you could remove the MSIL from a .ni.exe image you still have to deliver the original .exe image and not the NGEN compiled one."

    This is a strange constraint specific to NGen. It is not specific to .Net.

    - "NGEN images are machine specific, they may not work properly on machines other than the one which produced the image."

     Why? Can you give samples of such incompatibility?

    --------

    - "Why does ".Net Native" link parts of the framework into the application instead of simply using the logic of Ngen and removing the MSIL?"

    -"Because Ngen simply wasn't designed to do this. As mentioned above the MSIL might still be needed in an NGENed image. .Net Native is trying to avoid all the MSIL at the cost of eliminating or complicating some features (reflection in particular)."

    Yes, but why do they say "framework code will be compiled into the application"?

    Why cannot they keep one set of framework binaries used by many applications as it was before?

    Sunday, July 6, 2014 7:49 AM
  • "They are now trying to solve this in .Net Native. It was solved somehow for C++, as I understand."

    For C++ there wasn't anything to solve; C++ doesn't support any kind of reflection. All the code (including template code) must be known and compiled at compile time. This leads to a C++ specific problem - if you're using templates then you have to put all the template code in the header, such code can't be placed in a dll.

    "Why? Can you give samples of such incompatibility?"

    There are 2 possible problems with using NGEN images on a different machine:

    • The generated code may be CPU specific. It may use SSE2 instructions and then the code won't work on a CPU that doesn't support SSE2. That's probably less likely to happen today because all current CPUs support SSE2 but this problem will resurface again when SIMD support is added to .NET, then NGEN could use AVX2 but this is a recent addition and not everyone has an AVX2 capable CPU.
    • The generated code may be .NET Framework version specific. MSIL refers to fields by using metadata tokens and names but assembly code refers to fields by using offsets. These offsets may change during a framework update and then the NGEN image becomes invalid. NGEN solves this by recompiling assemblies when needed. Of course, to do this NGEN requires the original MSIL assembly.
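
    To make the second point concrete, here is a minimal C# sketch. PointV1 and PointV2 are hypothetical stand-ins for one framework type before and after an update, and Marshal.OffsetOf reports the marshaled layout, which is used here only as a visible proxy for the CLR's internal layout. Code that finds Y by name keeps working after the update, while code with the old offset baked in would not.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical "framework" struct before an update.
[StructLayout(LayoutKind.Sequential)]
public struct PointV1
{
    public int X;
    public int Y;
}

// The same struct after an update inserted a field in the middle.
[StructLayout(LayoutKind.Sequential)]
public struct PointV2
{
    public int X;
    public int Flags; // new field added by the update
    public int Y;
}

public static class OffsetDemo
{
    // Resolves a field by name (as MSIL does) and returns its byte offset
    // in the marshaled layout (what native code would have baked in).
    public static int OffsetOf<T>(string fieldName)
    {
        return (int)Marshal.OffsetOf<T>(fieldName);
    }
}
```

    OffsetDemo.OffsetOf<PointV1>("Y") gives 4 and OffsetDemo.OffsetOf<PointV2>("Y") gives 8, so native code compiled against the first layout with a hard-coded offset of 4 would read Flags instead of Y.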

    "Why cannot they keep one set of framework binaries used by many applications as it was before?"

    They can but they haven't done so yet. It's easier to just put everything in an executable and avoid depending on other dlls which may be updated. This basically avoids problem 2 described above.

    The drawback of doing this is that the executable is quite large but for WinStore apps it's not unreasonably large. One thing that's specific to WinStore apps is that they rely on the native XAML framework, because it is native it doesn't contribute to the executable size. In contrast, .NET desktop apps use WinForms or WPF and both are large managed libraries. Probably an executable that embeds the WPF libraries would exceed 10 megabytes in size, that's a bit excessive.


    • Edited by Mike Danes Sunday, July 6, 2014 9:56 AM
    Sunday, July 6, 2014 9:56 AM
  • "The generated code may be .NET Framework version specific. MSIL refers to fields by using metadata tokens and names but assembly code refers to fields by using offsets. These offsets may change during a framework update and then the NGEN image becomes invalid. NGEN solves this by recompiling assemblies when needed. Of course, to do this NGEN requires the original MSIL assembly."

    Why is there no such problem in the Win32 API? Why does assembly code need to use these fixed offsets?
    Sunday, July 6, 2014 10:44 AM
  • "Why is there no such problem in the Win32 API?"

    There is but you probably don't realize it because the Win32 API authors have been careful to hide/avoid it. This was done by following 2 rules:

    • Fields are never removed from nor inserted into the middle of a struct. The only change one can make to a Win32 struct is to add fields at the end.
    • All Win32 structs that are likely to change (have fields added to them) in the future have a 'size' field which the user of the struct must fill with the sizeof(struct). This allows a called Win32 function to figure out what "version" of the struct the caller is using.
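
    The two rules can be sketched in C# as follows. This is only an illustration: the byte buffers stand in for C structs passed by pointer, and the layout (size, width, then an appended height) and all names are hypothetical, not a real Win32 structure.

```csharp
using System;

public static class SizedStructDemo
{
    public const int SizeV1 = 8;  // layout: int size, int width
    public const int SizeV2 = 12; // layout: int size, int width, int height (appended at the end)

    // An "old" caller that only knows version 1 of the struct.
    public static byte[] MakeV1(int width)
    {
        var raw = new byte[SizeV1];
        BitConverter.GetBytes(SizeV1).CopyTo(raw, 0);
        BitConverter.GetBytes(width).CopyTo(raw, 4);
        return raw;
    }

    // A "new" caller that knows version 2.
    public static byte[] MakeV2(int width, int height)
    {
        var raw = new byte[SizeV2];
        BitConverter.GetBytes(SizeV2).CopyTo(raw, 0);
        BitConverter.GetBytes(width).CopyTo(raw, 4);
        BitConverter.GetBytes(height).CopyTo(raw, 8);
        return raw;
    }

    // The "API function": it reads only as many bytes as the caller declared,
    // and falls back to a default for fields the caller does not know about.
    public static int[] Read(byte[] raw)
    {
        if (raw.Length < SizeV1) throw new ArgumentException("Buffer too small.");
        int size = BitConverter.ToInt32(raw, 0);
        if (size < SizeV1 || size > raw.Length) throw new ArgumentException("Bad size field.");
        int width = BitConverter.ToInt32(raw, 4);
        int height = size >= SizeV2 ? BitConverter.ToInt32(raw, 8) : 0;
        return new[] { width, height };
    }
}
```

    Because existing fields never move, width is always at offset 4, and the size field is the only thing the callee needs in order to tell old callers from new ones.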

    Such rules aren't suitable for an object-oriented framework like .NET; they work only in a C-style API such as Win32.

    "Why does assembly code need to use these fixed offsets?"

    What else could it use? Assembly code is all about CPU instructions and memory addresses, things like metadata tokens are meaningless in assembly code. This is also what makes assembly code difficult to reverse engineer.

    Sunday, July 6, 2014 11:16 AM
  • "Why is there no such problem in the Win32 API?"

    There is but you probably don't realize it because the Win32 API authors have been careful to hide/avoid it. This was done by following 2 rules:

    • Fields are never removed from nor inserted into the middle of a struct. The only change one can make to a Win32 struct is to add fields at the end.
    • All Win32 structs that are likely to change (have fields added to them) in the future have a 'size' field which the user of the struct must fill with the sizeof(struct). This allows a called Win32 function to figure out what "version" of the struct the caller is using.

    Such rules aren't suitable for an object-oriented framework like .NET; they work only in a C-style API such as Win32.

     Thanks, Mike, things are becoming clearer.

     Do you know, how often Microsoft changes the structs within one framework version?

     Could they simply make agreement that they wait for next version to change the struct?

     And, of course, why did they seal the structs?

     As I understand from C++, if inheritance is used then the new field will never get in the middle of the old fields.

    Why did the people designing a new object-oriented language decide to use "C style" non-inheritable structs?

    Sunday, July 6, 2014 11:04 PM
  • "Do you know, how often Microsoft changes the structs within one framework version?"

    Probably not that often but it can happen. Note that this problem applies to both structs and classes and classes are far more likely to change than structs.

    "Could they simply make agreement that they wait for next version to change the struct?"

    What if a change needs to be made to fix a security issue?

    "And, of course, why did they seal the structs?"

    There are various reasons for that (in particular a struct doesn't have a method table and because of that you wouldn't be able to use virtual methods) but anyway this wouldn't avoid problems with field additions. It's not like someone would be willing to create a new struct inheriting the old one just to add a field. This isn't COM where every time you need to make a change you have to create a new interface.

    Monday, July 7, 2014 5:47 AM
  •  "What if a change needs to be made to fix a security issue?"

     Easy reversing is also a security issue, a monstrous one, I would say.

     "There are various reasons for that (in particular a struct doesn't have a method table and because of that you wouldn't be able to use virtual methods) "

     Yes, of course, if it is not object oriented then it has no method table.

    "It's not like someone would be willing to create a new struct inheriting the old one just to add a field. This isn't COM where every time you need to make a change you have to create a new interface."

     A new interface is necessary to keep the old one intact. It is not about COM; it is a basic rule of software development and even of business in general. An interface is a contract. What you are telling me now is that the whole industry should go open source just because Microsoft does not respect interfaces anymore.

    Monday, July 7, 2014 8:53 AM
  • "Easy reversing is also a security issue, a monstrous one, I would say."

    That's an exaggeration. I've yet to see a real security issue caused by easy reversing.

    "A new interface is necessary to keep the old one intact. ..."

    Yes, but unlike COM, .NET is not restricted to interfaces; it has classes and structs. You can't really create something like List2<T> just because you want to add a new field to the existing List<T>.

    Monday, July 7, 2014 9:28 AM
  • - "Easy reversing is also a security issue, a monstrous one, I would say."

    - "That's an exaggeration. I've yet to see a real security issue caused by easy reversing."

     Sources are easy to steal. It was impossible for native C++. And programs are easier to hack than native C++ ones. A whole new business of obfuscation and code encryption has appeared that in fact does not help, because obfuscation cannot be reliable even in theory. Now nobody can be sure that their sources are protected. And you tell me that

    1) this is nothing,

    2) this happens simply because Microsoft

     a) did not make pre-compilation for generics yet (in progress for .Net Native)

     b) simply does not want to guarantee that interface of framework will stay intact within one version (.Net Native is going to link parts of the framework inside because of this).

     Did I understand you right?

     I do not understand this:

    "Yes, but unlike COM, .NET is not restricted to interfaces; it has classes and structs. You can't really create something like List2<T> just because you want to add a new field to the existing List<T>."

    Native C++ has templates and does not require to keep sources with binary. I do not understand how .Net C# is different in this aspect especially taking into account that .Net Native is going to support generics somehow and make totally native images as I understood.

    Monday, July 7, 2014 11:23 PM
  • "Sources are easy to steal."

    That's not a security issue as in "code bug that allows arbitrary code to be executed". Besides, the fact MSIL can be easily decompiled isn't quite the same thing as having the source code. Source code has comments. Source code has local variable names. Source code has source control history. Source code has associated tests. Decompiled source code has none of this. But decompiled source code has bugs because existing decompilers tend to have bugs.

    "It was impossible for native C++."

    That's not quite true. Native code decompilers do exist but they aren't as good as MSIL decompilers. And the funny thing is that the reason why native decompilers are worse isn't the actual code but the lack of type information, that is, metadata.

    "because obfuscation cannot be reliable even in theory"

    I've never used obfuscation but I've seen quite a few 3rd party libraries that are using it so maybe it doesn't work in theory but in practice it seems to work well enough. The only problem is that obfuscation may interfere with reflection.

    "did not make pre-compilation for generics yet"

    Compilation for generics is done, but not in all cases. If your application contains a type Foo and you use List<Foo> in code then the code that is necessary for List<Foo> can be precompiled. The actual problem is, again, reflection. You can do something like Activator.CreateInstance(typeof(List<>).MakeGenericType(Type.GetType("Foo"))) and in this case the compiler has no way to guess that List<Foo> will be needed. .NET Native avoids this problem by requiring you to provide hints to tell the compiler which types/members need to be available at runtime for reflection purposes.
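
    The reflection case above can be written out as a small, self-contained C# sketch. Foo is the placeholder type from the paragraph; the class and method names (GenericByReflection, CreateListOf) are made up for illustration. The point is that the element type arrives as a string at run time, so nothing in the compiled code mentions List<Foo>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class Foo { }

public static class GenericByReflection
{
    // Builds a List<T> where T is known only by name at run time.
    // An ahead-of-time compiler looking at this method cannot tell
    // which List<...> instantiations it will need to have ready.
    public static IList CreateListOf(string typeName)
    {
        Type element = Type.GetType(typeName, true);          // throws if the type is not found
        Type closed = typeof(List<>).MakeGenericType(element); // e.g. List<Foo>
        return (IList)Activator.CreateInstance(closed);
    }
}
```

    Calling GenericByReflection.CreateListOf("Foo") from the same assembly returns an empty List<Foo>; for types in other assemblies an assembly-qualified name is needed.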

    "simply does not want to guarantee that interface of framework will stay intact within one version"

    The interface is unchanged as far as the framework is concerned. It's not like they'll do anything like removing the List<T>.Add method or changing its meaning. The interface may change at the binary level but at the end of the day .NET, unlike COM, doesn't define a binary interface.

    "Native C++ has templates and does not require to keep sources with binary."

    But if you make a library and you use templates then you have to put all your template code in the headers. Anyway, as I said previously, MSIL isn't source code.

    "I do not understand how .Net C# is different in this aspect especially taking into account that .Net Native is going to support generics somehow and make totally native images as I understood."

    I said this in my first post, .NET Native is doing this by restricting certain functionality.

    At the end of the day it's simply a matter of trade-offs. Certain programming features have a certain cost. You want reflection - you pay a cost. You want GC (GC also needs some bits of metadata) - you pay a cost. You want runtime code generation - you pay a cost. If you don't need any of these features and you think that your code is so special that it needs better protection then nobody is forcing you to use .NET, you can use C++.

    Tuesday, July 8, 2014 5:36 AM
  • Thank you, Mike, more details are becoming clear.

    " compilation for generics is done but not in all cases. If your application contains a type Foo and you use List<Foo> in code then the code that is necessary for List<Foo> can be precompiled. The actual problem is, again, reflection. You can do something like

    Activator.CreateInstance(typeof(List<>).MakeGenericType(Type.GetType("Foo")))

    and in this case the compiler has no way to guess that List<Foo> will be needed. .NET Native avoids this problem by requiring you to provide hints to tell the compiler which types/members need to be available at runtime for reflection purposes."

    This means that in most cases a tool similar to Ngen would work.

    So you write that the only real reason to keep the MSIL is the hypothetical danger that binary interface of the framework can be changed during update.

    Do you know any Microsoft document that confirms that this can happen?

    Wednesday, July 9, 2014 8:10 AM
  • "So you write that the only real reason to keep the MSIL is the hypothetical danger that binary interface of the framework can be changed during update."

    For an exe that's probably the main reason. For a library (dll) there are other problems. For example:

    • MSIL must be available in mscorlib for List<T> not because of reflection but because anyone can reference mscorlib and create a List<T> with a type that is not known to mscorlib.
    • MSIL is also needed for some compiler optimizations. For example BitArray is not generic but its Count property getter is small enough that it's a good candidate for inlining.
    • There are cases where the NGEN image for a dll cannot be loaded; I don't remember the exact details but it's related to using multiple AppDomains. When this happens the original MSIL dll must be available (or possibly the NGEN image is loaded but the native code it contains is not used, only the MSIL is used).

    "Do you know any Microsoft document that confirms that this can happen?"

    It doesn't work like that, it's the other way around. There's no documentation about a NGEN binary interface because such an interface does not exist and in turn that means that such things can change at any time. It's not only about adding fields, a lot of other things could change that would make a NGEN image invalid:

    • calling convention
    • method table format
    • field order (unless you use StructLayout with LayoutKind.Explicit or LayoutKind.Sequential the field layout is unspecified, CLR reorders fields as it pleases)
    • whatever mechanisms are used to bind a NGEN image to other assemblies and to the runtime
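
    For the field order item, this is roughly what opting out of CLR-chosen layout looks like in C#. PinnedHeader is a hypothetical type; the sketch only shows that with LayoutKind.Explicit the offsets are whatever the FieldOffset attributes say, and it does nothing about the other items in the list.

```csharp
using System;
using System.Runtime.InteropServices;

// Every field position is stated explicitly, so it does not depend on
// declaration order or on how a given runtime version arranges fields.
[StructLayout(LayoutKind.Explicit)]
public struct PinnedHeader
{
    [FieldOffset(0)] public byte Tag;
    [FieldOffset(4)] public int Length;
    [FieldOffset(8)] public long Id;
}

public static class LayoutDemo
{
    // Reports the byte offset of a named field in the marshaled layout.
    public static int OffsetOf(string fieldName)
    {
        return (int)Marshal.OffsetOf<PinnedHeader>(fieldName);
    }
}
```

    LayoutDemo.OffsetOf("Length") is 4 and LayoutDemo.OffsetOf("Id") is 8, exactly as declared.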

    Wednesday, July 9, 2014 6:42 PM
  • "MSIL must be available in mscorlib for List<T> not because of reflection but because anyone can reference mscorlib and create a List<T> with a type that is not known to mscorlib."

    This is definitely open-source. STL is also open-source, right? It is the same. But it does not mean we need to keep everything open-source.

    "It's not only about adding fields, a lot of other things could change that would make a NGEN image invalid:

    • calling convention
    • method table format
    • field order (unless you use StructLayout with LayoutKind.Explicit or LayoutKind.Sequential the field layout is unspecified, CLR reorders fields as it pleases)
    • whatever mechanisms are used to bind a NGEN image to other assemblies and to the runtime"

    What are the technical reasons that can prevent Microsoft from keeping these things fixed for separate versions of the framework?

    Where is white paper that declares that Microsoft simply refuses to do this?

    For example, in the previous, sane universe Microsoft could also have changed the calling convention for the Win32 API, but it never did, right?


    Thursday, July 10, 2014 2:18 AM
  • "This is definitely open-source. STL is also open-source, right? It is the same. But it does not mean we need to keep everything open-source."

    Again, MSIL != source code. And VC++'s STL implementation isn't open-source unless you're willing to bend the definition of "open-source".

    "Where is white paper that declares that Microsoft simply refuses to do this?"

    Again, there is no such documentation and this lack of documentation means that things can change. Maybe they will, maybe they won't, what matters is that they can change. Could things be set in stone? Likely, but currently they are not.

    Speaking of Win32, that can't be changed because it's a binary interface; it uses a specific calling convention (__stdcall). Unfortunately this isn't without issues: for example, all the standard x86 calling conventions demand that floating point values be returned via the x87 FPU stack. In a world where most CPUs support SSE and even AVX this is a bit of a problem; it affects performance and it complicates the compiler. In general, setting things in stone tends to be the enemy of progress.

    Thursday, July 10, 2014 4:53 AM
  • "In general, setting things in stone tends to be the enemy of progress."

     It is not set in stone; it is fixed only for a separate version of the framework.

     So the results are like this:

     1) MSIL makes it possible to work without defining and fixing (making constant) a binary interface.

     2) To run a .Net program without MSIL, it is necessary to fix the binary interface for a separate version of the framework.

     3) There are no .Net-specific technical problems that prevent Microsoft from fixing the binary interface for a separate version of the framework.

     4) There is no white paper that declares whether or not Microsoft changes the binary interface within one version of the framework.

     5) As a result, companies waste money on obfuscators, the damage is perhaps already comparable to a super typhoon, and nobody is sure anymore that source code is safe (except users of those linkers that link the framework into the program and remove the MSIL).

     Mike, at which points I am wrong now?

    • Marked as answer by Dear me Saturday, July 12, 2014 1:04 AM
    Thursday, July 10, 2014 11:39 PM
  • "Mike, at which points I am wrong now?"

    Well, I'd say that you're overall wrong about the importance of MSIL being available in the image and you're making exaggerated claims of damage (or you're minimizing the damage done by typhoons and that's quite insensitive to those affected by them). But I suppose this is a matter of personal opinion...

    Friday, July 11, 2014 6:59 AM
  • "Mike, at which points I am wrong now?"

    Well, I'd say that you're overall wrong about the importance of MSIL being available in the image and you're making exaggerated claims of damage (or you're minimizing the damage done by typhoons and that's quite insensitive to those affected by them). But I suppose this is a matter of personal opinion...

     Then I will mark those 5 points as the answer. Thank you for your patience and the technical details provided here.

    P. S. Regarding those offsets, I think it is possible to improve the JIT to keep offsets and sizes in tables outside the main binary and update them accordingly. That could make structs somewhat object oriented at the binary level, and thus old binaries could work even with an updated binary interface. As MS "does not care", as you say, it is not important, of course.

    Saturday, July 12, 2014 1:20 AM
  • "Regarding those offsets, I think it is possible to improve the JIT to keep offsets and sizes in tables ..."

    They've done something like this for Windows Phone, they produce "MDIL" images on their servers and install those on the phones. MDIL is basically native code but where the offsets are replaced with tokens. At runtime a JIT linker simply replaces those tokens with actual offsets.
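
    As a rough illustration of the idea only (this is not the actual MDIL format, and TokenBinder and the token strings are invented for the example), binding can be pictured as a lookup that swaps each symbolic field token for the offset taken from the layout of whatever framework version is installed:

```csharp
using System;
using System.Collections.Generic;

public static class TokenBinder
{
    // fieldTokens: symbolic references left in the precompiled code.
    // layout: field offsets for the framework version found on the device.
    // Returns the concrete offsets to patch into the code, in the same order.
    public static int[] Bind(IList<string> fieldTokens, IDictionary<string, int> layout)
    {
        var offsets = new int[fieldTokens.Count];
        for (int i = 0; i < fieldTokens.Count; i++)
        {
            int offset;
            if (!layout.TryGetValue(fieldTokens[i], out offset))
                throw new KeyNotFoundException("Unknown field token: " + fieldTokens[i]);
            offsets[i] = offset;
        }
        return offsets;
    }
}
```

    The same token list bound against two different layout tables yields two different sets of offsets, which is why such an image can survive a framework update without being recompiled from MSIL.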

    I don't know if the MSIL is kept in the image but it doesn't really matter, metadata is. And metadata is a treasure for reverse engineering. In general, any additional bit of information beyond what a native image normally has can be useful for reverse engineering.

    Saturday, July 12, 2014 5:38 AM
  • "They've done something like this for Windows Phone, they produce "MDIL" images on their servers and install those on the phones. MDIL is basically native code but where the offsets are replaced with tokens. At runtime a JIT linker simply replaces those tokens with actual offsets."

    Good news; then it is a big question why .Net Native does not do exactly the same.

    I think MDIL should work more quickly than obfuscated virtualized MSIL.

    "I don't know if the MSIL is kept in the image but it doesn't really matter, metadata is. And metadata is a treasure for reverse engineering. In general, any additional bit of information beyond what a native image normally has can be useful for reverse engineering."

    People at .Net Native are not obsessed with metadata; they say that they keep only those parts that are really used by reflection in the current assembly, as I understood.

    Saturday, July 12, 2014 6:30 AM
  • "then it is a big question why .Net Native does not do exactly the same."

    Nope, it has no need to do this since it drags all the necessary framework code into the final executable. Changes to the binary interface of the framework are irrelevant to .Net native.

    "I think MDIL should work more quickly than obfuscated virtualized MSIL."

    I don't know what "obfuscated virtualized MSIL" means. The primary obfuscation technique is to replace the names used in metadata with bogus names such as "a_0". That doesn't affect performance. I saw some which do things like encrypting string literals, that probably hurts performance and it's kind of useless. Anyway, usually you can select what to obfuscate.

    "People at .Net Native are not obsessed with metadata; they say that they keep only those parts that are really used by reflection in the current assembly, as I understood."

    Yep, but some metadata-like information still persists for various reasons. The final .Net Native image has to contain the "method tables" that are otherwise generated at runtime, from metadata, by the normal CLR. The "method table" name is a misnomer, they contain some additional information beyond methods. They contain the base type (for casting), the size of the type and the position of fields of reference type (for GC) and maybe some other info that's necessary at runtime no matter what. That's quite useful for reverse engineering.
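
    To give a feel for the kind of facts meant here, the following C# sketch collects them for a hypothetical Node class. It uses ordinary reflection purely for illustration; a real method table would hold offsets rather than names, and TypeShape is an invented helper, not a runtime API.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public class Node
{
    public int Value;
    public Node Next;
    public string Label;
}

public static class TypeShape
{
    // Base type: the information a runtime needs to check casts.
    public static string BaseTypeName(Type type)
    {
        return type.BaseType == null ? null : type.BaseType.FullName;
    }

    // Instance fields that hold references: the information a GC needs to trace objects.
    public static List<string> ReferenceFields(Type type)
    {
        var names = new List<string>();
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in type.GetFields(flags))
        {
            if (!field.FieldType.IsValueType) names.Add(field.Name);
        }
        names.Sort(StringComparer.Ordinal);
        return names;
    }
}
```

    For Node this reports a base type of System.Object and two reference fields, Label and Next, which already tells a reverse engineer that the type is a linked structure carrying a string.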

    Saturday, July 12, 2014 7:34 AM
  • -"then it is a big question why .Net Native does not do exactly the same."

    -"Nope, it has no need to do this since it drags all the necessary framework code into the final executable."

    "Upon my word, I fight because I fight, replied Porthos, blushing." (Alexandre Dumas)

    "I don't know what "obfuscated virtualized MSIL" means."

    There are a lot of things like "Control flow obfuscation", "Code virtualization by custom virtual machine", etc. Dozens of programs. The whole new market. Millions of dollars. Not guaranteed result. Selling indulgences.

    "The final .Net Native image has to contain the "method tables" that are otherwise generated at runtime, from metadata, by the normal CLR. The "method table" name is a misnomer, they contain some additional information beyond methods. They contain the base type (for casting), the size of the type and the position of fields of reference type (for GC) and maybe some other info that's necessary at runtime no matter what. That's quite useful for reverse engineering."

    Thanks, this is important. So .Net is more vulnerable because of GC. This is a fundamental thing.

    Saturday, July 12, 2014 10:04 AM
  • "Not guaranteed result. Selling indulgences."

    So be smart, don't buy or use snake oil.

    Saturday, July 12, 2014 10:33 AM