How can I enumerate private nested types?

    Question

  • TypeSymbol.GetTypeMembers() does not seem to return private nested types. I expected it to return all nested types regardless of visibility. Is this a bug or by design? If the latter, then how can I enumerate private nested types?

    Thanks,
    Chris

            [TestMethod]
            public void PrivateNestedStructs() {
                var user = Environment.GetEnvironmentVariable("USERPROFILE") + "/";
                var fooPath = user + @"Documents\foo.dll";
    
                File.Delete(fooPath);
    
                using (var file = new FileStream(fooPath, FileMode.CreateNew))
                    Compilation.Create("foo.dll",
                        references: new[] { 
                            MetadataReference.Create(typeof(object).Assembly.Location),
                        },
                        syntaxTrees: new[] { 
                            SyntaxTree.ParseCompilationUnit(
                                "public class Foo { " + 
                                    "struct Struct { } " +
                                    "private struct PrivateStruct { } " +
                                    "protected struct ProtectedStruct { } " +
                                    "protected internal struct ProtectedInternalStruct { } " +
                                    "internal struct InternalStruct { } " +
                                    "public struct PublicStruct { } " +
    
                                    "class Class { } " +
                                    "private class PrivateClass { } " +
                                    "protected class ProtectedClass { } " +
                                    "protected internal class ProtectedInternalClass { } " +
                                    "internal class InternalClass { } " +
                                    "public class PublicClass { } " +
                                "}"
                            ) 
                        }
                    ).EmitMetadataOnly(file);
    
                var compilation = Compilation.Create("Foo", references: new[] { 
                    MetadataReference.Create(typeof(object).Assembly.Location),
                    MetadataReference.Create(fooPath),
                });
    
                var foo = compilation.GetTypeByMetadataName("Foo");
                var nestedTypes = foo.GetTypeMembers().ToArray();
                Assert.AreEqual(12, nestedTypes.Count());
            }
    


    • Edited by kingces95 Tuesday, December 06, 2011 8:21 PM
    Tuesday, December 06, 2011 8:17 PM

Answers

  • Hi Chris - I believe this is by design. Roslyn does not import symbols for private members from metadata - I believe this is done so as not to bloat memory usage.

    This only impacts symbols that come from metadata - so if the type 'Foo' were defined in source then you would be able to get symbols for its nested private types. In other words, if you were to try this on the first compilation in the code you posted above, where type 'Foo' is defined in source, then you would be able to get the symbol for the struct 'PrivateStruct'.


    Shyam Namboodiripad | Software Development Engineer in Test | Roslyn Compilers Team
    Tuesday, December 06, 2011 8:45 PM
    Owner

All replies

  • Shyam, thanks. So no private members of any kind (Type, Field, Property, Method, etc) are currently returned when reflecting over metadata, in order to reduce the memory footprint? The ability to simply enumerate private symbols won't bloat memory unless they cannot be filtered out, no?

    IMHO, Roslyn should allow users to filter the Symbols they want to get back. Currently (AFAICT) there is no memory efficient way to get a subset of symbols (e.g. any field or property named "Foo" or all public symbols or a public method "Bar" with parameter type Baz). Instead users must load and cache all symbols via GetMembers (or all Types via GetTypeMembers). For example, Roslyn returns internal symbols to support (I assume) friend assemblies. Friend assemblies are not a common scenario. So for most code those internal symbols are bloat (possibly even more so than private symbols would be).

    One great way to reduce Roslyn's memory footprint would be to delay the conversion of the UTF8 strings in the metadata string heap to System.Strings. To illustrate, here is a little program that reflects over mscorlib. Interleaved with the code are the top 3 types by total memory allocation as seen by SOS !DumpHeap -stat.

    First a compilation is created that references mscorlib and a dump is taken to get a baseline. 

    Next System.Object is resolved via GetTypeByMetadataName. That causes 400 strings (20k) to be allocated -- the biggest total allocation by type. If, however, the string passed to GetTypeByMetadataName were instead converted to UTF8 and that was used as the key then none of those strings would need to be allocated.

    Next every symbol for every type is loaded. This allocates 35k strings (1.5M) -- again the biggest total allocation by type. And again I'd assume most of these strings are from the metadata string heap. At this point the program has yet to request a string from the metadata string heap so few, if any, of those strings need to be allocated.

    Finally, everything is collected at the end. Oddly there's still a lot of stuff... Which makes me wonder if I'm doing this right! :) Maybe you could try to repro.

            [TestMethod]
            public void MscorlibStrings() {
    
                var compilation = Compilation.Create("Foo", references: new[] { 
                    MetadataReference.Create(typeof(object).Assembly.Location),
                });
    
                GC.Collect();
    
                // 682b6eb4      765       113764 System.Object[]
                // 683002fc     2299       137940 System.Reflection.RuntimeMethodInfo
                // 682ffc38     3741       306756 System.String
    
                var obj = compilation.GetTypeByMetadataName("System.Object");
                GC.Collect();
    
                // 682b6eb4     1082       123348 System.Object[]
                // 683002fc     2299       137940 System.Reflection.RuntimeMethodInfo
                // 682ffc38     4184       328076 System.String
    
                var all = Types(compilation).SelectMany(o => o.GetMembers()).ToArray();
                GC.Collect();
    
                // 682b6eb4    32618       917092 System.Object[]
                // 316f1944    20875      1336000 Roslyn.Compilers.CSharp.Metadata.PE.PEMethodSymbol
                // 682ffc38    39046      1791240 System.String
    
                obj = null;
                compilation = null;
                GC.Collect();
    
                // 682b6eb4    32624       916836 System.Object[]
                // 316f1944    20875      1336000 Roslyn.Compilers.CSharp.Metadata.PE.PEMethodSymbol
                // 682ffc38    39046      1791392 System.String
            }
    
            private IEnumerable<TypeSymbol> Types(Compilation compilation) {
                    var stack = new Stack<Symbol>();
                    stack.Push(compilation.GlobalNamespace);
                    while (stack.Count > 0) {
                        var symbol = stack.Pop();
                        var type = symbol as TypeSymbol;
                        var ns = symbol as NamespaceSymbol;
    
                        if (ns != null) {
                            foreach (var o in ns.GetNamespaceMembers())
                                stack.Push(o);
    
                            foreach (var o in ns.GetTypeMembers())
                                stack.Push(o);
                        }
    
                        if (type != null) {
                            yield return type;
    
                            foreach (var o in type.GetTypeMembers())
                                stack.Push(o);
                        }
                    }
            }
    

    Thanks,
    Chris


    • Edited by kingces95 Wednesday, December 07, 2011 1:37 AM
    Wednesday, December 07, 2011 1:36 AM
  • I believe the decision to not import private members was made because GetTypeMembers, the VB and C# compilers as well as the VB and C# IDE services all rely on the same internal APIs to import symbols from metadata. Even though these APIs are 'lazy', always importing private members could bloat memory usage for VS as well as for the compilers.

    That said, I agree that it would be useful to have an implementation where the level of filtering can be configured (e.g. filter inaccessible members by default but support some optional flags on the public API that would allow importing of inaccessible members or vice-versa). As you say, it would also be useful to have more fine-grained filtering whereby you can import even fewer members if you wanted to.
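
    The configurable filtering described above could look roughly like the following. This is only a sketch: ImportFilter, MemberRecord and MemberImporter are hypothetical names invented for illustration, not Roslyn types, and the default of public plus internal members is an assumption based on the behavior reported earlier in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flags describing which visibilities to import (not a Roslyn type).
[Flags]
public enum ImportFilter
{
    None = 0,
    Public = 1,
    Internal = 2,
    Private = 4,
    All = Public | Internal | Private
}

// Hypothetical stand-in for one member row read from metadata.
public sealed class MemberRecord
{
    public string Name { get; }
    public ImportFilter Visibility { get; }

    public MemberRecord(string name, ImportFilter visibility)
    {
        Name = name;
        Visibility = visibility;
    }
}

public static class MemberImporter
{
    // Lazily yields only the rows whose visibility is enabled in the filter.
    // By default inaccessible (private) members are skipped.
    public static IEnumerable<MemberRecord> Import(
        IEnumerable<MemberRecord> metadataRows,
        ImportFilter filter = ImportFilter.Public | ImportFilter.Internal)
    {
        return metadataRows.Where(m => (m.Visibility & filter) != 0);
    }
}
```

    A caller that needs private members would pass ImportFilter.All; everyone else keeps the cheaper default.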

    Do you want to log this suggestion on Connect? :)

    I haven't had a chance to try the code you posted above under SOS yet. Your findings look interesting. I will try this out soon and let you know whether I can repro what you are seeing. Thanks much for the feedback!!


    Shyam Namboodiripad | Software Development Engineer in Test | Roslyn Compilers Team
    Friday, December 09, 2011 2:30 AM
    Owner
  • Shyam, thanks. I opened a connect bug. Hm, yes, I understand what you're saying: Roslyn is not providing private symbols because that might bloat scenarios in which they are not needed. What I'm pointing out is that this rationale puts the cart before the horse. Roslyn is optimizing before it has finished implementing its features. Functionality should not be dropped in the name of optimization; optimizations should be dropped if they cannot support the feature set.

    There is a lot of low hanging fruit in Roslyn's existing memory usage which would make up for the functionality being dropped. As the program above illustrates, converting strings the user passes into Roslyn to UTF8 is a simple fix that would significantly improve working set. 
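
    A minimal sketch of the UTF8 lookup idea, assuming a simplified heap of null-terminated names. Utf8NameTable is a hypothetical class made up for illustration; it is not how Roslyn reads metadata, and a real implementation would hash the bytes rather than scan linearly.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a metadata string heap: names stay as raw UTF-8 bytes.
public sealed class Utf8NameTable
{
    private readonly byte[] heap;                          // null-terminated UTF-8 names, back to back
    private readonly List<int> offsets = new List<int>();  // start offset of each name

    public Utf8NameTable(params string[] names)
    {
        var bytes = new List<byte>();
        foreach (var name in names)
        {
            offsets.Add(bytes.Count);
            bytes.AddRange(Encoding.UTF8.GetBytes(name));
            bytes.Add(0);
        }
        heap = bytes.ToArray();
    }

    // Returns the index of the name, or -1 if it is absent. The query is encoded
    // once and compared byte by byte, so no System.String is created for any
    // heap entry during the search.
    public int IndexOf(string query)
    {
        byte[] key = Encoding.UTF8.GetBytes(query);
        for (int i = 0; i < offsets.Count; i++)
        {
            if (Matches(offsets[i], key)) return i;
        }
        return -1;
    }

    private bool Matches(int offset, byte[] key)
    {
        for (int j = 0; j < key.Length; j++)
        {
            if (offset + j >= heap.Length || heap[offset + j] != key[j]) return false;
        }
        int end = offset + key.Length;
        return end < heap.Length && heap[end] == 0;        // entry must end where the key ends
    }
}
```

    The only allocation per lookup is the encoded key, regardless of how many names the heap holds.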

    The UTF8 string fix is just one instance of a broader optimization. Theoretically, every XXXSymbol could contain only two fields: (1) a single pointer into the PE file at the metadata row it's representing and (2) a reference to the Compilation for resolving assemblies. For example, a NamedTypeSymbol would point at its TypeDef. Only when it's asked for its name would the string be pulled from the string heap and converted from UTF8 to a CLR string; only when it's asked for its visibility would its visibility be deserialized into a CLR enum. Furthermore, because all XXXSymbols are immutable, all those cached values could be serialized into a file. Then the next time Roslyn loads an assembly for reflection that file could be memory mapped and -- Presto! -- NGEN for Roslyn. So, theoretically at least, many of the objects Roslyn is now creating to support XXXSymbol are unnecessary.
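
    The lazy decoding part of this idea can be sketched as follows. LazyNamedSymbol is a hypothetical class for illustration only, and a plain byte array stands in for the memory-mapped PE string heap; it is an assumption about how such a symbol might look, not Roslyn's design.

```csharp
using System;
using System.Text;

// Hypothetical lazy symbol: keeps only a reference to the heap and an offset into it.
public sealed class LazyNamedSymbol
{
    private readonly byte[] heap;      // stand-in for the metadata string heap
    private readonly int nameOffset;   // where this symbol's null-terminated name starts
    private string name;               // stays null until the name is first requested

    public LazyNamedSymbol(byte[] heap, int nameOffset)
    {
        this.heap = heap;
        this.nameOffset = nameOffset;
    }

    // True once the UTF-8 bytes have been converted to a CLR string.
    public bool IsNameDecoded => name != null;

    public string Name
    {
        get
        {
            if (name == null)
            {
                // Find the terminating zero byte; fall back to the end of the heap.
                int end = Array.IndexOf(heap, (byte)0, nameOffset);
                if (end < 0) end = heap.Length;
                name = Encoding.UTF8.GetString(heap, nameOffset, end - nameOffset);
            }
            return name;
        }
    }
}
```

    Symbols that are enumerated but never asked for their name cost two fields and no string.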

    The NGEN optimization may seem a bit extreme until you consider that Visual Studio performance is the #1 most-replied-to thread in the Visual Studio Editor forum. And my own sad experience tells me that users don't give a hoot if code is managed if it's slower. And even less of a hoot if the working set increases. So, IMVHO, performance, specifically working set, will end up being the largest challenge to overcome before Roslyn's ultimate acceptance into Visual Studio. So every trick to reduce working set should be pursued! But only after the functionality is complete.

    Thanks,
    Chris






    • Edited by kingces95 Wednesday, December 21, 2011 12:45 AM
    Friday, December 09, 2011 8:46 PM
  • Anthony, 

    Serialization is the scenario for which private member reflection is most compelling. Generating serialization code requires enumerating private symbols. For example, I often code up little type systems and those types need to be serialized. I'll map C# types to another type system (JSON, XML, whatever) by sprinkling custom attributes on the types, methods, properties, and fields. Things like [JasonTypeAttribute] or [EntityField(Serialization = true)]. To generate the serialization logic I reflect over the assembly and load the C# types into my loader. Then I write a serializer in terms of my types. 

    This works fine but it is really a hack to have to load the assembly to reflect over it. That means I have to copy all referenced assemblies into a common directory so the loader can load the C# types and I can then load them into my types. The way it should be done is by reflecting over the source and creating my type system in terms of Roslyn types. After all, Roslyn has already parsed the code and has an AST sitting in memory.

    Or if I want to generate the serialization code at build time then I'm copying referenced assemblies all over the place so that I can load everything using the normal loader context. That can take up substantial time during the build. And if anyone is wondering, I don't like the ReflectionOnly loader context because I can't use typeof(Foo) to do type comparison: that loads the type in the runtime context and makes things very confusing. (IMHO the ReflectionOnly context was a hack to unblock scenarios at the time. Furthermore the ability to now attach properties to Type just adds entropy to the identity chaos because now I gotta worry about loader context AND reflection context when I get a System.Type.) I'd much rather create a Roslyn Compilation and bind directly to the assemblies by file path and just use that to do the reflection.

    At runtime I also need to use my type system and I don't want to have to rewrite the entire thing using System.Reflection. I only want to bind to reflection for late-bound operations and other operations that can only be done at runtime. I mention this scenario because at runtime the source is not available, so filtering in privates-if-source-available is not a viable solution either. 

    Just because a member is marked private doesn't mean that nobody sees it; reflection can dig it out just fine; type system authors do it all the time.
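
    For comparison, this is how System.Reflection exposes the same nested types once the assembly is loaded. It is a small sketch that reuses the Foo class from the original question; NestedTypeLister is a hypothetical helper name, and loading the assembly is exactly the cost described above.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Same shape as the Foo type compiled in the original question.
public class Foo
{
    struct Struct { }
    private struct PrivateStruct { }
    protected struct ProtectedStruct { }
    protected internal struct ProtectedInternalStruct { }
    internal struct InternalStruct { }
    public struct PublicStruct { }

    class Class { }
    private class PrivateClass { }
    protected class ProtectedClass { }
    protected internal class ProtectedInternalClass { }
    internal class InternalClass { }
    public class PublicClass { }
}

public static class NestedTypeLister
{
    // Returns the names of all nested types, whatever their visibility.
    public static string[] AllNestedTypeNames(Type type)
    {
        return type
            .GetNestedTypes(BindingFlags.Public | BindingFlags.NonPublic)
            .Select(t => t.Name)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToArray();
    }
}
```

    Calling GetNestedTypes() with no arguments returns only the public nested types, so the BindingFlags.NonPublic flag is what makes the private ones visible.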

    Thanks,
    Chris 


    • Edited by kingces95 Tuesday, January 17, 2012 2:12 AM
    Monday, January 16, 2012 11:44 PM