Object instance layout: string literals?

  • Question

  • Please, can anyone provide more detail on the figure from the "How the CLR Creates Runtime Objects" article?

    I'm confused by the "string literals" section. I can't find any reason to store string literals as part of every object instance, and I couldn't even deduce what is actually stored there:

    A #US (user string) stream record ID? No. Those are emitted directly into the IL as the ldstr operand and are hardcoded into the JITted code at runtime, so there's no need to store them as part of each instance.

    Runtime-interned instances? Again, no. First, they are ordinary object instances, and the intern table is just a GC root that keeps the interned strings from being collected. Second, there may be an arbitrary number of runtime-interned strings, so there would be no way to determine the Base Instance Size.

    Is it just a meaningless picture, and the string literals (whatever the authors meant) are not stored as part of the object instance? I don't know.

    P.S. Also posted on Stack Overflow.

    Thanks!

    Thursday, September 1, 2011 1:59 AM

All replies

  • The String type is derived directly from Object, making it a reference type, and therefore String objects (their arrays of characters) always live in the heap, never on a thread's stack.

     

    In C#, you can't use the new operator to construct a String object from a literal string:

    using System;

    public static class Program {
        public static void Main() {
            String s = new String("Hi there."); // <-- Error
            Console.WriteLine(s);
        }
    }

    Instead, you must use the following simplified syntax:

    using System;

    public static class Program {
        public static void Main() {
            String s = "Hi there.";
            Console.WriteLine(s);
        }
    }

    If you compile this code and examine its IL (using ILDasm.exe), you’d see the following:

    .method public hidebysig static void Main() cil managed
    {
      .entrypoint
      // Code size 13 (0xd)
      .maxstack 1
      .locals init (string V_0)
      IL_0000: ldstr "Hi there."
      IL_0005: stloc.0
      IL_0006: ldloc.0
      IL_0007: call void [mscorlib]System.Console::WriteLine(string)
      IL_000c: ret
    } // end of method Program::Main

    The newobj IL instruction constructs a new instance of an object. However, no newobj instruction appears in the IL code example. Instead, you see the special ldstr (load string) IL instruction, which constructs a String object by using a literal string obtained from metadata.

    This shows you that the common language runtime (CLR) does, in fact, have a special way of constructing literal String objects.
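A minimal sketch of what this special handling looks like from C#, assuming JIT-compiled code in a single assembly. The class and method names are hypothetical and exist only for illustration. It compares references rather than contents: two identical literals are expected to resolve to one String object, while a string built at run time is a separate object until it is passed to String.Intern.

```csharp
using System;
using System.Text;

public static class LiteralInterningDemo {
    // Two identical literals loaded via ldstr: checks whether they are one object.
    public static bool LiteralsShareOneObject() {
        string a = "Hi there.";
        string b = "Hi there.";
        return object.ReferenceEquals(a, b);
    }

    // A string assembled at run time has equal contents but is a different object;
    // String.Intern returns the reference already held in the intern table.
    public static bool RuntimeStringIsSeparateUntilInterned() {
        string literal = "Hi there.";
        string built = new StringBuilder("Hi ").Append("there.").ToString();
        bool separate = !object.ReferenceEquals(literal, built);
        bool sameAfterIntern = object.ReferenceEquals(literal, string.Intern(built));
        return separate && sameAfterIntern;
    }

    public static void Main() {
        Console.WriteLine(LiteralsShareOneObject());
        Console.WriteLine(RuntimeStringIsSeparateUntilInterned());
    }
}
```

Either way, the strings are ordinary heap objects. Nothing here requires the literal to be stored inside any other object instance.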

    If you are using unsafe code, you can construct a String object from a Char* or SByte*. To accomplish this, you would use C#’s new operator and call one of the constructors provided by the String type that takes Char* or SByte* parameters. These constructors create a String object, initializing the string from an array of Char instances or signed bytes. The other constructors don't have any pointer parameters and can be called using safe (verifiable) code written in any managed programming language.
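The two kinds of constructors can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the pointer-based method compiles only when unsafe code is enabled (the /unsafe compiler option, or AllowUnsafeBlocks in the project file).

```csharp
using System;

public static class StringConstructorDemo {
    // Safe (verifiable) constructor: copies the characters from a char[].
    public static string FromArray(char[] chars) {
        return new string(chars);
    }

    // Pointer-based constructor: pins the array and copies chars.Length
    // characters starting at the pinned address. Requires unsafe code.
    public static unsafe string FromPointer(char[] chars) {
        fixed (char* p = chars) {
            return new string(p, 0, chars.Length);
        }
    }

    public static void Main() {
        char[] chars = { 'H', 'i' };
        Console.WriteLine(FromArray(chars));
        Console.WriteLine(FromPointer(chars));
    }
}
```

In both cases the constructor allocates a new String object on the heap and copies the characters into it. Neither result is a literal from metadata.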

     

    I suggest reading chapter 14 of the book CLR via C#.


    Paul Zhou [MSFT]
    MSDN Community Support | Feedback to us
    Get or Request Code Sample from Microsoft
    Please remember to mark the replies as answers if they help and unmark them if they provide no help.

    Monday, September 5, 2011 8:55 AM
  • Ohh... please read the post before answering it. I wrote that the "string literals" instantiated by ldstr (more precisely, they're read from the #US stream) could not be part of the object instance layout. Quoting:

    "#US (user string) stream record id ... emitted directly into IL as a ldstr param and are hardcoded into the JITted code at runtime. There's no need to store it as a part of each instance."

     

    So the question "What does the 'string literals' section of the object instance layout store?" remains open.

    Monday, September 5, 2011 10:27 AM
  • Hi,

    You can write a sample and use WinDbg or SOS.dll to check the string literals. I think string objects are always in the GC heap, even if they are interned.

    There is a detailed discussion at the link; you can read it.


    Paul Zhou [MSFT]

    Tuesday, September 6, 2011 8:08 AM
  • Hi! I did that: I attached VS to the running process and re-checked it using the Memory window. At least at first glance, there are no signs that the "string literals" section refers to memory allocated for each object instance.

    P.S. The link you posted is the same "just a GC root" link from the starting post ;)

    Tuesday, September 6, 2011 8:45 AM
  • There is a simple way to check that.

    I wrote a code snippet:

        string x = "value for test1";
        Console.WriteLine(x);
        Console.WriteLine("value for test2");

    Debug it in VS2010, open the Immediate window, and type the commands below.

    !load sos.dll

    !dumpheap -strings (this restricts the output to a statistical summary of string values)

    Then you can see that both strings, "value for test1" and "value for test2", are listed there. This shows that string literals are in the GC heap.


    Paul Zhou [MSFT]

    Thursday, September 8, 2011 7:48 AM
  • Hello, Paul!

    These strings are ordinary CLR objects (more details here). Field references to them are stored the same way as any other field reference: in the "Instance Fields" section. However, the figure contains one more section ("string literals") that refers to string literal tables. I suspect the figure has no connection to the real implementation. It is the one and only source stating that the memory layout of each CLR object has a special "string literals" section.
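A small sketch of the point about field references, using hypothetical Greeter and FieldReferenceDemo types that are not from the article. Each instance holds only a reference in its instance field, and with JIT-compiled code both references are expected to point at the same interned String object.

```csharp
using System;

public class Greeter {
    // Instance field: the object stores a reference here, not the characters.
    public string Message = "Hi there.";
}

public static class FieldReferenceDemo {
    // Creates two instances and checks whether their fields
    // refer to the very same String object.
    public static bool InstancesShareOneLiteral() {
        var first = new Greeter();
        var second = new Greeter();
        return object.ReferenceEquals(first.Message, second.Message);
    }

    public static void Main() {
        Console.WriteLine(InstancesShareOneLiteral());
    }
}
```

If the check holds, the per-instance cost of the literal is one reference-sized field, which is consistent with there being no separate per-instance "string literals" storage.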

    Thursday, September 8, 2011 8:52 AM