Are multi-dimensional arrays of value types contiguous memory blocks?

  • Question

  • I am debugging some code that manipulates arrays of value types. Some of these are 1D, some 2D and so on.

    In all cases, arrays declared as: SomeStruct[,,] and so on, seem to be laid out as a contiguous block of memory (much as I would expect in an unmanaged 3D array).

    This is fine, except that many articles I read stress that only single-dimensional arrays are laid out this way, with multi-dimensional arrays requiring additional overhead or metadata.

    This is easy to prove: create an array (say, of the 'Point' struct) and display any element in a debug Memory window; the element's address is clearly visible in the dump. Whatever element I inspect, for any kind of array (1D, 2D, etc.), the address is always exactly what I would expect for a contiguous block.

    So why am I confused??

    Cap'n

    Saturday, July 3, 2010 4:36 PM


All replies

  • I believe the answer is that it is implementation-dependent: the CLR is free to do whatever it likes. Since it makes sense for any CLR to allocate a single block of memory for the entire array, that is what is most likely to happen. Any bookkeeping overhead is likely kept in another part of the block, or in a separate block entirely.

    The point is that you cannot rely on this behavior, nor should you care. When you moved to "managed code", you relinquished the right to care about memory management details--just not reference management.

    Saturday, July 3, 2010 5:05 PM
  • I believe the answer is that it is implementation-dependent: the CLR is free to do whatever it likes. Since it makes sense for any CLR to allocate a single block of memory for the entire array, that is what is most likely to happen. Any bookkeeping overhead is likely kept in another part of the block, or in a separate block entirely.

    The point is that you cannot rely on this behavior, nor should you care. When you moved to "managed code", you relinquished the right to care about memory management details--just not reference management.


    But why do we read all over the place that single-dimensional arrays are contiguous blocks, with the implication that multi-dimensional arrays never are? Or am I misunderstanding what I read somewhere?

    The only reason we care is that we can use a fast copy-memory operation to copy from unmanaged memory directly to managed memory, all elements in one fast hit. We don't use the marshaler for this, because it's limited to blittable array elements (our unmanaged code can cope with any value-type elements, so long as they don't contain references).

    We do this to perform a "ToArray" operation that creates a true managed array very rapidly from a large block of unmanaged memory (one whose elements are technically non-blittable from .NET's perspective).
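
    A minimal sketch of that kind of one-shot copy (the ToArray helper is made up, and kernel32's RtlMoveMemory stands in for the poster's custom unmanaged copier; note that GCHandle itself refuses element types the marshaler considers non-blittable, which is the very restriction under discussion):

    using System;
    using System.Runtime.InteropServices;

    static class FastCopy
    {
        [DllImport("kernel32.dll", EntryPoint = "RtlMoveMemory")]
        static extern void CopyMemory(IntPtr dest, IntPtr src, UIntPtr count);

        // Builds a managed array from a block of unmanaged memory in one copy.
        public static T[] ToArray<T>(IntPtr unmanagedBlock, int count) where T : struct
        {
            var result = new T[count];
            // Pin the destination so the GC cannot move it during the copy.
            var handle = GCHandle.Alloc(result, GCHandleType.Pinned);
            try
            {
                int bytes = count * Marshal.SizeOf(typeof(T));
                CopyMemory(handle.AddrOfPinnedObject(), unmanagedBlock, (UIntPtr)(uint)bytes);
            }
            finally { handle.Free(); }
            return result;
        }
    }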

    Cap'n

    Saturday, July 3, 2010 5:42 PM
  • But why do we read all over the place that single-dimensional arrays are contiguous blocks, with the implication that multi-dimensional arrays never are? Or am I misunderstanding what I read somewhere?

    The implication is that you cannot rely on the behavior, not that the behavior is one thing or the other.

    Is it safe to do what you are doing? No.

    Is it possible that what you are doing will remain error-free in the future? I think it likely, but you get no guarantee here. If you violate the principle, you're on your own; you have voided the warranty, as they say.

    Saturday, July 3, 2010 6:01 PM
  • BTW, your other question (GC moving memory) had me thinking. Is a multi-dim array blittable?

    It turns out that it is. Since it's blittable, you can rest assured that it will not be fragmented.

    I tested it out by allocating a GCHandle on several multi-dim arrays, and all were successfully pinned (meaning they are blittable). I wonder though if it's possible that a very large one would in fact be segmented (and therefore not blittable):

    new byte[Int32.MaxValue, 4]

    That array may very well be segmented (because each dimension is limited to int.MaxValue). You need a 64-bit OS with significant memory to test it (or a very large, contiguous swap file), which I don't have handy right now.
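
    For reference, a minimal version of that pin test (the element type and dimensions here are arbitrary):

    using System;
    using System.Runtime.InteropServices;

    class PinTest
    {
        static void Main()
        {
            var md = new int[10, 20, 30];  // multi-dimensional, blittable elements

            // Alloc with GCHandleType.Pinned throws ArgumentException for
            // non-blittable objects, so success here means the array is pinnable.
            var handle = GCHandle.Alloc(md, GCHandleType.Pinned);
            Console.WriteLine(handle.AddrOfPinnedObject());  // address of md[0, 0, 0]
            handle.Free();
        }
    }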

    Saturday, July 3, 2010 6:28 PM
  • BTW, your other question (GC moving memory) had me thinking. Is a multi-dim array blittable?

    It turns out that it is. Since it's blittable, you can rest assured that it will not be fragmented.

    I tested it out by allocating a GCHandle on several multi-dim arrays, and all were successfully pinned (meaning they are blittable). I wonder though if it's possible that a very large one would in fact be segmented (and therefore not blittable):

    new byte[Int32.MaxValue, 4]

    That array may very well be segmented (because each dimension is limited to int.MaxValue). You need a 64-bit OS with significant memory to test it (or a very large, contiguous swap file), which I don't have handy right now.


    Yes, you are correct here, I think.

    Also, AddrOfPinnedObject returns the address of the very first element.

    I'd assumed that multi-dimensional arrays of blittable types were not pinnable, thanks for showing me to be wrong!

    It would be neat if we could pin any object, blittable or not (like a non-blittable value type).

    Cap'n

    Saturday, July 3, 2010 7:07 PM
  • It would be neat if we could pin any object, blittable or not (like a non-blittable value type).

    class MyType
    {
        Object o1;
        Object o2;
    }

    Each of those references points to a separate block of memory. You can't blit something that is physically located in different parts of memory. A blit is a single memcpy operation.

    * edit: Oops, your comment was on pinnability, not blittability, I'm terribly distracted today...

    Why can't you pin non-blittable objects? I don't know the answer to that; I can only guess that pins are expensive even on blittable objects, and that implementing a pin on an object spread across memory increases the expense (and complexity) considerably.

    There's another factor involved in blittability (and, by association, pinnability):

    What if you have a reference type that consists of nothing but blittable types? Should that automatically be blittable? No, because by default its LayoutKind is Auto. Auto allows the runtime to mix, match, and pad member fields, which means a blit to unmanaged code would leave that code reading the wrong values for the fields.
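
    A short sketch of that distinction (the type names are made up):

    using System.Runtime.InteropServices;

    // Default for a class is LayoutKind.Auto: the runtime may reorder and pad
    // these fields, so the in-memory layout need not match declaration order.
    class AutoLayout
    {
        public byte Flag;
        public long Value;
    }

    // Opting in to a fixed field order makes the layout predictable for interop.
    [StructLayout(LayoutKind.Sequential)]
    struct SequentialLayout
    {
        public byte Flag;
        public long Value;
    }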

    • Marked as answer by SamAgain Monday, July 12, 2010 10:06 AM
    Saturday, July 3, 2010 7:33 PM
  • It would be neat if we could pin any object, blittable or not (like a non-blittable value type).

    class MyType
    {
        Object o1;
        Object o2;
    }

    Each of those references points to a separate block of memory. You can't blit something that is physically located in different parts of memory. A blit is a single memcpy operation.

    Yes, but what about an array of DateTime, for example? DateTime is non-blittable, but I might have unmanaged code custom-written to handle it. Because the marshaler insists on only letting you pin stuff that is blittable, we could never work with it. Or what about structs that contain reference fields? These too aren't blittable, so they can't be pinned, and so they can't be accessed from unmanaged code.

    One could easily write code that did nothing to those reference fields (which are just 4-byte blocks on x86 and 8-byte blocks on x64) but did do stuff to the other fields, the ints, bytes, etc.
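
    A sketch of how that restriction surfaces at run time (the struct here is illustrative):

    using System;
    using System.Runtime.InteropServices;

    struct WithReference
    {
        public int Id;
        public string Name;  // reference field, so the element type is non-blittable
    }

    class PinFailureDemo
    {
        static void Main()
        {
            // DateTime is declared with LayoutKind.Auto, which makes it
            // non-blittable; both calls throw ArgumentException:
            // "Object contains non-primitive or non-blittable data."
            try { GCHandle.Alloc(new DateTime[10], GCHandleType.Pinned); }
            catch (ArgumentException e) { Console.WriteLine(e.Message); }

            try { GCHandle.Alloc(new WithReference[10], GCHandleType.Pinned); }
            catch (ArgumentException e) { Console.WriteLine(e.Message); }
        }
    }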

    It's all a result of the need for security and safety and reliability, but I do wonder sometimes if more could be done to improve the kinds of stuff possible via marshaling and interop.

    Cap'n

    Saturday, July 3, 2010 8:01 PM
  • It's all a result of the need for security and safety and reliability, but I do wonder sometimes if more could be done to improve the kinds of stuff possible via marshaling and interop.

    This question, and the one about GC moving memory, clearly show that you are working to subvert the managed mechanism.

    Maybe you should ask yourself this: why am I building a managed application when its performance capabilities are inadequate? I am assuming that you have already built a fully functioning application, tested it, and found it lacking. I know you're not performing premature optimization.

    Premature optimization is the root of all evil.

    • Marked as answer by SamAgain Monday, July 12, 2010 10:06 AM
    Saturday, July 3, 2010 9:24 PM
  • I am debugging some code that manipulates arrays of value types. Some of these are 1D, some 2D and so on.

    In all cases, arrays declared as: SomeStruct[,,] and so on, seem to be laid out as a contiguous block of memory (much as I would expect in an unmanaged 3D array).

    This is fine, except that many articles I read stress that only single-dimensional arrays are laid out this way, with multi-dimensional arrays requiring additional overhead or metadata.

    This is easy to prove: create an array (say, of the 'Point' struct) and display any element in a debug Memory window; the element's address is clearly visible in the dump. Whatever element I inspect, for any kind of array (1D, 2D, etc.), the address is always exactly what I would expect for a contiguous block.

    So why am I confused??

    Cap'n


    Arrays of any dimension and of any element type are always laid out as a single contiguous memory block by the CLR.  Of course, in the case of reference types the elements themselves would point elsewhere.

    Your confusion stems from the fact that many people coming from C/C++ don't distinguish between multidimensional arrays and jagged arrays (nested arrays, arrays of arrays).  That's because C/C++ does not have true built-in multidimensional arrays, only jagged arrays.

    Multidimensional array: T[,] -- always a single contiguous block of memory.

    Jagged array: T[][] -- a one-dimensional array (which is a single contiguous block of memory) whose elements are themselves one-dimensional arrays (which are also single contiguous blocks of memory... but different ones).

    There is no additional overhead for multidimensional arrays, although element access is a lot slower than for one-dimensional arrays, which get special-case compiler optimization.

    Jagged arrays have the obvious overhead of allocating a new (one-dimensional) array for each element.  That's probably what the article was referring to.
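
    A sketch that makes the difference concrete (Point3 is a made-up blittable struct; this assumes Marshal.UnsafeAddrOfPinnedArrayElement treats its index as a flat element offset, which current CLRs do):

    using System;
    using System.Runtime.InteropServices;

    [StructLayout(LayoutKind.Sequential)]
    struct Point3 { public int X, Y, Z; }

    class LayoutDemo
    {
        static void Main()
        {
            // Multidimensional: one allocation, laid out in row-major order.
            var md = new Point3[4, 5];
            var handle = GCHandle.Alloc(md, GCHandleType.Pinned);
            long first = (long)handle.AddrOfPinnedObject();  // address of md[0, 0]
            long oneTwo = (long)Marshal.UnsafeAddrOfPinnedArrayElement(md, 1 * 5 + 2);
            // md[1, 2] sits exactly (1*5 + 2) elements past md[0, 0].
            Console.WriteLine(oneTwo - first == 7 * Marshal.SizeOf(typeof(Point3)));  // True
            handle.Free();

            // Jagged: one outer array of references plus a separate block per row.
            var jagged = new Point3[4][];
            for (int i = 0; i < 4; i++)
                jagged[i] = new Point3[5];
        }
    }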

    • Marked as answer by SamAgain Monday, July 12, 2010 10:06 AM
    Sunday, July 4, 2010 8:47 AM
  • I am debugging some code that manipulates arrays of value types. Some of these are 1D, some 2D and so on.

    In all cases, arrays declared as: SomeStruct[,,] and so on, seem to be laid out as a contiguous block of memory (much as I would expect in an unmanaged 3D array).

    This is fine, except that many articles I read stress that only single-dimensional arrays are laid out this way, with multi-dimensional arrays requiring additional overhead or metadata.

    This is easy to prove: create an array (say, of the 'Point' struct) and display any element in a debug Memory window; the element's address is clearly visible in the dump. Whatever element I inspect, for any kind of array (1D, 2D, etc.), the address is always exactly what I would expect for a contiguous block.

    So why am I confused??

    Cap'n


    Arrays of any dimension and of any element type are always laid out as a single contiguous memory block by the CLR.  Of course, in the case of reference types the elements themselves would point elsewhere.

    Your confusion stems from the fact that many people coming from C/C++ don't distinguish between multidimensional arrays and jagged arrays (nested arrays, arrays of arrays).  That's because C/C++ does not have true built-in multidimensional arrays, only jagged arrays.

    Multidimensional array: T[,] -- always a single contiguous block of memory.

    Jagged array: T[][] -- a one-dimensional array (which is a single contiguous block of memory) whose elements are themselves one-dimensional arrays (which are also single contiguous blocks of memory... but different ones).

    There is no additional overhead for multidimensional arrays, although element access is a lot slower than for one-dimensional arrays, which get special-case compiler optimization.

    Jagged arrays have the obvious overhead of allocating a new (one-dimensional) array for each element.  That's probably what the article was referring to.


    Thanks for this info, it's much appreciated.

    I am a bit puzzled as to why the system would allow itself to "relocate" potentially massive data blocks during GC?

    If an app was working with hundreds of 3D or 4D arrays, then the GC would possibly end up hogging the CPU all day.

    The only issue in conventional code is the size of the array and perhaps its impact on the stack or paging; but if the GC has to contend with actually moving multi-megabyte blocks of memory, I have to say I am not impressed and could do better.

    Cap'n

    Monday, July 5, 2010 2:25 AM
  • but if the GC has to contend with actually moving multi-megabyte blocks of memory, I have to say I am not impressed and could do better.

    Multi-megabyte objects would be Gen 2 objects. I doubt that Gen 2 memory is defragmented very often, if at all.

    • Marked as answer by SamAgain Monday, July 12, 2010 10:06 AM
    Monday, July 5, 2010 2:38 AM
  • Again I would refer you to my earlier comment. If your interest is simply understanding, that's fine. Get a good book:

    CLR via C#

    Debugging Microsoft .NET 2.0 Applications

    • Marked as answer by SamAgain Monday, July 12, 2010 10:06 AM
    Monday, July 5, 2010 2:47 AM
  • I am a bit puzzled as to why the system would allow itself to "relocate" potentially massive data blocks during GC?


    It doesn't!  Very large objects are allocated on the "large object heap" which is a special heap section that never gets compacted.  Therefore, these objects never get moved around by the garbage collector.  On the downside, it's subject to fragmentation because it doesn't get compacted -- like a traditional C/C++ heap.
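
    A quick way to observe this from code (the 85,000-byte threshold is a current-CLR implementation detail, not a documented contract):

    using System;

    class LohDemo
    {
        static void Main()
        {
            var small = new byte[80000];  // below the threshold: ordinary (movable) heap
            var large = new byte[90000];  // above it: large object heap

            Console.WriteLine(GC.GetGeneration(small));  // 0: young, subject to compaction
            Console.WriteLine(GC.GetGeneration(large));  // 2: LOH objects report Gen 2
        }
    }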

    I second the book recommendation for CLR via C# by Jeff Richter (Microsoft Press 2010, 3rd ed.).  Everything you want to know about CLR memory allocation is explained there.

    • Marked as answer by SamAgain Monday, July 12, 2010 10:06 AM
    Monday, July 5, 2010 6:55 AM
  • I am a bit puzzled as to why the system would allow itself to "relocate" potentially massive data blocks during GC?


    It doesn't!  Very large objects are allocated on the "large object heap" which is a special heap section that never gets compacted.  Therefore, these objects never get moved around by the garbage collector.  On the downside, it's subject to fragmentation because it doesn't get compacted -- like a traditional C/C++ heap.

    I second the book recommendation for CLR via C# by Jeff Richter (Microsoft Press 2010, 3rd ed.).  Everything you want to know about CLR memory allocation is explained there.


    Thanks

    Yes, I'm aware of the book; I perused it recently, in fact (it's shocking how much a book costs in a store compared to online!), and will order it.

    So an answer to my core question finally emerges: I do NOT need to worry about pinning such an array when copying to or from it in unmanaged code, because it can never be moved anyway.

    Is there a way to "force" objects to be created in the LOH? or is the decision made purely by the system based on some heuristic?

    Thanks

    Cap'n

    PS: Correction, the array won't be placed in the LOH if the total object size is < 85,000 bytes.

    Monday, July 5, 2010 11:11 AM
  • So an answer to my core question finally emerges: I do NOT need to worry about pinning such an array when copying to or from it in unmanaged code, because it can never be moved anyway.

    Is there a way to "force" objects to be created in the LOH? or is the decision made purely by the system based on some heuristic?


    You still need to pin a large array, because that's the only way to obtain an IntPtr that you can pass to unmanaged code.  However, the pinning is effectively a no-op, since nothing in that heap ever gets moved anyway.

    I'm not aware of any way to force allocation on the LOH, other than padding the size of the array so that it passes the CLR's heuristic threshold.  That threshold might change in future CLR revisions, too.
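
    Putting that together, a sketch (msvcrt's memcpy stands in here for the real unmanaged consumer):

    using System;
    using System.Runtime.InteropServices;

    class LohPinDemo
    {
        [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
        static extern IntPtr memcpy(IntPtr dest, IntPtr src, UIntPtr count);

        static void Main()
        {
            // 1.6 MB of doubles: far past the LOH threshold, so the block will
            // never be moved; we still pin, because that is how we legally get
            // the IntPtr to hand to unmanaged code.
            var big = new double[200000];
            int bytes = big.Length * sizeof(double);

            var handle = GCHandle.Alloc(big, GCHandleType.Pinned);  // effectively free here
            try
            {
                IntPtr dest = Marshal.AllocHGlobal(bytes);
                memcpy(dest, handle.AddrOfPinnedObject(), (UIntPtr)(uint)bytes);
                Marshal.FreeHGlobal(dest);
            }
            finally { handle.Free(); }
        }
    }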

    • Marked as answer by SamAgain Monday, July 12, 2010 10:06 AM
    Monday, July 5, 2010 1:03 PM