How to create an extremely large array/object (> 2 GB) without using jagged arrays?


  • Hello,

    I implemented a matrix scaling algorithm in Java (64-bit) for a university project and have now ported the code to C# with .NET 3.5 64-bit. One of the operations I did in Java was creating a rectangular array with double[][] = ... (in C# this is a jagged array!) to store a symmetric matrix which is used as input for the algorithm. Since I was using matrices with n up to 30720, memory usage for the single matrix object peaked at around 8 GB, and with matrix multiplication it reached around 20 GB.

    So... now I have implemented this in C# and I am shocked to see that double[,] (rectangular arrays) can only be 2 GB in size even on 64-bit! That means that for matrices with n larger than ~15000 I get OutOfMemoryExceptions although there is a huge amount of free memory (I have two quad-core systems with 12 GB and 24 GB of RAM). When I use jagged arrays in C# (i.e. double[][]) I can create larger matrices. That is because there is one small array of length n holding the row references, and then one array of size n per row, but each of those is a separate object, so none of them reaches the 2 GB per-object limit. The problem with jagged arrays is that they are horribly slow for matrix calculations compared to rectangular arrays and even slower compared to single-dimension arrays.

    Actually I intended to rewrite my code and simulate the matrices in single-dimension arrays (see sample code below), because that speeds up the algorithm quite considerably compared to using rectangular arrays. Anyway... I need to find a way to allocate more than 2 GB, or I am forced to continue working in Java, as a 15000*15000 matrix is actually still pretty small for this algorithm.

    Sample Code:

    Random random = new Random();
    double[] result = new double[n * n];
    for (int i = 0; i < n; i++)
    {
       // diagonal entry
       result[i * n + i] = Math.Abs(random.NextDouble());
       // mirror each off-diagonal entry so the matrix stays symmetric
       for (int j = 0; j < i; j++)
          result[i * n + j] = result[j * n + i] = Math.Abs(random.NextDouble());
    }

    This will give me an OOM with n > ~15000, which is a bad joke really :-( What is 64-bit for then? :P

    P.S. The reason I implemented this in C# was actually because I intended to play around with the .NET Parallel Extensions and see if I can get an efficient performance boost, although I doubt it at the moment due to the object memory limit.
    • Edited by ChristophSackl Thursday, July 23, 2009 22:10 Spelling, Formatting
    Thursday, July 23, 2009 22:06

All replies

  • When I use jagged arrays in C# (i.e. double[][]) I can create larger matrices. That is because there is one small array of length n holding the row references, and then one array of size n per row, but each of those is a separate object, so none of them reaches the 2 GB per-object limit. The problem with jagged arrays is that they are horribly slow for matrix calculations compared to rectangular arrays and even slower compared to single-dimension arrays.

    Have you actually profiled and tested this?  There are many optimizations in C# that help the speed of jagged arrays - it's been my experience that they always outperform my expectations.  In my profiling (I've done this a few times now), I've found that they often outperform rectangular arrays, especially for things like matrix computations.

    That being said, the best way to work around this that I've found is to revert to native code and use C++/CLI. You can allocate an unmanaged array >2GB, and make an IList-like interface to it that uses Int64 (or UInt32) instead of Int32 for the indexing. This can be a very small assembly, used for nothing but this one array type.
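A similar effect can be had from C# alone via Marshal.AllocHGlobal, which allocates outside the managed heap and is therefore not bound by the 2 GB managed-object limit on 64-bit. This is only a minimal sketch of that idea, not the C++/CLI approach itself: the NativeDoubleArray name is made up, the memory is not zero-initialized, and it must be compiled with /unsafe.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: an unmanaged allocation behind an Int64 indexer.
// The buffer lives outside the GC heap, so its size is limited by the
// address space and available memory, not by the 2 GB object limit.
public sealed unsafe class NativeDoubleArray : IDisposable
{
    private readonly double* buffer;
    private readonly long length;

    public NativeDoubleArray(long length)
    {
        this.length = length;
        // Note: AllocHGlobal does NOT zero the memory it returns.
        buffer = (double*)Marshal.AllocHGlobal(new IntPtr(length * sizeof(double))).ToPointer();
    }

    public long Length { get { return length; } }

    public double this[long index]
    {
        get { return buffer[index]; }
        set { buffer[index] = value; }
    }

    public void Dispose()
    {
        Marshal.FreeHGlobal(new IntPtr(buffer));
    }
}
```

Freeing the buffer is the caller's responsibility here; a production version would add a finalizer or a SafeHandle, and probably bounds checks on the indexer.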

    Reed Copsey, Jr. - http://reedcopsey.com
    Thursday, July 23, 2009 23:44
  • I have tested this with the code above (tried single-dimensional, rectangular and jagged arrays): single-dimensional arrays were about 4 times as fast as jagged arrays, and rectangular arrays were about 2 times as fast as jagged arrays. Since matrix generation is not that important for the algorithm, I also tested matrix multiplication, which is a major step in my program, and there jagged arrays performed about 8 times slower than rectangular arrays for any n above 1024. With small n the difference was not that big. This article draws similar conclusions about jagged arrays: http://msdn.microsoft.com/en-us/magazine/cc163995.aspx. I don't know if C++ would help, since the .NET Parallel Extensions only work with C#, and they were the reason I ported the algorithm from Java to C# after all. Well, I'll see if I can find a solution...

    Here are some of the times I measured...

    Symmetric matrix, size n = 2048

    store 2 matrices with double[]
    108.564 seconds matrix multiplication
    29.414 seconds matrix multiplication with Parallel.For

    store the same 2 matrices with double[][]
    162.006 seconds matrix multiplication (+49% time)
    40.622 seconds matrix multiplication with Parallel.For (+38% time)

    All 4 measurements are averages of 5 runs and were executed separately from each other. Storing the 2 matrices in double[,] is in between single-dimension and jagged arrays performance-wise. This was executed on a Core i7 920, by the way, so a 3.69x speedup for double[] is pretty nice :) The jagged array is even sped up by 3.98x. So maybe when I get a few dozen more cores the jagged array will have the same speed as the single-dimension array ;-)

    • Edited by ChristophSackl Friday, July 24, 2009 10:26 Add "Benchmark" Times
    Friday, July 24, 2009 9:04
  • There are two basic array operations that cause slowdowns. The first is indexing the array in the wrong order, one that doesn't give locality of reference: indexing it in row order instead of column order, for example. That causes large slowdowns because the next indexed element is not in the CPU cache, and the cache line constantly needs to be reloaded from RAM.

    The next is the index out-of-bounds test. The JIT compiler can optimize this test away if it can prove that the indexing operation can never go out of bounds. A foreach loop is always good. Verify this by looking at the generated machine code.
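Both effects can be sketched in plain C# (the method names are illustrative; the second loop uses the a.Length bound, which is the pattern the JIT recognizes when eliminating range checks):

```csharp
using System;

public static class LoopDemo
{
    // Cache-friendly order for a row-major flat matrix: the inner index j
    // walks consecutive memory locations, so each cache line is fully used
    // before the next one is fetched. Swapping i and j in the element
    // expression would stride by n doubles per step instead.
    public static double SumRowMajor(double[] m, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum += m[i * n + j];
        return sum;
    }

    // Bounds-check friendly: with the loop bound written as a.Length,
    // the JIT can prove the index is always in range and drop the
    // per-access out-of-bounds test (foreach compiles to the same shape).
    public static double Sum(double[] a)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
            sum += a[i];
        return sum;
    }
}
```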

    Hans Passant.
    Friday, July 24, 2009 11:11
  • >>> Indexing it in row order instead of column order for example.

    I have tried this by switching the indexes for the jagged arrays and it had no effect on performance in my profiling. Also, foreach loops won't work for matrix multiplication in my opinion; at least I couldn't produce working code with foreach loops.

    Right now I am coding in Java again. The speedups with the .NET Parallel Extensions are awesome(!!!), but I really need to use objects larger than 2 GB, and I was not able to implement an efficient custom data type for single-dimension arrays that allows them to grow larger than 2 GB. If anyone has a solution I'd be grateful. I also don't need any functions like Array.Copy; it is completely sufficient if I can access the array by index. The rest is self-coded.
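One possible shape for such a custom data type, sketched here with made-up names: back the logical array with fixed-size chunks so that every managed object stays far below 2 GB, and pay one shift and one mask per access for Int64 indexing.

```csharp
using System;

// Hypothetical: a flat double[] simulated by fixed-size chunks with a
// long indexer. Each chunk is 8 MB, well under the 2 GB per-object
// limit, while the total logical length can exceed it.
public sealed class ChunkedDoubleArray
{
    private const int ChunkShift = 20;                // 2^20 doubles = 8 MB per chunk
    private const long ChunkSize = 1L << ChunkShift;
    private const long ChunkMask = ChunkSize - 1;

    private readonly double[][] chunks;
    private readonly long length;

    public ChunkedDoubleArray(long length)
    {
        this.length = length;
        int chunkCount = (int)((length + ChunkSize - 1) >> ChunkShift);
        chunks = new double[chunkCount][];
        for (int c = 0; c < chunkCount; c++)
        {
            // The last chunk may be shorter than ChunkSize.
            long remaining = length - (long)c * ChunkSize;
            chunks[c] = new double[Math.Min(remaining, ChunkSize)];
        }
    }

    public long Length { get { return length; } }

    public double this[long index]
    {
        get { return chunks[index >> ChunkShift][index & ChunkMask]; }
        set { chunks[index >> ChunkShift][index & ChunkMask] = value; }
    }
}
```

With 8 MB chunks, a 30720*30720 matrix (about 7 GB) needs 900 chunk objects; whether the extra indirection per access is acceptable compared to a flat double[] would have to be profiled.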

    Friday, July 24, 2009 17:19
Christoph:

    Depending on how much performance you ~need~ to squeeze out of this, there is another option -

    I've actually found that, for large arrays like this, if you're willing to revert to unsafe code and use pointers for your manipulation, you can completely eliminate the bounds checking C# adds. Between that, and the indexing suggestions in Hans' post, you can actually get the C# to outperform native code in many cases.... Jagged arrays work very well with pointers, if you're careful to index everything correctly.
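As a minimal sketch of the pointer technique (the method name is invented; this must be compiled with /unsafe):

```csharp
using System;

public static class PointerDemo
{
    // Pins the row and walks it through a raw pointer; inside the fixed
    // block no per-element array bounds checks are emitted for the loop.
    public static unsafe double SumRow(double[] row)
    {
        double sum = 0.0;
        fixed (double* p = row)
        {
            for (int i = 0; i < row.Length; i++)
                sum += p[i];
        }
        return sum;
    }
}
```

For matrix multiplication the same pattern applies per row of a jagged array: fix the current row of each operand and walk the pointers in the inner loop.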

    Between that and the parallelization you're adding, you'll probably be surprised at how much performance you can squeeze out of this.

    To get some idea of what's possible, I'd read up on this SO question. I went back and forth with the poster, and after a few optimizations, his C# version is outperforming his original native C code for an FFT...
    Reed Copsey, Jr. - http://reedcopsey.com
    Friday, July 24, 2009 17:20
  • Hmmm, just putting unchecked in front of my class didn't do anything(?); the speed is still the same. So I guess I have to use pointers to see a difference, but not using pointers is actually the reason I stick to Java and C# ;) because I really have no idea how to use them properly. I will see if I can find some sample code.
    Friday, July 24, 2009 22:30
  • I have a similar problem - I need square matrices with more than 20000 items in a row. I read about the .NET 2 GB per-object limit. So as I understand it, the only answer is using unsafe code or a separate library built with a C++ compiler.

    The problem for me is worse because ushort[20000,20000] is smaller than 2 GB, but actually I cannot even allocate 700 MB. My limit is 650 MB and I don't understand why - I have 32-bit WinXP with 3 GB of memory. Marshal.AllocHGlobal(700<<20) throws an OutOfMemoryException, and GC.GetTotalMemory returns 4.5 MB before I try to allocate the memory.

    But I'm already bored with these games with .NET.

    I have found only that many people say to use unsafe code, but I cannot find an example of how to declare a matrix and how to work with it using pointers. Is it pure C++ code inside of unsafe{} brackets?

    Could you please provide a small example of working with matrices using pointers in unsafe code.

    Tuesday, April 6, 2010 17:22
  • Check out this page http://innovatian.com/2010/03/parallel-matrix-multiplication-with-the-task-parallel-library-tpl/

    The code portion you would specifically want to look at is this one: http://github.com/innovatian/innovatian.blog/blob/master/source/MatrixMultipplication/UnsafeSingle.cs

    I cannot confirm that jagged arrays are faster than multi-dimensional arrays, no matter whether you're working multi-threaded or not. Unsafe code is indeed faster than managed code, though. I tested all the code on a Core i7 with HT disabled, and of all the C#-only versions the unsafe implementation was the fastest for me (.NET 3.5).

    Tuesday, July 27, 2010 21:25
  • .NET Framework 4.5 allows creating arrays larger than 2 GB on 64-bit platforms. This feature is not on by default; it has to be enabled via the config file using the <gcAllowVeryLargeObjects> element.
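The corresponding app.config entry looks like this (note that even with the flag enabled, a single array dimension is still capped at roughly 2^31 elements):

```xml
<configuration>
  <runtime>
    <!-- Allow managed objects (e.g. arrays) larger than 2 GB on 64-bit -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```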


    Friday, March 9, 2012 20:21