How to handle heavy data in .NET

  • Question

  • Hi

    I am dealing with huge data. The data is stored in files, and the application loads it into memory as float arrays.
    The size is a few hundred MB, and there can be multiple such files.
    Currently we keep the entire data in memory for performance, but because of that we cannot open more than one file at a time; doing so throws a System.OutOfMemoryException.
    Can someone suggest techniques by which we can achieve good performance as well as better memory utilization? We are working in .NET 3.5, C#.

    The application is a kind of image-processing application where huge data is displayed on a chart (third-party) and can be interacted with manually.

    Thanks and Regards
    - JC
    Saturday, June 6, 2009 12:09 PM

Answers

  • Which is it? 512 * sizeof(float[32768]) = 66MB, or 512 * that = 32GB?

    66MB is tiny, and you shouldn't have any trouble at all.

    32GB is huge, and you're going to be pushing the boundaries of available physical memory and virtual address space at every turn.

    You might think about implementing your own LRU cache to manage a GB or so of float[32768]s; a rough sketch appears at the end of this reply. The advantage of doing it yourself, rather than relying on the garbage collector, is that you get to choose the policy that controls which segments get discarded. Any garbage-collection-based scheme is going to lose ALL the weak-referenced large object data on a suitably high-level garbage collect; and if you don't have weak references, then garbage collection won't collect any garbage.

    Reading 32GB of data off disk isn't going to be that fast. So you should be going in with fairly modest expectations, UNLESS there are brilliant strategies for reducing the total working set at any given time.

    If you have to touch every byte of that 32GB of memory every time you process data, then you may as well do it serially, one file at a time. An LRU cache isn't going to help. At that point, you're not bound by memory, you're bound by how fast you can read 32GB of data off disk.
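
    For illustration, here is a minimal sketch of such a cache in C#, assuming segments of 32768 floats stored back to back in a single data file; the class, its members, and the file layout are assumptions made for the sketch, not code from the application.

    // Hypothetical sketch: an LRU cache that keeps at most 'capacity' float[32768]
    // segments in memory and reads evicted segments back from disk on demand.
    // The file layout (segments of 32768 floats stored back to back) is assumed.
    using System;
    using System.Collections.Generic;
    using System.IO;

    class SegmentCache
    {
        private const int SegmentLength = 32768;            // floats per segment
        private readonly string filePath;                   // backing data file
        private readonly int capacity;                      // maximum segments kept in memory
        private readonly Dictionary<int, LinkedListNode<KeyValuePair<int, float[]>>> map =
            new Dictionary<int, LinkedListNode<KeyValuePair<int, float[]>>>();
        private readonly LinkedList<KeyValuePair<int, float[]>> lru =
            new LinkedList<KeyValuePair<int, float[]>>();   // most recently used at the front

        public SegmentCache(string filePath, int capacity)
        {
            this.filePath = filePath;
            this.capacity = capacity;
        }

        public float[] GetSegment(int index)
        {
            LinkedListNode<KeyValuePair<int, float[]>> node;
            if (map.TryGetValue(index, out node))
            {
                lru.Remove(node);                           // cache hit: move to the front
                lru.AddFirst(node);
                return node.Value.Value;
            }

            float[] segment = LoadFromDisk(index);          // cache miss: read from the file
            if (lru.Count >= capacity)
            {
                map.Remove(lru.Last.Value.Key);             // evict the least recently used segment
                lru.RemoveLast();
            }
            map[index] = lru.AddFirst(new KeyValuePair<int, float[]>(index, segment));
            return segment;
        }

        private float[] LoadFromDisk(int index)
        {
            byte[] buffer = new byte[SegmentLength * sizeof(float)];
            using (FileStream fs = File.OpenRead(filePath))
            {
                fs.Seek((long)index * buffer.Length, SeekOrigin.Begin);
                int read = 0;
                while (read < buffer.Length)
                {
                    int n = fs.Read(buffer, read, buffer.Length - read);
                    if (n == 0) break;                      // hit end of file early
                    read += n;
                }
            }
            float[] segment = new float[SegmentLength];
            Buffer.BlockCopy(buffer, 0, segment, 0, buffer.Length);
            return segment;
        }
    }

    Sized at, say, 8192 segments (8192 × 32768 floats × 4 bytes ≈ 1GB), the cache keeps recently used data in memory while everything else stays on disk until it is touched.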
    • Marked as answer by Zhi-Xin Ye Friday, June 12, 2009 12:26 PM
    Monday, June 8, 2009 5:37 AM

All replies

  • This is a typical problem for programs that allocate large arrays on a 32-bit operating system.  Address space fragmentation gets them into trouble once the array goes over 256 Megabytes.  You'll get OOM when there's no hole left that fits the array, well before the program has consumed all available virtual memory.

    Redesigning the app to sub-divide the arrays is rarely worth the trouble.  Two hundred dollars buys you a 64-bit operating system and a couple of sticks of RAM, problem solved.
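
    For reference, since the reply above considers sub-division rarely worth the trouble, here is a minimal sketch of what it would mean: a wrapper (the name and block size are made up for illustration) that stores the data as smaller fixed-size blocks, so that no single allocation needs a large contiguous hole in the address space.

    // Hypothetical sketch: a large float array stored as fixed-size blocks so that
    // no single allocation needs one large contiguous region of address space.
    using System;

    class ChunkedFloatArray
    {
        private const int BlockSize = 1 << 16;              // 65536 floats = 256 KB per block
        private readonly float[][] blocks;
        private readonly long length;

        public ChunkedFloatArray(long length)
        {
            this.length = length;
            int blockCount = (int)((length + BlockSize - 1) / BlockSize);
            blocks = new float[blockCount][];
            for (int i = 0; i < blockCount; i++)
            {
                long remaining = length - (long)i * BlockSize;
                blocks[i] = new float[(int)Math.Min(BlockSize, remaining)];
            }
        }

        public long Length
        {
            get { return length; }
        }

        public float this[long index]
        {
            get { return blocks[(int)(index / BlockSize)][(int)(index % BlockSize)]; }
            set { blocks[(int)(index / BlockSize)][(int)(index % BlockSize)] = value; }
        }
    }

    The trade-off is an extra index calculation on every element access, which is one more reason the 64-bit route is usually the simpler fix.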
    Hans Passant.
    Saturday, June 6, 2009 12:51 PM
    Moderator
  • Besides that, it never hurts to run your application through a memory profiler to see whether you are using large chunks of memory that could be used more efficiently, or creating large volumes of objects that you could eliminate.
    Saturday, June 6, 2009 6:00 PM
  • Is there any other way, apart from the solution of buying a 64-bit OS? What about the options of memory mapping or the Enterprise Library cache (I am not aware of this)?

    Or what if we divide the arrays into smaller arrays?

    Sunday, June 7, 2009 1:04 AM
  • I wish there were a solution (other than 64-bit, and even that is not guaranteed to solve your problems) where I could just tell you "do X and all your problems will be gone". However, solutions that work for one application might drastically decrease the performance of another; without intimate knowledge of your code and algorithms it is impossible to tell. Perhaps you are keeping a reference to a massive object that you no longer need, or copying a lot of data around that doesn't really need to be copied. The standard rules of increasing performance are:

    1. Run the code through a profiler.
    2. Locate the hotspots that cause trouble.
    3. Fix the problem.
    4. Go to 1 until performance is acceptable.

    Red Gate ANTS is pretty decent and even comes with a 14-day trial, but there are others available too.
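
    As a rough complement to a full profiler, a System.Diagnostics.Stopwatch wrapped around a suspected hotspot can at least confirm where the time goes. A minimal sketch, with a placeholder method standing in for the suspect code:

    // Hypothetical sketch: timing a suspected hotspot with Stopwatch.
    using System;
    using System.Diagnostics;

    class TimingExample
    {
        static void Main()
        {
            Stopwatch sw = Stopwatch.StartNew();

            ProcessAllFiles();                              // placeholder for the code under suspicion

            sw.Stop();
            Console.WriteLine("ProcessAllFiles took {0} ms", sw.ElapsedMilliseconds);
        }

        static void ProcessAllFiles()
        {
            // Placeholder: the real application would load and process its float arrays here.
        }
    }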


    Sunday, June 7, 2009 1:47 AM
  • As I said, we load more than one file, each a few MB in size.
    Obviously there will be some limit beyond which data cannot be loaded into the application. How can I get rid of that problem?
    I agree that there could be problems like those mentioned in the reply above: "Perhaps you are keeping a reference to a massive object that you no longer need, or copying a lot of data around that doesn't really need to be copied."

    But the general question is how to manage arrays of a big size, e.g. float[] abc = new float[32894], with 500 such arrays. This is the typical scenario in the application.

    Sunday, June 7, 2009 10:26 AM
  • That's not big, 66 MB should not be an issue.  Maybe your program doesn't allocate enough generation #0 memory to trigger a collection.  Those arrays go into the Large Object Heap.  Call GC.Collect() after you've processed the arrays and have no reference to them left.
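
    A minimal sketch of that suggestion, with placeholder loading and processing methods standing in for the application's own code:

    // Hypothetical sketch of the suggestion above: drop all references to the large
    // arrays once a file has been processed, then collect so the Large Object Heap
    // space is reclaimed before the next file is loaded.
    using System;

    class CollectAfterProcessing
    {
        static void Main()
        {
            for (int file = 0; file < 3; file++)            // stands in for "one file at a time"
            {
                float[][] segments = LoadSegments();        // placeholder for loading one data file

                ProcessSegments(segments);

                segments = null;                            // no live references to the arrays remain
                GC.Collect();                               // reclaim the Large Object Heap space now
            }
        }

        static float[][] LoadSegments()
        {
            float[][] segments = new float[512][];          // 512 arrays of 32768 floats, as discussed
            for (int i = 0; i < segments.Length; i++)
                segments[i] = new float[32768];
            return segments;
        }

        static void ProcessSegments(float[][] segments)
        {
            // Placeholder: the real application would run its processing here.
        }
    }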

    Hans Passant.
    Sunday, June 7, 2009 12:06 PM
    Moderator
  • Maybe 66MB is not big, but there are 512 such arrays,
    i.e. the total memory is 512 * 66MB.
    Monday, June 8, 2009 4:44 AM