Memory allocation when reading large files

  • Question

  • If I need to read the contents of a 2GB file into memory and I only have 1GB of RAM, what happens?

    Sunday, August 22, 2010 4:29 PM

Answers

  • Well, on 32-bit Windows a user process gets 2GB of address space by default.  Most of that is of course virtual memory, not physical RAM - pages can be backed by the page file on disk.  Reading a 2GB file into memory is probably NOT a good idea, as you will essentially fill up your process's address space and will likely run into out-of-memory conditions.

    Why do you need to read it all into memory?  Can you explain what it is you're doing with this file?  I'm sure a more appropriate solution can be found.
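    As a rough illustration of the address-space limit described above, here is a minimal sketch; the 1GB allocation size is an arbitrary example, not something from this thread, and whether it succeeds depends on the bitness and fragmentation of the process:

```csharp
using System;

class AddressSpaceSketch
{
    public static void Main()
    {
        Console.WriteLine("Process is {0}", Environment.Is64BitProcess ? "64-bit" : "32-bit");
        try
        {
            // A single large contiguous allocation; in a 32-bit process the
            // ~2GB user address space (often fragmented) can make this fail.
            byte[] big = new byte[1024 * 1024 * 1024]; // 1GB, arbitrary example
            Console.WriteLine("Allocated {0} bytes", big.Length);
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("OutOfMemoryException: not enough contiguous address space");
        }
    }
}
```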


    Tom Shelton
    • Marked as answer by SamAgain Thursday, August 26, 2010 6:51 AM
    Sunday, August 22, 2010 4:52 PM
  • If you read an entire file into memory, Windows will try to read it into physical memory first.  If Windows decides that it doesn't have enough free physical memory for the data, it will back those pages with the page file and re-read them from disk, which is orders of magnitude slower.  Most (if not all) applications which must read from large files do so by reading as little as possible from the file, getting only what they need when they need it.  The key to that is the development/use of file formats which make this process easier (i.e. using lookup tables/offsets, etc.).

    Without knowing the details of exactly what you're doing, I will say that your example would be better written as:

    byte[] buffer = new byte[1024];
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
      // Determine if the buffer contains what I need.
      // Only look at the first bytesRead bytes - the rest of the buffer
      // may still hold data from the previous read, since it isn't cleared.
    }
    Now, you're only allocating the buffer once and just reusing it.  No need to worry about excessive garbage collections.
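    The lookup-table/offset idea mentioned above can be sketched roughly as follows; the fixed record size, record count, and temp file are all hypothetical stand-ins for a real file format:

```csharp
using System;
using System.IO;

class SeekSketch
{
    public static void Main()
    {
        // Hypothetical fixed-size-record file: 10 records of 128 bytes each.
        const int RecordSize = 128;
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[RecordSize * 10]);

        using (FileStream fs = File.OpenRead(path))
        {
            // Jump straight to record 7 instead of reading everything before it.
            int recordIndex = 7;
            fs.Seek((long)recordIndex * RecordSize, SeekOrigin.Begin);

            byte[] record = new byte[RecordSize];
            int bytesRead = fs.Read(record, 0, record.Length);
            Console.WriteLine("Read {0} bytes at offset {1}", bytesRead, fs.Position - bytesRead);
        }
        File.Delete(path);
    }
}
```

    Only one record's worth of memory is ever allocated, no matter how large the file is.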

    HTH,
    ShaneB

    • Marked as answer by SamAgain Thursday, August 26, 2010 6:51 AM
    Monday, August 23, 2010 5:59 AM

All replies

  • Tom - I think what I want to know is: does the size of a file directly relate to the amount of memory that must be allocated if one were to read the whole file into memory?

    Let's use a smaller value, such as 10MB. If I have a 10MB XML file and I want to read the entire thing into a string, does my application allocate 10MB of memory to do so?
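    One way to check this empirically is to compare the file's size on disk with the resulting string's length; this is only a rough sketch, and the 1MB ASCII file is a hypothetical stand-in for the 10MB XML:

```csharp
using System;
using System.IO;

class StringSizeSketch
{
    public static void Main()
    {
        // Hypothetical stand-in for the 10MB XML file: 1MB of ASCII text.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, new string('x', 1024 * 1024));

        string contents = File.ReadAllText(path);

        // .NET strings are UTF-16, so each ASCII byte becomes a 2-byte char:
        // a 1MB ASCII file needs roughly 2MB of memory as a string.
        Console.WriteLine("file bytes: " + new FileInfo(path).Length);
        Console.WriteLine("string chars: " + contents.Length);
        File.Delete(path);
    }
}
```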

    So what if I have a buffered read? Like this:

    while (stream.Position < stream.Length)
    {
      byte[] buffer = new byte[1024];
      int bytesRead = stream.Read(buffer, 0, buffer.Length);
      // Determine if the buffer contains what I need
    }
    

    I realize the above code snippet is probably not syntactically correct; I'm just writing it from memory, but hopefully you get the idea.

    So, with each iteration of that loop, the buffer would be eligible for garbage collection, yes? And at each iteration I would only have 1024 bytes allocated?
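    The per-iteration allocation behavior can be observed directly via GC.CollectionCount; this is a rough sketch, the iteration count is arbitrary, and the exact number of collections will vary from run to run:

```csharp
using System;

class GcPressureSketch
{
    public static void Main()
    {
        int gen0Before = GC.CollectionCount(0);
        for (int i = 0; i < 1000000; i++)
        {
            // A fresh 1KB buffer each iteration; the previous one becomes
            // garbage as soon as nothing references it.
            byte[] buffer = new byte[1024];
            buffer[0] = 1;
        }
        int gen0After = GC.CollectionCount(0);
        Console.WriteLine("Gen0 collections triggered: " + (gen0After - gen0Before));
    }
}
```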

     

    Sunday, August 22, 2010 5:09 PM